
Ashok Blog for SQL Lovers

Monday, 25 August 2014

To find the second largest value using a subquery, without ROW_NUMBER

To find the second largest payment amount without using ROW_NUMBER:

-- Take the two largest distinct payment amounts, then keep the smaller of the two
select *
from payment
where payment_amt = ( select min(payment_amt)
                      from ( select distinct top 2 payment_amt
                             from payment
                             order by payment_amt desc ) temp )
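An equivalent approach that avoids both ROW_NUMBER and TOP is to nest MAX (a minimal sketch against the same payment table):

select max(payment_amt) as second_largest
from payment
where payment_amt < ( select max(payment_amt) from payment )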

-- Top 5 payments, for reference:
select top 5 *
from payment
order by payment_amt desc

-- Total number of payment rows:
select COUNT(*) as count from payment


Rebuilding indexes on a table to increase SQL Server performance

If you are experiencing low performance or long-running queries or stored procedures, check three things:
  1. Statistics of the table
  2. Indexes of the table
  3. Operators/techniques used in the query (IN vs. EXISTS, for example)

1) Statistics of the table

DBCC SHOW_STATISTICS ('Tablename', 'PrimaryKeyConstraintName')

UPDATE STATISTICS Tablename
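Before updating, you can check how stale the statistics actually are; STATS_DATE returns the last update time for each statistics object (a sketch, with dbo.payment standing in for your table):

select  s.name as stats_name ,
        STATS_DATE(s.object_id, s.stats_id) as last_updated
from    sys.stats s
where   s.object_id = OBJECT_ID('dbo.payment')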

2) Indexes of the table

-- Lightweight defragmentation; always online:
ALTER INDEX ALL ON Tablename
REORGANIZE

-- Full rebuild; more thorough but heavier:
ALTER INDEX ALL ON Tablename
REBUILD
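A common rule of thumb (a guideline, not a hard rule) is to REORGANIZE an index at roughly 5-30% fragmentation and REBUILD above that. You can measure fragmentation with the sys.dm_db_index_physical_stats DMV; in this sketch, dbo.payment stands in for your table:

select  i.name as index_name ,
        ps.avg_fragmentation_in_percent
from    sys.dm_db_index_physical_stats
            ( DB_ID() , OBJECT_ID('dbo.payment') , NULL , NULL , 'LIMITED' ) ps
        join sys.indexes i
          on  i.object_id = ps.object_id
          and i.index_id = ps.index_id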

3) Technique used in statements/code in the stored procedure

The EXISTS operator often performs better than the IN operator (in some cases), as the sketch below shows.
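For instance, both of the following return members who have made at least one payment (a sketch against the dbo.member and dbo.payment tables used later on this blog); EXISTS can stop scanning payments for a member as soon as it finds one match:

-- IN version
select m.*
from dbo.member m
where m.member_no in ( select p.member_no from dbo.payment p )

-- EXISTS version
select m.*
from dbo.member m
where exists ( select 1 from dbo.payment p
               where p.member_no = m.member_no )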

Performance considerations for common SQL queries

Table variable is a viable alternative to Sub-queries.

SQL offers many different methods to produce the same results. There is a never-ending debate between SQL developers as to the "best way" or the "most efficient way" to render a result set. Sometimes these disputes even come to blows… well, I am a lover, not a fighter, so I decided to collect some data to show which way is the most efficient.
For the queries below, I downloaded the test database from SQLSkills: http://www.sqlskills.com/sql-server-resources/sql-server-demos/. There isn't a lot of data, but enough to prove my point: dbo.member has 10,000 records and dbo.payment has 15,554. Our result set contains 6,706 records.
The following queries produce an identical result set: aggregate payment information from the dbo.payment table for each member who has made more than one payment, plus the first and last name of the member from the dbo.member table.

/*************/ 
/* Sub Query  */ 
/*************/ 
SELECT  a.[Member Number] , 
        m.lastname , 
        m.firstname , 
        a.[Number Of Payments] , 
        a.[Average Payment] , 
        a.[Total Paid] 
FROM    ( SELECT    member_no 'Member Number' , 
                    AVG(payment_amt) 'Average Payment' , 
                    SUM(payment_amt) 'Total Paid' , 
                    COUNT(Payment_No) 'Number Of Payments' 
          FROM      dbo.payment 
          GROUP BY  member_no 
          HAVING    COUNT(Payment_No) > 1 
        ) a 
        JOIN dbo.member m ON a.[Member Number] = m.member_no 
       
/***************/ 
/* Cross Apply  */ 
/***************/ 
SELECT  ca.[Member Number] , 
        m.lastname , 
        m.firstname , 
        ca.[Number Of Payments] , 
        ca.[Average Payment] , 
        ca.[Total Paid] 
FROM    dbo.member m 
        CROSS APPLY ( SELECT    member_no 'Member Number' , 
                                AVG(payment_amt) 'Average Payment' , 
                                SUM(payment_amt) 'Total Paid' , 
                                COUNT(Payment_No) 'Number Of Payments' 
                      FROM      dbo.payment 
                      WHERE     member_no = m.member_no 
                      GROUP BY  member_no 
                      HAVING    COUNT(Payment_No) > 1 
                    ) ca

/********/                    
/* CTEs  */ 
/********/ 

WITH    Payments 
          AS ( SELECT   member_no 'Member Number' , 
                        AVG(payment_amt) 'Average Payment' , 
                        SUM(payment_amt) 'Total Paid' , 
                        COUNT(Payment_No) 'Number Of Payments' 
               FROM     dbo.payment 
               GROUP BY member_no 
               HAVING   COUNT(Payment_No) > 1 
             ), 
        MemberInfo 
          AS ( SELECT   p.[Member Number] , 
                        m.lastname , 
                        m.firstname , 
                        p.[Number Of Payments] , 
                        p.[Average Payment] , 
                        p.[Total Paid] 
               FROM     dbo.member m 
                        JOIN Payments p ON m.member_no = p.[Member Number] 
             ) 
    SELECT  * 
    FROM    MemberInfo

/************************/ 
/* SELECT with Grouping   */ 
/************************/
SELECT  p.member_no 'Member Number' , 
        m.lastname , 
        m.firstname , 
        COUNT(Payment_No) 'Number Of Payments' , 
        AVG(payment_amt) 'Average Payment' , 
        SUM(payment_amt) 'Total Paid' 
FROM    dbo.payment p 
        JOIN dbo.member m ON m.member_no = p.member_no 
GROUP BY p.member_no , 
        m.lastname , 
        m.firstname 
HAVING  COUNT(Payment_No) > 1

We can see what is going on in SQL's brain by looking at the execution plan. The execution plan shows which steps SQL executes, in what order, and what percentage of batch time each query takes. So if I execute all 4 of these queries in a single batch, I will get an idea of the relative time SQL takes to execute each one and how it renders the execution plan. We can settle this once and for all. Here is what SQL did with these queries:

[Screenshot: the execution plans for the four queries, each taking an equal share of the batch.]
Not only did the queries take the same amount of time to execute, SQL generated the same execution plan for each of them. Everybody is right… I guess we can all finally go to lunch together! But wait a second: I may not be a fighter, but I AM an instigator. Let's see how a table variable stacks up. Here is the code I executed:
/********************/
/*  Table Variable  */ 
/********************/ 
DECLARE @AggregateTable TABLE 
    ( 
      member_no INT , 
      AveragePayment MONEY , 
      TotalPaid MONEY , 
      NumberOfPayments INT   -- a count of payments, so INT rather than MONEY
    )
INSERT  @AggregateTable 
        SELECT  member_no 'Member Number' , 
                AVG(payment_amt) 'Average Payment' , 
                SUM(payment_amt) 'Total Paid' , 
                COUNT(Payment_No) 'Number Of Payments' 
        FROM    dbo.payment 
        GROUP BY member_no 
        HAVING  COUNT(Payment_No) > 1 
 
SELECT  at.member_no 'Member Number' , 
        m.lastname , 
        m.firstname , 
        at.NumberOfPayments 'Number Of Payments' , 
        at.AveragePayment 'Average Payment' , 
        at.TotalPaid 'Total Paid' 
FROM    @AggregateTable at 
        JOIN dbo.member m ON m.member_no = at.member_no
In the interest of keeping things in groupings of 4, I removed the last query from the previous batch and added the table variable query.  Here’s what I got:

[Screenshot: the execution plans, with the table variable's two steps totaling 22% of the batch.]

Since we first insert into the table variable and then read from it, the execution plan renders 2 steps. BUT the combination of the 2 steps is only 22% of the batch: it is actually faster than the other methods, even though it is treated as 2 separate queries in the execution plan.
The argument I often hear against table variables is that SQL estimates only 1 row for the table in the execution plan. While this is true, the estimate does not come into play until you read from the table variable. In this case the table variable had 6,706 rows, yet it still outperformed the other queries. People argue that table variables should only be used for hash or lookup tables. The fact is, you control what you put INTO the variable, so as long as you keep it within reason, these results suggest that a table variable is a viable alternative to sub-queries.
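One widely used mitigation for the 1-row estimate, worth testing rather than taking on faith (it is an assumption here, not something measured above), is OPTION (RECOMPILE) on the statement that reads the variable; at recompile time the optimizer sees the table variable's actual row count:

SELECT  at.member_no 'Member Number' ,
        m.lastname ,
        m.firstname
FROM    @AggregateTable at
        JOIN dbo.member m ON m.member_no = at.member_no
OPTION  ( RECOMPILE )  -- plan is compiled with the variable's real cardinality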
If anyone does volume testing on this theory, I would be interested in the results.  My suspicion is that there is a breaking point where efficiency goes down the tubes immediately, and it would be interesting to see where the threshold is.

Select a random row from a table in SQL


There are lots of ways to select a random record or row from a database table.

Select a random row with MySQL:

SELECT column FROM table
ORDER BY RAND()
LIMIT 1

Select a random row with PostgreSQL:

SELECT column FROM table
ORDER BY RANDOM()
LIMIT 1

Select a random row with Microsoft SQL Server:

SELECT TOP 1 column FROM table
ORDER BY NEWID()
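A caveat not in the original post: ORDER BY NEWID() generates a GUID for every row and sorts them all, which gets slow on big tables. On SQL Server, TABLESAMPLE is a cheaper rough alternative; it samples whole pages, so it is only approximately random and can even return zero rows on small tables:

SELECT TOP 1 column FROM table
TABLESAMPLE (1 PERCENT)  -- page-level sampling; rerun if it returns no rows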

Select a random row with IBM DB2:

SELECT column, RAND() as IDX
FROM table
ORDER BY IDX
FETCH FIRST 1 ROWS ONLY

Select a random record with Oracle:

SELECT column FROM
( SELECT column FROM table
ORDER BY dbms_random.value )
WHERE rownum = 1

Tuesday, 12 August 2014

Blocking and Deadlocks and Types of Locks in SQL Server

What is Blocking and Deadlock in SQL Server? Different Types of Lock in SQL Server

Sometimes it helps to use analogies away from computers.
Let's say you have a ball and two children. Only one child can have the ball at any one time. However, if one of the children gets the ball and doesn't let go of it because he's distracted (watching TV, for example), then the other child will not get to play with the ball.
The other child is blocked from that resource.
Compare this to the TV: several children can watch TV at the same time.
Locks
If we move over to the database world, we see that there are different ways to use resources (just like our two examples above). We can perform "reads" or we can perform "writes".
When we want to read the data, there's no reason that others can't read the data as well, just like two people watching TV. However, if we want to write the data, then we need to make sure that no one else is looking at it. If they are reading it while we're writing it, they will get "dirty" reads. (Meaning, they'll see the data partially written out, which will be invalid.)
In order to ensure that these dirty reads never occur, we have two primary types of locks: Read Locks and Exclusive Locks.
Read Lock
You can have several different connections reading from the same data source at any given time. But to ensure that no one changes that data while they're reading it, they take out a Read Lock.
Once a connection has a read lock on a piece of data, all other connections must wait until the Read Lock is released before they can write the data. Others can, however, take out Read Locks of their own on that same piece of data.
Exclusive Lock
If a connection wants to update/insert/delete a piece of data, they have to take out an exclusive lock. This prevents any other connection from also taking out a lock on the data (making the lock exclusive to that connection).
When a connection has an exclusive lock on the data, no other connections may read from the data. This helps prevent dirty reads by ensuring that no one can read the data while it's being written.
Blocking
"Blocking" is simply a term that means that one connection is holding a lock on a resource when another connection wants to read or write to it. It doesn't necessarily mean that the owner connection won't release it, just that it's currently holding it.
Compare this to the case with a child holding the ball. The child holding the ball is blocking all other children from holding the ball.
Deadlock
I know you didn't ask this, but it's only one more step to get to deadlocks (and it's related very directly to blocking).
Deadlocks can happen when you have two connections that each hold a lock but want the other's resource. In this scenario, it's like two children who each have a ball but want the other's ball.
Like children, these connections are not willing to share at all. Each connection needs access to both of the resources in order to continue. However, they are in a state of permanent blocking. In this state, the parent (DBMS) has to come in and choose a loser so that one of the children (connections) can have access to both of the resources.
Once that "winning" connection is done, it releases the resources and then the other ("losing") connection can try again to get to both resources.
So, the concept of a deadlock is where you have two resources that are blocking each other.
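A quick way to see this for yourself is a two-session repro (a sketch; dbo.t is a hypothetical table, and each block must run in a separate connection):

-- One-time setup:
create table dbo.t ( id int primary key, val int );
insert into dbo.t values ( 1, 0 );
insert into dbo.t values ( 2, 0 );

-- Session 1:
begin tran;
update dbo.t set val = 1 where id = 1;  -- takes an X lock on row 1
-- (now run Session 2's first update, then come back here)
update dbo.t set val = 1 where id = 2;  -- blocks, waiting on Session 2

-- Session 2:
begin tran;
update dbo.t set val = 2 where id = 2;  -- takes an X lock on row 2
update dbo.t set val = 2 where id = 1;  -- deadlock: each session waits on the other

-- SQL Server detects the cycle, picks a victim, and rolls that transaction
-- back (error 1205); the surviving session then completes.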

Understanding Locking in SQL Server:

Ref: 
http://msdn.microsoft.com/en-us/library/aa213039%28v=sql.80%29.aspx

Microsoft® SQL Server™ 2000 has multigranular locking that allows different types of resources to be locked by a transaction. To minimize the cost of locking, SQL Server locks resources automatically at a level appropriate to the task. Locking at a smaller granularity, such as rows, increases concurrency but has a higher overhead because more locks must be held if many rows are locked. Locking at a larger granularity, such as tables, is expensive in terms of concurrency because locking an entire table restricts access to any part of the table by other transactions, but it has a lower overhead because fewer locks are being maintained.
SQL Server can lock these resources (listed in order of increasing granularity).
RID: Row identifier. Used to lock a single row within a table.
Key: Row lock within an index. Used to protect key ranges in serializable transactions.
Page: 8-kilobyte (KB) data page or index page.
Extent: Contiguous group of eight data pages or index pages.
Table: Entire table, including all data and indexes.
DB: Database.

SQL Server locks resources using different lock modes that determine how the resources can be accessed by concurrent transactions.
SQL Server uses these resource lock modes.
Shared (S): Used for operations that do not change or update data (read-only operations), such as a SELECT statement.
Update (U): Used on resources that can be updated. Prevents a common form of deadlock that occurs when multiple sessions are reading, locking, and potentially updating resources later.
Exclusive (X): Used for data-modification operations, such as INSERT, UPDATE, or DELETE. Ensures that multiple updates cannot be made to the same resource at the same time.
Intent: Used to establish a lock hierarchy. The types of intent locks are: intent shared (IS), intent exclusive (IX), and shared with intent exclusive (SIX).
Schema: Used when an operation dependent on the schema of a table is executing. The types of schema locks are: schema modification (Sch-M) and schema stability (Sch-S).
Bulk Update (BU): Used when bulk-copying data into a table and the TABLOCK hint is specified.

Shared Locks
Shared (S) locks allow concurrent transactions to read (SELECT) a resource. No other transactions can modify the data while shared (S) locks exist on the resource. Shared (S) locks on a resource are released as soon as the data has been read, unless the transaction isolation level is set to repeatable read or higher, or a locking hint is used to retain the shared (S) locks for the duration of the transaction.
Update Locks
Update (U) locks prevent a common form of deadlock. A typical update pattern consists of a transaction reading a record, acquiring a shared (S) lock on the resource (page or row), and then modifying the row, which requires lock conversion to an exclusive (X) lock. If two transactions acquire shared-mode locks on a resource and then attempt to update data concurrently, one transaction attempts the lock conversion to an exclusive (X) lock. The shared-mode-to-exclusive lock conversion must wait because the exclusive lock for one transaction is not compatible with the shared-mode lock of the other transaction; a lock wait occurs. The second transaction attempts to acquire an exclusive (X) lock for its update. Because both transactions are converting to exclusive (X) locks, and they are each waiting for the other transaction to release its shared-mode lock, a deadlock occurs.
To avoid this potential deadlock problem, update (U) locks are used. Only one transaction can obtain an update (U) lock to a resource at a time. If a transaction modifies a resource, the update (U) lock is converted to an exclusive (X) lock. Otherwise, the lock is converted to a shared-mode lock.
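To avoid the conversion deadlock in your own read-then-update code, you can request the update lock explicitly with the UPDLOCK hint (a sketch; dbo.t is a hypothetical table):

begin tran;
-- take a U lock on the row up front, instead of an S lock that must convert later
select val
from dbo.t with ( updlock )
where id = 1;

update dbo.t
set val = val + 1
where id = 1;
commit;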
Exclusive Locks
Exclusive (X) locks prevent access to a resource by concurrent transactions. No other transactions can read or modify data locked with an exclusive (X) lock.
Intent Locks
An intent lock indicates that SQL Server wants to acquire a shared (S) lock or exclusive (X) lock on some of the resources lower down in the hierarchy. For example, a shared intent lock placed at the table level means that a transaction intends on placing shared (S) locks on pages or rows within that table. Setting an intent lock at the table level prevents another transaction from subsequently acquiring an exclusive (X) lock on the table containing that page. Intent locks improve performance because SQL Server examines intent locks only at the table level to determine if a transaction can safely acquire a lock on that table. This removes the requirement to examine every row or page lock on the table to determine if a transaction can lock the entire table.
Intent locks include intent shared (IS), intent exclusive (IX), and shared with intent exclusive (SIX).
Intent shared (IS): Indicates the intention of a transaction to read some (but not all) resources lower in the hierarchy by placing S locks on those individual resources.
Intent exclusive (IX): Indicates the intention of a transaction to modify some (but not all) resources lower in the hierarchy by placing X locks on those individual resources. IX is a superset of IS.
Shared with intent exclusive (SIX): Indicates the intention of the transaction to read all of the resources lower in the hierarchy and modify some (but not all) of them by placing IX locks on those individual resources. Concurrent IS locks at the top-level resource are allowed. For example, an SIX lock on a table places an SIX lock on the table (allowing concurrent IS locks) and IX locks on the pages being modified (and X locks on the modified rows). There can be only one SIX lock per resource at a time, preventing updates by other transactions, although other transactions can read resources lower in the hierarchy by obtaining IS locks at the table level.

Schema Locks
Schema modification (Sch-M) locks are used when a table data definition language (DDL) operation (such as adding a column or dropping a table) is being performed.
Schema stability (Sch-S) locks are used when compiling queries. Schema stability (Sch-S) locks do not block any transactional locks, including exclusive (X) locks. Therefore, other transactions can continue to run while a query is being compiled, including transactions with exclusive (X) locks on a table. However, DDL operations cannot be performed on the table.
Bulk Update Locks
Bulk update (BU) locks are used when bulk copying data into a table and either the TABLOCK hint is specified or the table lock on bulk load table option is set using sp_tableoption. Bulk update (BU) locks allow processes to bulk copy data concurrently into the same table while preventing other processes that are not bulk copying data from accessing the table.
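For example, the TABLOCK hint on a bulk load is what requests the BU lock (a sketch; the table name and file path are hypothetical):

BULK INSERT dbo.staging_payment
FROM 'C:\loads\payments.csv'
WITH ( TABLOCK, FIELDTERMINATOR = ',', ROWTERMINATOR = '\n' );
-- several sessions can run this concurrently against the same table,
-- while non-bulk-load access to the table is blocked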

Wednesday, 16 July 2014

OUTPUT clause in INSERT/UPDATE/DELETE statements

How do I use an INSERT statement's OUTPUT clause to get the identity value?
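The snippets below run against a small demo table; the post never shows its definition, so the following is a guessed schema that matches the columns used (URLID as the identity key, Name as the data column):

-- Hypothetical definition of the demo table used below
CREATE TABLE URLDETSAMP
(
    URLID INT IDENTITY(1,1) PRIMARY KEY,
    Name  VARCHAR(200)
);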



INSERT INTO URLDETSAMP(Name)
OUTPUT INSERTED.URLID
VALUES ('URLURLURLURLURLURLURL')

SELECT * FROM URLDETSAMP
ORDER BY URLID DESC

-- OUTPUT also works on UPDATE; note that OUTPUT ... INTO URLDETSAMP(name)
-- here inserts the updated name back into the same table as a new row
update URLDETSAMP SET Name='test'
output inserted.name INTO URLDETSAMP(name)
WHERE URLID=47

SQL Server 2005 introduces a new TSQL feature that allows you to retrieve data affected by insert/update/delete statements easily. This is achieved by the use of OUTPUT clause which can reference columns from the inserted and deleted tables (that are available from triggers currently) or expressions. The OUTPUT clause can be used to return results to the client or consume it on the server into a temporary table or table variable or permanent table. More details on the restrictions of the OUTPUT clause and usage can be obtained from the SQL Server Books Online documentation.
Let us take a look at a common scenario and how the OUTPUT clause can be used to solve it. Using an identity column as the primary key of a table is a fairly common practice. Additionally, if you have references to the identity column, you need to know the value explicitly before inserting the related row. This scenario gets complicated if multiple identity values are generated as a result of inserting multiple rows: there is no easy way to determine the values that were inserted without, for example, using a trigger to populate a temporary table with the generated identity values from the inserted table. To demonstrate this scenario, we can first look at a trigger-based implementation that works in both SQL Server 2000 and 2005:
use tempdb;
go
create table itest ( i int identity not null primary key, j int not null unique );
go
create trigger insert_itest on itest after insert
as
begin
    -- #new does not exist yet at CREATE TRIGGER time; deferred name resolution
    -- lets this compile, but #new must exist in the session when the trigger fires
    insert into #new ( i, j )
    select i, j
      from inserted;
end
go
create table #new ( i int not null, j int not null );
insert into itest ( j )
select o.id from sysobjects as o;
-- Newly inserted rows and identity values:
select * from #new;

-- #new can be used now to insert into a related table:
drop table #new, itest;
go

This code can be re-written in SQL Server 2005 using the OUTPUT clause like below:

create table itest ( i int identity not null primary key, j int not null unique )
create table #new ( i int not null, j int not null)
insert into itest (j)
output inserted.i, inserted.j into #new
select o.object_id from sys.objects as o
select * from #new
drop table #new, itest;
go
Now from this example, you can see the integration of OUTPUT clause with existing DML syntax.
Another common scenario is auditing of data in a table using triggers. In this case, the trigger uses information from the inserted and deleted tables to add rows into the audit tables. The example below shows code that uses the OUTPUT clause in UPDATE and DELETE statements to insert rows into an audit table.
create table t ( i int not null );
create table t_audit ( old_i int not null, new_i int null );
insert into t (i) values( 1 );
insert into t (i) values( 2 );
update t
   set i  = i + 1
output deleted.i, inserted.i into t_audit
 where i = 1;
delete from t
output deleted.i, NULL into t_audit
 where i = 2;
select * from t;
select * from t_audit;
drop table t, t_audit;
go
Hopefully you now have an understanding of the OUTPUT clause in SQL Server 2005. It is a powerful feature that enables you to eliminate the use of triggers in some cases, or to efficiently send results to the client as part of a data-modification operation.

Thursday, 3 July 2014

Components of a Data Flow?

A data flow includes three kinds of components:
  1. Data source(s)
  2. Transformations
  3. Destination(s)
Component 1 -  Data Flow Sources
OLE DB Source: Connects to an OLE DB data source such as SQL Server, Access, Oracle, or DB2.
Excel Source: Receives data from Excel spreadsheets.
Flat File Source: Connects to a delimited or fixed-width file.
Raw File Source: Does not use a connection manager. Reads a specialized binary file format produced for data that is in transit.
XML Source: Does not use a connection manager. Retrieves data from an XML document.
ADO.NET Source: Just like the OLE DB Source, but for ADO.NET-based sources.
CDC Source: Reads data out of a table that has change data capture (CDC) enabled; used to retrieve only rows that have changed over a period of time.
ODBC Source: Reads data out of a table by using an ODBC provider instead of OLE DB.

Component 2 - Data Flow Transformations

Row Transformations: Character Map, Copy Column, Data Conversion, Derived Column, OLE DB Command
Rowset Transformations: Aggregate, Sort, Pivot/Unpivot, Percentage Sampling/Row Sampling
Split and Join Transformations: Conditional Split, Lookup, Merge, Merge Join, Multicast, Union All
Business Intelligence Transformations: Data Mining Query, Fuzzy Lookup, Fuzzy Grouping, Term Extraction, Term Lookup
Script Transformations: Script
Other Transformations: Audit, Cache Transform, Export Column, Import Column, Row Count
Component 3 - Data Flow Destinations

ADO.NET Destination: Exposes data to other external processes, such as a .NET application.
DataReader Destination: Allows the ADO.NET DataReader interface to consume data, similar to the ADO.NET Destination.
OLE DB Destination: Outputs data to an OLE DB data connection such as SQL Server, Oracle, or Access.
Excel Destination: Outputs data from the Data Flow to an Excel spreadsheet.
Flat File Destination: Writes data to a comma-delimited or fixed-width file.
Raw File Destination: Outputs data in a binary format that can be used later as a Raw File Source. It's usually used as an intermediate persistence mechanism.
ODBC Destination: Outputs data to an ODBC data connection.
Recordset Destination: Writes the records to an ADO recordset. Once written to an object variable, it can be looped over in a variety of ways in SSIS, such as with a Script Task or a Foreach Loop Container.
SQL Server Destination: Writes data to SQL Server. This destination has many limitations, such as only being able to write to the SQL Server instance where the SSIS package is executing. For example, if you're running a package to copy data from Server 1 to Server 2, the package must run on Server 2. This destination is largely for backwards compatibility and should not be used.