Delete duplicate rows with no primary key on a SQL Server table
Every once in awhile a table gets created without a primary key and
duplicate records get entered. The problem gets even worse when you
have two identical rows in the table and there is no way to distinguish
between the two rows. So how do you delete the duplicate record?
SolutionOne option that SQL Server gives you is the
ability to set ROWCOUNT which limits the numbers of records affected by
a command. The default value is 0 which means all records, but this
value can be set prior to running a command. So let's create a table
and add 4 records with one duplicate record.Create a table called duplicateTest and add 4 records.
CREATE TABLE dbo.duplicateTest
( [ID] [int] , [FirstName] [varchar](25), [LastName] [varchar](25)
) ON [PRIMARY] INSERT INTO dbo.duplicateTest VALUES(1, 'Bob','Smith') INSERT INTO dbo.duplicateTest VALUES(2, 'Dave','Jones') INSERT INTO dbo.duplicateTest VALUES(3, 'Karen','White') INSERT INTO dbo.duplicateTest VALUES(1, 'Bob','Smith')
If we select all data we get the following:SELECT * FROM dbo.duplicateTest
ID | FirstName | LastName |
1 | Bob | Smith |
2 | Dave | Jones |
3 | Karen | White |
1 | Bob | Smith |
SELECT * FROM dbo.duplicateTest WHERE ID = 1 AND FirstName = 'Bob' AND LastName = 'Smith'
We still get 2 rows of data:
ID | FirstName | LastName |
1 | Bob | Smith |
1 | Bob | Smith |
SELECT * FROM dbo.duplicateTest SET ROWCOUNT 1 DELETE FROM dbo.duplicateTest WHERE ID = 1 SET ROWCOUNT 0 SELECT * FROM dbo.duplicateTest
With SQL Server 2005 we can also use the TOP command when we issue the delete, such as the following. Note: the select commands are just used to show the data prior and after the delete occurs.
SELECT * FROM dbo.duplicateTest
DELETE TOP(1) FROM dbo.duplicateTest WHERE ID = 1 SELECT * FROM dbo.duplicateTest
So as you can see with SQL Server 2005 there are two options to allow you to delete duplicate identical rows of data in your tables.
Here is one note from Microsoft about using SET ROWCOUNT:
Using SET ROWCOUNT will not affect DELETE, INSERT, and UPDATE statements in the next release of SQL Server. Avoid using SET ROWCOUNT together with DELETE, INSERT, and UPDATE statements in new development work, and plan to modify applications that currently use it. Also, for DELETE, INSERT, and UPDATE statements that currently use SET ROWCOUNT, we recommend that you rewrite them to use the TOP syntax. |
- Take a look how the ROWCOUNT command can be used to affect the results of your query
- Also take a look at the TOP command and changes that have been implemented with SQL Server 2005
- Start using TOP instead of ROWCOUNT for SQL Server 2005 and later
Release | History |
May 24, 2007 | Here is an
updated way to delete a variable set of duplicate rows using the TOP
command. The data below represents a sample data set where the same
value appears multiple times, but there is no primary key to only delete
all but one record. EmpID Aftab Aftab Aftab Aftab Aftab BaratBarat Barat Barat JohnneyJohnney Johnney Johnney Johnney Johnney If you use the following command this will get a count of how many rows there are and delete this minus one record. DELETE TOP (SELECT COUNT(*) -1 FROM dbo.Emptest WHERE EmpID = ' Barat') FROM dbo.Emptest WHERE EmpID = ' Barat'
|
No comments:
Post a Comment