Archive

Archive for the ‘SQL Server’ Category

Remove duplicates from table using SQL Rank() function

August 27th, 2009 No comments

Having to de-duplicate rows from source tables is a very common task in the data integration world, we were trying to do all different of Tsql to filter out the duplicate values. We needed to pick the row based upon a certain criteria. On the internet i found the TSQL function Rank()

It works like this:

No matter where the source data is stored, as a part of the ETL workflow, we will stage source data in a SQL Server 2005 table.

This is the staging table:

create table SourceWithDuplicates
(SurrogateKey int identity(1,1),
Code varchar(12),
Description varchar(20),
OtherAttribute1 int,
OtherAttribute2 Varchar(10),
TieBreakerColumn Varchar(12))

As a general rule I always add a surrogate key column to my staging tables (that makes some DBAs happy); an in this case it would actually help in the de-duplication process.

Some sample data:

Insert into SourceWithDuplicates values(‘Abc’, ‘Row Description 1′, 1, ‘Other1′,’1′)
Insert into SourceWithDuplicates values(‘Abc’, ‘Row Description 1′, 1, ‘Other1′, ’1′)
Insert into SourceWithDuplicates values(‘Abc’, ‘Row Description 1′, 1, ‘Other2′,’2′)\
Insert into SourceWithDuplicates values(‘Def’, ‘Row Description 2′, 1, ‘Other4′,’a')
Insert into SourceWithDuplicates values(‘Def’, ‘Row Description 2′, 1, ‘Other5′,’a')
Insert into SourceWithDuplicates values(‘Ghi’, ‘Row Description 3′, 1, ‘Other5′,’a')
Insert into SourceWithDuplicates values(‘Jkl’, ‘Row Description 4′, 1, ‘Other5′,’a')

The requirements:

Code and description values need to be unique in the target table; in case they are duplicates, choose the row with the highest value in the tie breaker column. If the value in the tiebreaker column is also duplicated; then choose any of the rows.

So the data looks like:

SurrogateKey Code Description OtherAttribute1 OtherAttribute2 TieBreaker ———— ———— ——————– ————— ————— ———-
1 Abc Row Description 1 1 Other1 1
2 Abc Row Description 1 1 Other1 1
3 Abc Row Description 1 1 Other2 2
4 Def Row Description 2 1 Other4 a
5 Def Row Description 2 1 Other5 a
6 Ghi Row Description 3 1 Other5 a
7 Jkl Row Description 4 1 Other5 a

Following the requirements, only 4 rows kept should be chosen; those with surrogate keys:2 3, 4 or 5, 6 and 7
This is the query using the Rank() function that resolve our issue:

Select *,
rank() over (Partition by Code, description order by TieBreakerColumn desc, SurrogateKey) MyRank
from SourceWithDuplicates

and this is its output:

Surrogatekey Code Description MyRank
———— ———— ——————– ——————–
3 Abc Row Description 1 1
1 Abc Row Description 1 2
2 Abc Row Description 1 3
4 Def Row Description 2 1
5 Def Row Description 2 2
6 Ghi Row Description 3 1
7 Jkl Row Description 4 1

Notice that if we filter the result from the previous query using WHERE MyRank=1 we will get the rows to be kept.

Very nice ! I  found this at:

http://rafael-salas.blogspot.com/2007/04/remove-duplicates-using-t-sql-rank.html

Categories: SQL Server Tags: ,

Schedule SQL Query output to file, the simple way

August 19th, 2009 2 comments

Today two colleagues wanted to schedule the output of a SQL command to a text file so they could check some values on a daily base. After some clicking in SQL Server i found a very simple solution,

I created a SQL server Job Step which executed the SQL command

crop

Then went to the Advanced tab and selected an output file to where the data can be written (even appended):

step

Very simple way to log query output to a file at a regular interval.

Categories: SQL Server Tags:

SQL Server 2008 R2 CTP released

August 11th, 2009 No comments

Today we are excited to announce the availability of the first community technology preview (CTP) of Microsoft SQL Server 2008 R2 for MSDN and TechNet subscribers, with the CTP generally available on Wednesday, August 12th.

Building on the momentum of SQL Server 2008, “R2” improves IT efficiency by reducing the time and cost of developing and managing applications; empowers end users to make better decisions through Self-Service Business Intelligence and enables organizations to scale with confidence by providing high levels of reliability, security and scalability for business critical applications.

This public preview offers the opportunity to experience early, pre-release feature capabilities including:

  • Application and Multi-server Management
  • SMP scale up with support for up to 256 logical processors
  • Report Builder 3.0 with support for geospatial visualization

This CTP provides the first opportunity to explore some of the features of SQL Server 2008 R2 and see how it all comes together to enhance performance and scalability, enable self-service BI and improve IT and developer efficiency. The CTP process is also a great way for you to give us feedback on the new technologies.

Read more at: http://blogs.technet.com/dataplatforminsider/archive/2009/08/10/download-sql-server-2008-r2-august-ctp-today.aspx

Categories: SQL Server Tags:

SQL Find Last Day of Any Month

July 10th, 2009 No comments

I needed to determine tha last day of previous month in a view, i found this on internet and works great:

Following script demonstrates the script to find last day of previous, current and next month.
----Last Day of Previous Month
SELECT DATEADD(s,-1,DATEADD(mm, DATEDIFF(m,0,GETDATE()),0))
LastDay_PreviousMonth
----Last Day of Current Month
SELECT DATEADD(s,-1,DATEADD(mm, DATEDIFF(m,0,GETDATE())+1,0))
LastDay_CurrentMonth
----Last Day of Next Month
SELECT DATEADD(s,-1,DATEADD(mm, DATEDIFF(m,0,GETDATE())+2,0))
LastDay_NextMonth

found this at http://blog.sqlauthority.com/2007/08/18/sql-server-find-last-day-of-any-month-current-previous-next/

Categories: SQL Server Tags:

SQL Server 2008: Best Practice Design Alerts warnings

June 2nd, 2009 No comments

New in Sql Server 2008 are the integrated Best Practice Design shown as warnings, a lot of times you want to ignore these warnings because you know it isn’t applicable in your situation.

Before you deploy your project, you should clean up – to the extent you wish – the current warnings in your project. Warnings won’t stop your deployment, but you should make a conscious decision whether to ignore the surfaced warnings. A comprehensive list of all warnings in your project can be found in the Error List window (which you can open with Ctrl+E). Double-click on an error to access the designer and fix the problem. Alternatively, you can right-click the error and click dismiss to clear it off the list if you don’t intend to fix it. You can even add a comment to document your reason for ignoring this error. This method of clearing the error is instance-based and will not clear the same error if it’s found in a different dimension or cube. To globally dismiss a particular type of error, whether proactively before you start development or after the fact, you can access the new Warnings tab in the Database editor (which you can open on the Database menu by clicking Edit Database). Incidentally, the Warnings tab also contains a list of the warnings dismissed individually and the related comment.

Found at at: http://www.sqlskills.com/BLOGS/STACIA/post/SQL-Server-2008-Best-Practice-Design-Alerts.aspx

Categories: BI Technical, SQL Server Tags: ,

How to Clear SQL Server Cache When Performing Benchmark Tests

May 28th, 2009 No comments

I was testing the perfomance of a query from different perspectives and wanted to make sure SQL Server didn’t cache Query results, to achieve this:

run DBCC DROPCLEANBUFFERS, which clears all data from the cache. Then run DBCC FREEPROCCACHE, which clears the stored procedure cache.
 

found at:

http://www.devx.com/tips/Tip/14401

Categories: SQL Server Tags:

Microsoft announced CEP, real time BI on the horizon ?

May 14th, 2009 No comments

Microsoft announced earlier this week that a CEP/stream processing product will be included in SQL 2008 R2Complex Event Processing, or CEP, is primarily an event processing concept that deals with the task of processing multiple events with the goal of identifying the meaningful events within the event cloud. CEP employs techniques such as detection of complex patterns of many events, event correlation and abstraction, event hierarchies, and relationships between events such as causality, membership, and timing, and event-driven processes.

Microsoft called out four reasons to me why CEP might be needed in addition to ordinary database processing. Two are the standard reasons for data reduction:

1. Without CEP, you can’t bang the data into the database fast enough.
2. You don’t want to keep most of the data past a short time window anyway.
The other two are also fairly standard reasons for using CEP:
3. Standard SQL isn’t all that great for time series anyway.
4. CEP use cases often call for incremental processing and/or parameterization of queries, something CEP engines are commonly better designed for than are DBMS.

However, Microsoft seems to be taking a somewhat different approach to time-based SQL extensions than some other vendors. To quote email Microsoft sent today:

Microsoft Research (MSR) introduced the temporal extensions to relational algebra based upon a notion of application time that is independent of system time. It matters when the event originated instead of when they arrived at the processing system. Further it treats each event as being associated with an interval of time as opposed to a point in time. This helps in modeling certain real life phenomenon naturally. [StreamBase et al.] also reason about multiple streams. Both the approaches are extensions to relational algebra. The MSR approach took the algebra as the starting point while StreamBase took an existing language over the algebra – SQL as the starting point. The MSR approach consequently avoids having to rework other elements of the SQL surface. The primary language extensions through which this algebra will be exposed initially is LINQ.

What are the implications of this? Can we use the CEP algorithm to monitor real time data from the cloud and extract only the necessary data to our datawarehouse ? or am i going to far with this ?

Found at: http://www.dbms2.com/2009/05/13/microsoft-announced-cep-this-week-too/

Categories: Cloud, SQL Server Tags: , , ,

Handling and Logging Sproc errors with SSIS

May 12th, 2009 No comments

I frequently use SSIS to schedule some maintence tasks using stored procedures. I want to use transactions to control my data and want to use SSIS to control my work flow. To accomplish this I use try..catch and Raiserror in the stored procedure and SSIS logging to keep track of the errors.

For example I use this sproc :

CREATE PROCEDURE MoveData
AS
INSERT INTO History.ReportTable
SELECT * FROM Reporting.ReportTable
WHERE reportdate < cast(dateadd(DAY,-2,GETDATE())as date);
 
 DELETE FROM Reporting.ReportTable
 WHERE reportdate < cast(dateadd(DAY,-2,GETDATE())as date);

But I want to control the data so all or nothing is commited to the data, since SQL Server 2005 you can use the try..catch function and use transactions with that, so adding that maken:

CREATE PROCEDURE MoveData
AS
BEGIN TRANSACTION;
BEGIN TRY

INSERT INTO History.ReportTable
SELECT * FROM Reporting.ReportTable
WHERE reportdate < cast(dateadd(DAY,-2,GETDATE())as date);
 
 DELETE FROM Reporting.ReportTable
 WHERE reportdate < cast(dateadd(DAY,-2,GETDATE())as date);

END TRY
BEGIN CATCH

 IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION;
END CATCH;
IF @@TRANCOUNT > 0
    COMMIT TRANSACTION;

This stored procedure makes sure that when the insert of delete fails for any reason it will be rollbacked. But my sproc will just run and not return a error, my SSIS execute SQL task won’t know anything went wrong. That’s were RAISERROR comes in. Adding that to the sproc makes:

CREATE PROCEDURE MoveData
AS
BEGIN TRANSACTION;
BEGIN TRY

INSERT INTO History.ReportTable
SELECT * FROM Reporting.ReportTable
WHERE reportdate < cast(dateadd(DAY,-2,GETDATE())as date);
 
 DELETE FROM Reporting.ReportTable
 WHERE reportdate < cast(dateadd(DAY,-2,GETDATE())as date);

END TRY
BEGIN CATCH
  DECLARE @ErrorMessage NVARCHAR(4000);
    DECLARE @ErrorSeverity INT;
    DECLARE @ErrorState INT;

– Get the current error
SELECT @ErrorMessage = ERROR_MESSAGE(),
           @ErrorSeverity = ERROR_SEVERITY(),
           @ErrorState = ERROR_STATE();

    — Use RAISERROR inside the CATCH block to return
    — error information about the original error that
    — caused execution to jump to the CATCH block.
    RAISERROR (@ErrorMessage, — Message text.
               @ErrorSeverity, — Severity.
               @ErrorState — State.
               );
 IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION;
END CATCH;
IF @@TRANCOUNT > 0
    COMMIT TRANSACTION;

You now can create your own error handling, whether in a workflow or use the SSIS logging to monitor and log your errors, like I do.

Categories: SQL Server, SSIS Tags: