Showing posts with label SSIS Package. Show all posts

Saturday, 3 November 2012

How to Remove Duplicate Rows but Log the Duplicates in SSIS

CREATE TABLE D
(A int, B int)

INSERT INTO D VALUES (1,23), (1,23), (16,59), (12,12), (13,45), (12,12), (45,56)
SELECT * FROM D

A    B
1    23
1    23
16    59
12    12
13    45
12    12
45    56
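
The goal of the package (keep one copy of each row and send the repeats to a log) can be illustrated outside SSIS. This is only a minimal Python sketch of the logic, not the SSIS package itself; the function name split_duplicates is hypothetical, and the rows are those of table D above.

```python
def split_duplicates(rows):
    """Return (unique_rows, duplicate_rows), keeping the first occurrence of each row."""
    seen = set()
    unique, duplicates = [], []
    for row in rows:
        if row in seen:
            duplicates.append(row)  # repeat of an earlier row: goes to the duplicate log
        else:
            seen.add(row)
            unique.append(row)      # first occurrence: goes to the clean output
    return unique, duplicates


# Rows of table D, in the order shown above.
rows = [(1, 23), (1, 23), (16, 59), (12, 12), (13, 45), (12, 12), (45, 56)]
unique, duplicates = split_duplicates(rows)
print(unique)      # [(1, 23), (16, 59), (12, 12), (13, 45), (45, 56)]
print(duplicates)  # [(1, 23), (12, 12)]
```

Five rows reach the clean output and two rows reach the log, so no data is silently discarded.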






Wednesday, 31 October 2012

Lookup, Incremental Load (Latest Records), Remove Duplicates











How to Execute a Downloaded Package




Write data from multiple tables to single flat file.

Take 3 Data Flow Tasks in the Control Flow and one Flat File Destination.
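
As a rough illustration of the idea (several sources, one output file), here is a minimal Python sketch. It is not the SSIS package: the three in-memory lists are hypothetical stand-ins for the source tables, and an in-memory buffer stands in for the flat file.

```python
import csv
import io


def write_sources_to_one_file(sources, out):
    """Append the rows of every source, in order, to one delimited output."""
    writer = csv.writer(out, lineterminator="\n")
    for rows in sources:
        writer.writerows(rows)


# Three hypothetical "tables" standing in for the three source tables.
table_a = [(1, "a")]
table_b = [(2, "b"), (3, "c")]
table_c = [(4, "d")]

buffer = io.StringIO()  # stands in for the single flat file
write_sources_to_one_file([table_a, table_b, table_c], buffer)
result = buffer.getvalue()
print(result)
```

The point of the sketch is that each source appends to the same output rather than replacing it; in the package, the later Data Flow Tasks would presumably need to append to the file in the same way instead of overwriting it.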











Tuesday, 30 October 2012

Handling extra/missing columns in Delimited Flat file

Delimited flat files (CSV) are one of the important non-relational sources for SSIS ETL. If the file were always well formed, we could simply use the built-in Flat File Source in SSIS. Unfortunately we are not always that lucky, and we receive data in which some rows have missing (or, put another way, extra) columns, as follows:

Flat file input


Desired flat file output


  1. A Flat File Source that reads all the data into one column, which a Derived Column then splits based on the position of each comma.
Here I demonstrate how to handle such a file with the help of a Derived Column. We have to write expressions (sometimes complex ones) to get the data into proper column format. In the example above I read all the data as strings, and after splitting, the values are still saved as strings (if you want to change the data type, you can use a Data Conversion transformation). If the number of columns is large, this approach is not recommended; in that case you should go with the first 2 approaches.

Step1: Inside the Data Flow Task (DFT), add a Flat File Source and a Derived Column.


Step2: Configure the Flat File Source in such a way that all the input data arrives in a single column.

Note:  We are keeping the column name as "C" in the example.

Step3: Double click the Derived Column and use the following logic.



  • To separate the column "C1" from the column "C", we need to fetch the data from position 1 up to the first occurrence of the delimiter ",". To find the first delimiter we use the FINDSTRING SSIS function (it returns the location of the specified occurrence of a string within a character expression).
  • SUBSTRING(C,1,FINDSTRING(C,",",1) - 1)
  • Here we subtract 1 because we have to exclude the delimiter from the column value.
  • To separate the column "C2" from the column "C", we fetch the data between the first delimiter and the 2nd delimiter, but before that we have to check whether a 2nd delimiter is present. If it is not, we take everything after the first delimiter:

  • FINDSTRING(C,",",2) != 0 ? SUBSTRING(C,FINDSTRING(C,",",1) + 1,FINDSTRING(C,",",2) - FINDSTRING(C,",",1) - 1) : SUBSTRING(C,FINDSTRING(C,",",1) + 1,LEN(C) - FINDSTRING(C,",",1))
  • To separate the column "C3" from the column "C":
  • FINDSTRING(C,",",2) == 0 ? "NULL" : (FINDSTRING(C,",",3) != 0 ? SUBSTRING(C,FINDSTRING(C,",",2) + 1,FINDSTRING(C,",",3) - FINDSTRING(C,",",2) - 1) : "NULL")

  • To separate the last column "C4" from the column "C", we fetch the data from after the third delimiter to the end of the row. To calculate the last position of the row, we use the LEN SSIS function.
  • FINDSTRING(C,",",3) == 0 ? "NULL" : SUBSTRING(C,FINDSTRING(C,",",3) + 1,LEN(C) - FINDSTRING(C,",",3))
Step4: The result is as expected.
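
The splitting idea from Step3 can be tried out in Python. The sketch below is a generalized illustration, not a transcription of the SSIS expressions: findstring mimics the 1-based behavior of FINDSTRING, split_row is a hypothetical helper, and the sample lines are made up because the original input file is not reproduced here. It fills a column whenever a value is present and pads the remaining columns with the literal "NULL".

```python
def findstring(text, search, occurrence):
    """Mimic SSIS FINDSTRING: 1-based position of the nth occurrence, 0 if absent."""
    position = 0
    for _ in range(occurrence):
        index = text.find(search, position)
        if index == -1:
            return 0
        position = index + 1  # 1-based position; also where the next search starts
    return position


def split_row(c, columns=4, delimiter=",", missing="NULL"):
    """Split one raw line into a fixed number of columns, padding short rows."""
    values = []
    start = 1  # 1-based start of the current field; 0 means the row is exhausted
    for n in range(1, columns + 1):
        if start == 0:
            values.append(missing)  # no data left for this column
            continue
        # The last column always takes the rest of the row.
        end = findstring(c, delimiter, n) if n < columns else 0
        if end != 0:
            values.append(c[start - 1:end - 1])  # exclude the delimiter itself
            start = end + 1
        else:
            values.append(c[start - 1:])
            start = 0
    return values


# Hypothetical sample lines with 4, 3 and 2 columns.
for line in ["1,A,B,C", "2,A,B", "3,A"]:
    print(split_row(line))
```

Rows that are short simply receive "NULL" in the trailing columns, which is the same placeholder the expressions above use.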





Thursday, 18 October 2012

SSIS Package

Introduction


EmpID EmpName DOB DOJ Salary
1 User1 1/1/1976 1/4/2000 20000
2 User2 1/2/1976 1/5/2000 20000
3 User3 1/3/1976 1/6/2000 20000
4 User4 1/4/1976 1/7/2000 30000
5 User5 1/5/1976 1/8/2000 20000
6 User6 1/6/1976 1/9/2000 40000
7 User7 1/7/1976 1/10/2000 20000
8 User8 1/8/1976 1/11/2000 35000
9 User9 1/9/1976 1/12/2000 20000
10 User10 1/10/1976 1/6/2000 20000

Steps to Create SSIS Package

1. Open Business Intelligence Development Studio.
2. Click on File -> New -> Project.
3. Select Integration Services Project in the New Project window, give the project an appropriate name and location, and click OK.
4. The new project screen contains the following:
  1. Tool Box on left side bar
  2. Solution Explorer on upper right bar
  3. Property Window on lower right bar
  4. Control flow, data flow, event Handlers, Package Explorer in tab windows
  5. Connection Manager Window in the bottom
5. Right click in the Connection Manager window and click on the New Flat File Connection menu item.
6. The connection manager editor opens up. It contains 4 tabs: General, Columns, Advanced and Preview.
  1. In the General tab, enter the connection manager name and a description (optional). Select the source file, file format and delimiter. If the first row of the source file contains headers, select the checkbox “Column names in the first data row”.
  2. Select the Columns tab and check whether all columns are properly mapped.
  3. Select the Advanced tab. Here you can add, remove or modify columns as per the output stream requirement.
  4. Select the Preview tab to check what your output will look like.
7. Click on OK. It will create a flat file connection manager for your source file.
8. Now drag a Data Flow Task from the Toolbox into the Control Flow container.
9. Double click on the Data Flow Task. It will show the Data Flow tab for the selected Data Flow Task. You can see three item categories in the Toolbox.
  1. Data flow sources - Source makes data from different external data sources available to the other components in the data flow.
  2. Data flow transformations - Transformations can perform tasks such as updating, summarizing, cleaning, merging, and distributing data.
  3. Data flow destinations - Destination writes the data from a data flow to a specific data store, or creates an in-memory dataset.
10. Drag a Flat file Source Component from Data Flow Sources into Data Flow Container window.
11. Double click on the Flat File Source component; it will display the Flat File Source Editor. The window contains three tabs:
  1. Connection Manager - Here we specify the source connection manager that we created for the source file. If the source file contains null values, select the “Retain null values from the source as null values in the data flow” checkbox.
  2. Columns - This tab allows the user to select the required output columns and to change the output column names.
  3. Error Output - Using this tab, the user can decide the behavior of the component in case of failure. There are three options:
    1. Ignore Failure: Selecting this will ignore any failure while reading rows from the source, and the package will continue executing even if an error occurs.
    2. Redirect Row: Selecting this will redirect the failed rows to another component connected to the error output of the source.
    3. Fail Component: Selecting this will stop the execution of the package in case of failure.
12. Drag and drop a Conditional Split component from the Data Flow Transformations tab into the Data Flow Task container window. Drag and connect the success output (shown by the green arrow) of the Flat File Source component to the Conditional Split component.
13. Double click on the Conditional Split component; it will open the Conditional Split editor window. Here the user can specify the condition(s) as per the requirement and click OK. For example:
  • HigherSalary: [Salary] > 20000 (Redirect records if salary is greater than 20000)
  • LowerSalary: For rest of the records
14. Drag and drop an Excel Destination component from the Data Flow Destinations tab into the Data Flow Task container. Connect the success arrow of the Conditional Split component to the Excel Destination; the Input Output Selection window will pop up.
15. In the Input Output Selection popup window, select the appropriate conditional output, for example the “HigherSalary” conditional output, and click OK.
16. Double click on the Excel Destination component, which will open the Excel Destination Editor window. Click on the New button next to the OLE DB connection manager.
17. Select the destination file location and the appropriate Excel sheet name where you want to insert the success output data with salary values higher than 20000.
18. Click on the Mappings tab and map the appropriate input columns to the output columns.
19. On clicking OK, an error icon is shown on the Excel Destination component, with the message “Cannot convert between Unicode and non-Unicode string data types”.
20. To resolve this issue, we need to insert a Data Conversion Transformation Component between Conditional Split and Excel Destination Component.
21. Double Click on the Data Conversion Component, it will open Data Conversion Transformation editor. Using this component, convert input data types to required output data types.
22. Click OK and connect the success arrow of the Data Conversion component to the Excel Destination component. Double click on the Excel Destination component, click on the Mappings tab, map the output of the Data Conversion component to the input of the Excel Destination component, and click OK.
23. Rename the Excel Destination component as “Records with Salary > 20000”.
24. Now add one more Data Conversion Transformation component and connect the second success output of the Conditional Split to it. Do the necessary data type conversions. Add one more Excel Destination component and rename it as “Remaining Records”. Create a new connection manager and configure it to point to the second output file. Connect the output of the newly added Data Conversion component to it and do the mapping as required.
25. Now the package is ready to be executed. Go to the Solution Explorer, right click on the package and select “Execute Package”. If all components turn green, the package has run successfully; if there is an error, the component that failed to execute is shown in red. We can see the package execution steps in the “Progress” tab.
26. Once you run the package, data will be saved in the destination output files as per the condition specified in the Conditional Split Component.
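
To check what each output file should contain, the Conditional Split condition from step 13 can be applied to the sample data by hand. The following is a minimal Python sketch of that routing, not part of the package; read_rows and conditional_split are hypothetical helper names, and the sample text is the employee data listed in the Introduction.

```python
SAMPLE = """\
EmpID EmpName DOB DOJ Salary
1 User1 1/1/1976 1/4/2000 20000
2 User2 1/2/1976 1/5/2000 20000
3 User3 1/3/1976 1/6/2000 20000
4 User4 1/4/1976 1/7/2000 30000
5 User5 1/5/1976 1/8/2000 20000
6 User6 1/6/1976 1/9/2000 40000
7 User7 1/7/1976 1/10/2000 20000
8 User8 1/8/1976 1/11/2000 35000
9 User9 1/9/1976 1/12/2000 20000
10 User10 1/10/1976 1/6/2000 20000
"""


def read_rows(text):
    """Parse the whitespace-delimited sample into a list of dicts keyed by header."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


def conditional_split(rows, threshold=20000):
    """Route rows with Salary > threshold to one list and all other rows to another."""
    higher, lower = [], []
    for row in rows:
        if int(row["Salary"]) > threshold:
            higher.append(row)  # corresponds to the HigherSalary output
        else:
            lower.append(row)   # corresponds to the remaining records
    return higher, lower


higher, lower = conditional_split(read_rows(SAMPLE))
higher_ids = [row["EmpID"] for row in higher]
print(higher_ids)   # ['4', '6', '8']
print(len(lower))   # 7
```

With the sample data, three records (EmpID 4, 6 and 8) should land in “Records with Salary > 20000” and the other seven in “Remaining Records”.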