Wednesday, 17 October 2012

An Introduction to SSIS

An Introduction to SSIS 

SQL Server Integration Services (SSIS) is a *really* powerful data transformation and import tool; it allows for all kinds of data manipulation, both between databases and within them.
The problem is that it’s not entirely intuitive; the learning curve is steep. But you really shouldn’t let that stop you from trying it out: once you’ve got the basics, it’s really quite accessible.
In this series of posts, we’ll do some basic, step-by-step data manipulation with SSIS, starting with importing data from a CSV file into a SQL Server 2008 database. We’ll move on to copying data between Oracle and SQL Server.

Importing data from a CSV file into SQL Server

SQL Server allows for a few ways to import data from a text file, most directly with the import wizard, which is a pretty accessible way to import uncomplicated data. This, by the way, uses SSIS in doing the import, though it wraps it all in a wizard.
SSIS, though, offers the most flexibility, both in terms of data source and destination, as well as what you’d like to do with the data as you’re copying it.

How do I get SSIS?

SSIS is included with SQL Server 2005 and 2008. It’s installed by default, and you can specify its installation specifically by selecting the “Integration Services” components of SQL Server when you’re doing the installation.
Note, by the way, that SSIS does not require a SQL Server instance in order to run.


How do I use SSIS?

Ah, here is where the rubber hits the road. The short answer is: you develop packages for SSIS to use with the SQL Server Business Intelligence Development Studio (whew!). We’ll call it BIDS from now on. Packages that you create are run either through the development studio or on a server with Integration Services installed; we’ll begin with a simple and uncomplicated import from a standard, unformly-formatted CSV file:

Importing Uniform Data using SSIS

Let’s begin with a CSV file that looks like this:
Last Name,First Name,Phone,Age
Lastname1,Firstname1,555-555-1111,25
Lastname2,Firstname2,555-555-1112,30
Lastname3,Firstname3,555-555-1113,55
Lastname4,Firstname4,555-555-1114,84
Lastname5,Firstname5,555-555-1115,22
Lastname6,Firstname6,555-555-1116,44
Lastname7,Firstname7,555-555-1117,66
Lastname8,Firstname8,555-555-1118,31
Lastname9,Firstname9,555-555-1119,30
Lastname10,Firstname10,555-555-1120,21
Let’s save this to a file named test.csv.

Launch BIDS

Click on the start button and browse to All Programs –> Microsoft SQL Server 2008 –> SQL Server Business Intelligence Development Studio.
 
image
When it launches, you’ll see a screen that looks something like the image to the right (click for a larger version).
In the top-left pane, you’ll see a few options (below). Click on the Create Project link to start a new project.
image
A new window will open, asking what kind of project you’d like to create. Select “Integration Services Project” and give it a name. You can choose a shorter path to which to save your project, as well. Click on OK to begin designing your project.
image

Create A Source Connection

When you click on OK, you’ll see a window like the one below. What we want to do is create a new connection, which is just another way of saying that we want to set up a data source. In our case, we’re going to use a CSV file, so right-click on the Connection Manager pane at the bottom and select “New Flat File Connection.”
image
That will bring up the following page. I’ve gone ahead and filled in the details.
image
Basically, we’re telling SSIS that we want to set up a data source using this file (test.csv). You can see that it’s (by default) set to parse the file as a “delimited” text file, which is what a CSV file is. Note that the first row in our file contains the names of the columns; as a result, we’ve checked “Column names in the first data row.” Leave that blank if your data doesn’t have column names in the first row.
Likewise, SSIS allows you to skip n rows before looking at the data. This is useful, especially, when using automated processes that insert a lot of commands or comments at the front of the file.
One other piece of this screen is very useful, such that it’s worth digressing just a bit: the “Text qualifier” field allows you to define a character that will set a field as a text string. This usually is a double quote. Lots of times you’ll run into a CSV that looks something like this:

You can take a look at the settings in this window; for our purposes, the defaults should work nicely. You can preview the data by clicking on the preview link on the left-hand side; this should show you the contents of our CSV file, broken down into a table.

Create a Destination Connection

Now that we’ve got the source defined, we need to do something similar for the destination. Right-click in the connection manager window again and select “New OLE DB Connection”. Click on the “New” button to create a new connection.
When you do, you’ll see a window like that below:
image
Fill in the server name and database name appropriately and click on OK. I’ve selected the adventureworks database. You’ll notice that there’s not a place here to name the connection; that’s OK, because it’ll be named SERVERNAME\.DBName.

Create a Data Flow Task

SSIS uses workflows to step through its process, and most of your work will be done in the data flow task. Note that the data flow task will contain other sub-tasks.
While in the Control Flow tab, mouse over the Toolbox in the top-left corner (highlighted below).
image
When you do, the Toolbox will open, as below:
image
Click on the “Data Flow Task” item, and drag it to the main screen. You’ll see an icon like that below. Double-click on that icon to open it.
image
When you double-click on the task, it appears that you have just erased it, but in reality, you’ve changed to the Data Flow tab; look at the top of the screen, and you’ll see that you’re no longer on the control flow tab. You can go back and forth between these as you choose.

Add the Source task

Now that we’re editing our data flow task, click on the toolbox again to drag the Flat File Source to the main screen. By default, it’ll be populated with the CSV connection we set up earlier.
image
One thing to note is the “Retain null values from the source as null values in the data flow.” This is disabled by default, which means that null values in the source will be treated as empty strings, instead.
You can click on the Preview button to double-check your file.

Create the Data Destination

Now that we’re reading the data, we need to insert it into the database. Click on the Toolbox again and scroll down to the Data Flow Destinations section. Drag the OLE DB Destination to the main window.
Before we can write any data, we have to tell SSIS how to map the data. In our example, we’re going to create a new table in the Adventureworks database, but the process for using an existing table is the same.
imageFirst, click on the Flat File source icon and drag the green arrow to the OLE DB Destination icon.
Next, double-click on the Destination icon to edit it: Notice that this also allows you to keep nulls. In our example, we’re going to create a new table, but if you have a table already created, you can select it from the table drop-down menu.
image
Clicking on the New… button will bring the the following screen:
image Change the table name in the SQL statement (I chose “importTest”). Of course, we’d also normally want to change the data types; the age field would be better as an integer field, and we wouldn’t normally want the phone number to be a 50-character field. We’ll leave it with defaults for our demonstration, however. Click on OK.
Finally, we need to set the mappings: click on the Mappings option on the left-hand side of the screen:
image
The defaults should be fine for our purposes, so you can click on OK.

Test it out

We should be good to go. Hit <F5> to run your package. You should see each step turn green as the data is read and written.
image

No comments:

Post a Comment