Tuesday, June 15, 2010

Looping Through CSV Files using SSIS

Looping Through CSV Files
This program will loop through a folder and its sub folders, find all the *.CSV files and retrieve data from each file and inserts the record to a destination table and finally moving the file to a backup folder after each loop cycle.

Criteria
1. The file format must be in a *.CSV format (Flat File)
2. A sample source file of the CSV file (Please renamed to SampleCSVFile.csv)
3. A Folder that will be the main folder of the ETL (in this case it will be named => C:\SSIS\LoopingThroughCSVFiles)
4. Two other sub folders 1- “Sample” 2- “ToBeProcessed”
5. Copy the CSV Sample File in the “Sample Folder”
6. All headers must be on the first row of each CSV file
7. No empty/missed rows are accepted, remove them all

Over all View



Defining some variables in the package
One of the most important part of a SSIS package are the variables. Because some variables will be defined in the Configuration File(s), so that you can use the package dynamically,
Example 1: The server name on your computer is different that the server name on production server so basically you can have the “server name variable” defined in the XML or table configuration so that it can be changes when deploying the package from one server to another.



1. “uVar_BackupActiveFolder” Each time the package runs it makes a folder in the backup folder this folder will be known as the current backup folder which will be used to save the CSV backup files and/or maybe the Error text file.
2. “uVar_BackupMainFolderPath” Some customers prefer to have the backup folder in another server so basically we allow them to have a choice, by adding this variable (See the script section for more details). If this variable is empty it will make a default backup folder name (See backup scripts).
3. “uVar_DestinationDBName” The name of the Destination Database.
4. “uVar_DestinationServerName” The name of the Destination SQL Server.
5. “uVar_DestinationTableName” The name of the Destination Table in SQL Server.
6. “uVar_ErrorOutPutFile” This is the path and file name of the Error file, each time this path and file name changes so that we can keep a backup of each error log file.
7. “uVar_FolderNameOfFilesToBeProcessed
8. “uVar_MainFolderPath” This is the path where you will be having your main ETL; it can contain the Package, configuration files, backup, sample folder, ToBeProcessed folder and etc... (C:\SSIS\LoopingThroughCSVFiles)
9. “uVar_NumberOfExtractedRows” A row counter that won’t be used in this example.
10. “uVar_SMTPEmailServerName” The SMTP name or IP address of it
(e.g. 1: Neptune e.g. 2: mars.com e.g. 3: 10.2.10.56)
11. “uVar_SMTPSendEmailsTo_FromError” A List of email address to email the error file. (Snikkhah@Live.ca ; YourEmail@YourBox.Com )
12. “uVar_SourceConStr_DataSource” A variable that will have the file name and folder (See the loop property).
13. “uVar_SourceFileExtention” This variable will define the file extension (e.g. *.csv).
14. “uVar_SourceFileNameRenamed” Before moving a file to the backup folder we will have to rename and set the right backup folder in this variable.

Defining Connection Manager
We need 4 connections, in this example I will not be setting the error and email section you can check my blog for that
1. CSV Flat File Connection Manager
The setting in this section depend on the format of the source CSV file, you may have a semicolon (;) as a row delimiter or a tab as a column delimiter, every format has its own settings. In my case the settings are...





2. FF_DST----Error Output file for debugging

3. SQLServerDestinationConnection


4. SMTP Connection Manager For Error messages

Adding Object (controls) in the Control Flow

Now we can add the objects as seen in the “Over all View
1. SCR---- Making the "ToBeProcessed Folder"


In this section the customer can redirect the source folder by entering the folder full path name in the “uVar_FolderNameOfFilesToBeProcessed” variable, if not the system will pick up the default name “ToBeProcessed”.

2. SCR---- Making the backup folder string from BackupFolderName Variable (Dynamic Backup folder yyyy-mm-dd-hh-mm-ss)




Same settings again the customer can define the backup folder in the variable “uVar_BackupMainFolderPath”, if not the Package will pick the default folder name “BackUpFolder”. The other thing that happens in this script is that the current backup folder name string is been made, e.g. C:\SSIS\LoopingThroughCSVFiles\BackUp\2010-04\2010-04-10-141000.

3. FSYS----Creating the backup folder from BackupFolderName


This will make the backup folder from the “uVar_BackupActiveFolder” which will be C:\SSIS\LoopingThroughCSVFiles\BackUp\2010-04\2010-04-10-141000.

4. SCR--- Make the correct path and file name for the Error Text File




What I am doing is making a file name string that has, the date and time, ETL Name, etc... This will be needed it an error accurse

5. SQL----Truncate the destination Table


6. FLC - Looping through each CSV file - Finding the Data Source


By using the variable “uVar_FolderNameOfFilesToBeProcessed” the loop will be loop through the folder and sub folder to find *.csv files “name and folder” and each file folder and name will be saved in the “uVar_SourceConStr_DataSource” e.g. C:\SSIS\LoopingThroughCSVFiles\ToBeProcessed\AnyCSVFileName.csv

7. DFT---- MAIN ETL (E and T Only)


8. FF_SRC---- Connecting to CSV file to extract Data

Setting the right connection object for the Flat File data source object


9. OLE_DST----SQL SERVER Destination table

Setting the destination table in SQL server for the OLE destination object.


10. SCR--- Rename file names with the format of "yyyy-mm-dd-hhmmssff"


Making a string that will rename the current file that has been processed and save it in the current backup folder


11. FSYS---- Moving processed files to the "Backup Active Folder"

Moving the processed file to that final backup folder

12. End of Loop

Testing Scenarios

Before testing make sure that you have set the variables according to the snap shot of the variables that I have and check the “Criteria”.

Scenario 1:
1- Copy CSV files to the “ToBeProcessed” folder
2- Run the package
3- Check the destination table
4- Check the backup folder and find the current date time folder and see if the file have been moved from the “ToBeProcessed” folder to the new current backup folder




9 comments:

  1. Do you have the sample package you are describing for download?

    ReplyDelete
  2. Hi Shahriar,

    Can I also get a copy of this dtsx package...

    Thanks in Advance,
    Lsnnaidu@gmail.com

    ReplyDelete
  3. This comment has been removed by the author.

    ReplyDelete
  4. Hi Shahriar,

    thanks a lot for this post, it was very useful.

    ReplyDelete
  5. Hi Shahriar .. would be grateful if you could provide the file to me as well .. aarionsql@gmail.com

    ReplyDelete
  6. Hi Shahriar,
    Thanks for this very useful post. I was wondering if you still have a copy of this dtsx that you can share with me. Please send it to eatierra@gmail.com. Thanks in advance.

    ReplyDelete
  7. Hi Shahriar,

    Happy New Year!! Do you have this package still? If so, please send me a copy to deepthishreeya1@gmail.com. Thanks a lot!!

    ReplyDelete
  8. Hi Shahriar,
    Do you still have this package ?
    Could be possible to get a copy to perezcotai@gmail.com

    ReplyDelete
  9. Hi Shahriar,

    Your package is very popular!

    Can I have a copy too felsgrin@gmail.com?

    Thank you

    ReplyDelete