Friday, June 18, 2010

Chronological Order (List files by oldest first)

Chronological Order (List files by oldest first)
Extracting (listing) the files in a chronological order by, (this example “Modified Date”).

Over all View

Defining some variables in the package

1. “uVar_ArrayOfFilesProperties” An Object that will contain each files information.
2. “uVar_CurrentFileName” Name of the current file
3. “uVar_DateCreated” File creation date
4. “uVar_DateMofified” File creation modified
5. “uVar_FileExtention” i.e. “xls” for excel, “txt” for text.
6. “uVar_FileName” The file name only without the file extension.
7. “uVar_FirstTime” A flag for the flow of the package
8. “uVar_FolderNameOfTheFilesToBeProcessed” The main folder root of the files that need to be in a chronological order.
9. “uVar_Path” The path of each file
10. “uVar_PathFileName” The path, filename and extension all together
11. “uVar_SourceFileExtention” This variable will define the file extension (e.g. *.csv).

Adding Object (controls) in the Control Flow
Adding the objects as seen in the “Over all View”

1. FELC--Looping through files and folder
The first loop will find the file and process them one by one

2. SCR--Getting each files property information
The script will retrieve each files property; it’s like right clicking on a file and selecting properties.

The files properties information will be saved in the variables (See the script or check the variables with a red dot in the variable table).

3. DFT--List of sorted files by ModifiedDate
Over view of the DFT

In the DFT you will have
a. SCR--Appending the records in SSIS object variable with the current file info for the output records

You will have to use the ADD COLUMN option and add each Fields; you will have to select the right data type for each field

(Example for data type: DateModified = [DT_DBTIMESTAMP] , Path = [DT_WSTR] 3000, FileName = = [DT_WSTR] 200 , etc...)

All variable that contain the files information must be passed on to the Script component and the “User::uVar_ArrayOfFileProperties” variable.
The files properties information will be saved in the variables (See the script or check the variables with a red dot in the variable table).
• Scrip
The script is divided into 2 section, the first section is before the “IF” and the second section is within the “IF”, the first section is adding the current files information into the Output Buffer of the script component, the second part is looping through all the records in the SSIS object “uVarArrayOfFileProperties” and adding them to the Output Buffer of the script component, so the output result will always be the current files and the previous files information.
Public Overrides Sub CreateNewOutputRows()
With MainOutputrecordsBuffer
.PathFileName = Me.Variables.uVarPathFileName
.Path = Me.Variables.uVarPath
.FileName = Me.Variables.uVarFileName
.FileExtension = Me.Variables.uVarFileExtension
.DateCreated = Me.Variables.uVarDateCreated
.DateModified = Me.Variables.uVarDateModified
End With

If Me.Variables.uVarFirstTime = False Then

Dim oleDA As New Data.OleDb.OleDbDataAdapter
Dim dt As New Data.DataTable

Dim row As System.Data.DataRow

Dim strPathFileName As Integer
Dim strPath As Integer
Dim strFileName As Integer
Dim strFileExtension As Integer
Dim strDateCreated As Integer
Dim strDateModified As Integer

oleDA.Fill(dt, Me.Variables.uVarArrayOfFileProperties)

strPathFileName = dt.Columns("PathFileName").Ordinal
strPath = dt.Columns("Path").Ordinal
strFileName = dt.Columns("FileName").Ordinal
strFileExtension = dt.Columns("FileExtension").Ordinal
strDateCreated = dt.Columns("DateCreated").Ordinal
strDateModified = dt.Columns("DateModified").Ordinal

For Each row In dt.Rows
With MainOutputrecordsBuffer
.PathFileName = row(strPathFileName).ToString()
.Path = row(strPath).ToString()
.FileName = row(strFileName).ToString()
.FileExtension = row(strFileExtension).ToString()
.DateCreated = CType(row(strDateCreated).ToString(), DateTime)
.DateModified = CType(row(strDateModified).ToString(), DateTime)
End With

End If
End Sub


b. SRT--Sorting the list by "DateModified"
This is almost where you have to choose your chronological order , I have selected the modified date, why?

Do you think that you can find a files property that says the Create date is bigger that the modified date (Create = 2010-01-01 and Modified = 2007-01-01)? Yes you can, a file can be made on a computer on 2005, and be modified on the same computer on 2007, so far so good, and the same file can be pushed to a another computer on 2010-01-01 so on the second computer the files properties show that Create Date = 2010-01-01 and Modified Date= 2007-01-01.
c. RS_DST--Saving records in the SSIS objects
Now we have to save the sorted result to a SSIS Object variable.

Please remember to select all fields.

4. SCR--Setting a flag
Passing on the flag variable to set the flag to FALSE, the reason we do this is that, if you look at the script in the DFT you will see that the flag is been used because the SSIS Object does not have a structure format for the first time so we won’t be able to use the object (See in script “oleDA.Fill(dt, Me.Variables.uVarArrayOfFileProperties)”).

5. FELC--Looping the sorted file list
Setting the second file

This will loop through each record in the object one by one and save the value in the mentioned variables.

6. SCR-- Show the result in a MsgBox (For test only)
In the second loop I am sure that you will be having a DFT to do the main process in my case I just want to display the sorted order.

Note: Step 6 is only for testing only, please delete the object after testing.

Testing Scenarios
Before testing make sure that you have set the variables according to the snap shot of the variables that I have and check the “Criteria”.

Scenario 1:
1- Select a folder that contains the files that you need to be sorted
2- Cut and past the folder into the SSIS variable “uVar_FolderNameOfFilesToBeProcessed”, this will point to that folder
3- Maybe you want to select a certain group of files like *.csv then you need to put “*.csv” in the variable “uVar_SourceFileExtention”
4- Run the package and you will see a MsgBox showing the files properties, check the files Modified date with the results of the MsgBox

Note: Step 6 is only for testing please delete the second SCR after testing.

Tuesday, June 15, 2010

Looping Through CSV Files using SSIS

Looping Through CSV Files
This program will loop through a folder and its sub folders, find all the *.CSV files and retrieve data from each file and inserts the record to a destination table and finally moving the file to a backup folder after each loop cycle.

1. The file format must be in a *.CSV format (Flat File)
2. A sample source file of the CSV file (Please renamed to SampleCSVFile.csv)
3. A Folder that will be the main folder of the ETL (in this case it will be named => C:\SSIS\LoopingThroughCSVFiles)
4. Two other sub folders 1- “Sample” 2- “ToBeProcessed”
5. Copy the CSV Sample File in the “Sample Folder”
6. All headers must be on the first row of each CSV file
7. No empty/missed rows are accepted, remove them all

Over all View

Defining some variables in the package
One of the most important part of a SSIS package are the variables. Because some variables will be defined in the Configuration File(s), so that you can use the package dynamically,
Example 1: The server name on your computer is different that the server name on production server so basically you can have the “server name variable” defined in the XML or table configuration so that it can be changes when deploying the package from one server to another.

1. “uVar_BackupActiveFolder” Each time the package runs it makes a folder in the backup folder this folder will be known as the current backup folder which will be used to save the CSV backup files and/or maybe the Error text file.
2. “uVar_BackupMainFolderPath” Some customers prefer to have the backup folder in another server so basically we allow them to have a choice, by adding this variable (See the script section for more details). If this variable is empty it will make a default backup folder name (See backup scripts).
3. “uVar_DestinationDBName” The name of the Destination Database.
4. “uVar_DestinationServerName” The name of the Destination SQL Server.
5. “uVar_DestinationTableName” The name of the Destination Table in SQL Server.
6. “uVar_ErrorOutPutFile” This is the path and file name of the Error file, each time this path and file name changes so that we can keep a backup of each error log file.
7. “uVar_FolderNameOfFilesToBeProcessed
8. “uVar_MainFolderPath” This is the path where you will be having your main ETL; it can contain the Package, configuration files, backup, sample folder, ToBeProcessed folder and etc... (C:\SSIS\LoopingThroughCSVFiles)
9. “uVar_NumberOfExtractedRows” A row counter that won’t be used in this example.
10. “uVar_SMTPEmailServerName” The SMTP name or IP address of it
(e.g. 1: Neptune e.g. 2: e.g. 3:
11. “uVar_SMTPSendEmailsTo_FromError” A List of email address to email the error file. ( ; YourEmail@YourBox.Com )
12. “uVar_SourceConStr_DataSource” A variable that will have the file name and folder (See the loop property).
13. “uVar_SourceFileExtention” This variable will define the file extension (e.g. *.csv).
14. “uVar_SourceFileNameRenamed” Before moving a file to the backup folder we will have to rename and set the right backup folder in this variable.

Defining Connection Manager
We need 4 connections, in this example I will not be setting the error and email section you can check my blog for that
1. CSV Flat File Connection Manager
The setting in this section depend on the format of the source CSV file, you may have a semicolon (;) as a row delimiter or a tab as a column delimiter, every format has its own settings. In my case the settings are...

2. FF_DST----Error Output file for debugging

3. SQLServerDestinationConnection

4. SMTP Connection Manager For Error messages

Adding Object (controls) in the Control Flow

Now we can add the objects as seen in the “Over all View
1. SCR---- Making the "ToBeProcessed Folder"

In this section the customer can redirect the source folder by entering the folder full path name in the “uVar_FolderNameOfFilesToBeProcessed” variable, if not the system will pick up the default name “ToBeProcessed”.

2. SCR---- Making the backup folder string from BackupFolderName Variable (Dynamic Backup folder yyyy-mm-dd-hh-mm-ss)

Same settings again the customer can define the backup folder in the variable “uVar_BackupMainFolderPath”, if not the Package will pick the default folder name “BackUpFolder”. The other thing that happens in this script is that the current backup folder name string is been made, e.g. C:\SSIS\LoopingThroughCSVFiles\BackUp\2010-04\2010-04-10-141000.

3. FSYS----Creating the backup folder from BackupFolderName

This will make the backup folder from the “uVar_BackupActiveFolder” which will be C:\SSIS\LoopingThroughCSVFiles\BackUp\2010-04\2010-04-10-141000.

4. SCR--- Make the correct path and file name for the Error Text File

What I am doing is making a file name string that has, the date and time, ETL Name, etc... This will be needed it an error accurse

5. SQL----Truncate the destination Table

6. FLC - Looping through each CSV file - Finding the Data Source

By using the variable “uVar_FolderNameOfFilesToBeProcessed” the loop will be loop through the folder and sub folder to find *.csv files “name and folder” and each file folder and name will be saved in the “uVar_SourceConStr_DataSource” e.g. C:\SSIS\LoopingThroughCSVFiles\ToBeProcessed\AnyCSVFileName.csv

7. DFT---- MAIN ETL (E and T Only)

8. FF_SRC---- Connecting to CSV file to extract Data

Setting the right connection object for the Flat File data source object

9. OLE_DST----SQL SERVER Destination table

Setting the destination table in SQL server for the OLE destination object.

10. SCR--- Rename file names with the format of "yyyy-mm-dd-hhmmssff"

Making a string that will rename the current file that has been processed and save it in the current backup folder

11. FSYS---- Moving processed files to the "Backup Active Folder"

Moving the processed file to that final backup folder

12. End of Loop

Testing Scenarios

Before testing make sure that you have set the variables according to the snap shot of the variables that I have and check the “Criteria”.

Scenario 1:
1- Copy CSV files to the “ToBeProcessed” folder
2- Run the package
3- Check the destination table
4- Check the backup folder and find the current date time folder and see if the file have been moved from the “ToBeProcessed” folder to the new current backup folder