Friday, June 18, 2010

Chronological Order (List files by oldest first)

Chronological Order (List files by oldest first)
Extracting (listing) the files in a chronological order by, (this example “Modified Date”).

Criteria
--
Over all View


Defining some variables in the package


1. “uVar_ArrayOfFilesProperties” An Object that will contain each files information.
2. “uVar_CurrentFileName” Name of the current file
3. “uVar_DateCreated” File creation date
4. “uVar_DateMofified” File creation modified
5. “uVar_FileExtention” i.e. “xls” for excel, “txt” for text.
6. “uVar_FileName” The file name only without the file extension.
7. “uVar_FirstTime” A flag for the flow of the package
8. “uVar_FolderNameOfTheFilesToBeProcessed” The main folder root of the files that need to be in a chronological order.
9. “uVar_Path” The path of each file
10. “uVar_PathFileName” The path, filename and extension all together
11. “uVar_SourceFileExtention” This variable will define the file extension (e.g. *.csv).

Adding Object (controls) in the Control Flow
Adding the objects as seen in the “Over all View”

1. FELC--Looping through files and folder
The first loop will find the file and process them one by one


2. SCR--Getting each files property information
The script will retrieve each files property; it’s like right clicking on a file and selecting properties.



The files properties information will be saved in the variables (See the script or check the variables with a red dot in the variable table).

3. DFT--List of sorted files by ModifiedDate
Over view of the DFT


In the DFT you will have
a. SCR--Appending the records in SSIS object variable with the current file info for the output records

You will have to use the ADD COLUMN option and add each Fields; you will have to select the right data type for each field



(Example for data type: DateModified = [DT_DBTIMESTAMP] , Path = [DT_WSTR] 3000, FileName = = [DT_WSTR] 200 , etc...)


All variable that contain the files information must be passed on to the Script component and the “User::uVar_ArrayOfFileProperties” variable.
The files properties information will be saved in the variables (See the script or check the variables with a red dot in the variable table).
• Scrip
The script is divided into 2 section, the first section is before the “IF” and the second section is within the “IF”, the first section is adding the current files information into the Output Buffer of the script component, the second part is looping through all the records in the SSIS object “uVarArrayOfFileProperties” and adding them to the Output Buffer of the script component, so the output result will always be the current files and the previous files information.
---------------
Public Overrides Sub CreateNewOutputRows()
'
'
With MainOutputrecordsBuffer
.AddRow()
.PathFileName = Me.Variables.uVarPathFileName
.Path = Me.Variables.uVarPath
.FileName = Me.Variables.uVarFileName
.FileExtension = Me.Variables.uVarFileExtension
.DateCreated = Me.Variables.uVarDateCreated
.DateModified = Me.Variables.uVarDateModified
End With

If Me.Variables.uVarFirstTime = False Then

Dim oleDA As New Data.OleDb.OleDbDataAdapter
Dim dt As New Data.DataTable

Dim row As System.Data.DataRow

Dim strPathFileName As Integer
Dim strPath As Integer
Dim strFileName As Integer
Dim strFileExtension As Integer
Dim strDateCreated As Integer
Dim strDateModified As Integer

oleDA.Fill(dt, Me.Variables.uVarArrayOfFileProperties)

strPathFileName = dt.Columns("PathFileName").Ordinal
strPath = dt.Columns("Path").Ordinal
strFileName = dt.Columns("FileName").Ordinal
strFileExtension = dt.Columns("FileExtension").Ordinal
strDateCreated = dt.Columns("DateCreated").Ordinal
strDateModified = dt.Columns("DateModified").Ordinal

For Each row In dt.Rows
With MainOutputrecordsBuffer
.AddRow()
.PathFileName = row(strPathFileName).ToString()
.Path = row(strPath).ToString()
.FileName = row(strFileName).ToString()
.FileExtension = row(strFileExtension).ToString()
.DateCreated = CType(row(strDateCreated).ToString(), DateTime)
.DateModified = CType(row(strDateModified).ToString(), DateTime)
End With
Next
MainOutputrecordsBuffer.EndOfRowset()

End If
End Sub

----------------

b. SRT--Sorting the list by "DateModified"
This is almost where you have to choose your chronological order , I have selected the modified date, why?


Do you think that you can find a files property that says the Create date is bigger that the modified date (Create = 2010-01-01 and Modified = 2007-01-01)? Yes you can, a file can be made on a computer on 2005, and be modified on the same computer on 2007, so far so good, and the same file can be pushed to a another computer on 2010-01-01 so on the second computer the files properties show that Create Date = 2010-01-01 and Modified Date= 2007-01-01.
c. RS_DST--Saving records in the SSIS objects
Now we have to save the sorted result to a SSIS Object variable.


Please remember to select all fields.


4. SCR--Setting a flag
Passing on the flag variable to set the flag to FALSE, the reason we do this is that, if you look at the script in the DFT you will see that the flag is been used because the SSIS Object does not have a structure format for the first time so we won’t be able to use the object (See in script “oleDA.Fill(dt, Me.Variables.uVarArrayOfFileProperties)”).




5. FELC--Looping the sorted file list
Setting the second file


This will loop through each record in the object one by one and save the value in the mentioned variables.


6. SCR-- Show the result in a MsgBox (For test only)
In the second loop I am sure that you will be having a DFT to do the main process in my case I just want to display the sorted order.


Note: Step 6 is only for testing only, please delete the object after testing.


Testing Scenarios
Before testing make sure that you have set the variables according to the snap shot of the variables that I have and check the “Criteria”.

Scenario 1:
1- Select a folder that contains the files that you need to be sorted
2- Cut and past the folder into the SSIS variable “uVar_FolderNameOfFilesToBeProcessed”, this will point to that folder
3- Maybe you want to select a certain group of files like *.csv then you need to put “*.csv” in the variable “uVar_SourceFileExtention”
4- Run the package and you will see a MsgBox showing the files properties, check the files Modified date with the results of the MsgBox

Note: Step 6 is only for testing please delete the second SCR after testing.

Tuesday, June 15, 2010

Looping Through CSV Files using SSIS

Looping Through CSV Files
This program will loop through a folder and its sub folders, find all the *.CSV files and retrieve data from each file and inserts the record to a destination table and finally moving the file to a backup folder after each loop cycle.

Criteria
1. The file format must be in a *.CSV format (Flat File)
2. A sample source file of the CSV file (Please renamed to SampleCSVFile.csv)
3. A Folder that will be the main folder of the ETL (in this case it will be named => C:\SSIS\LoopingThroughCSVFiles)
4. Two other sub folders 1- “Sample” 2- “ToBeProcessed”
5. Copy the CSV Sample File in the “Sample Folder”
6. All headers must be on the first row of each CSV file
7. No empty/missed rows are accepted, remove them all

Over all View



Defining some variables in the package
One of the most important part of a SSIS package are the variables. Because some variables will be defined in the Configuration File(s), so that you can use the package dynamically,
Example 1: The server name on your computer is different that the server name on production server so basically you can have the “server name variable” defined in the XML or table configuration so that it can be changes when deploying the package from one server to another.



1. “uVar_BackupActiveFolder” Each time the package runs it makes a folder in the backup folder this folder will be known as the current backup folder which will be used to save the CSV backup files and/or maybe the Error text file.
2. “uVar_BackupMainFolderPath” Some customers prefer to have the backup folder in another server so basically we allow them to have a choice, by adding this variable (See the script section for more details). If this variable is empty it will make a default backup folder name (See backup scripts).
3. “uVar_DestinationDBName” The name of the Destination Database.
4. “uVar_DestinationServerName” The name of the Destination SQL Server.
5. “uVar_DestinationTableName” The name of the Destination Table in SQL Server.
6. “uVar_ErrorOutPutFile” This is the path and file name of the Error file, each time this path and file name changes so that we can keep a backup of each error log file.
7. “uVar_FolderNameOfFilesToBeProcessed
8. “uVar_MainFolderPath” This is the path where you will be having your main ETL; it can contain the Package, configuration files, backup, sample folder, ToBeProcessed folder and etc... (C:\SSIS\LoopingThroughCSVFiles)
9. “uVar_NumberOfExtractedRows” A row counter that won’t be used in this example.
10. “uVar_SMTPEmailServerName” The SMTP name or IP address of it
(e.g. 1: Neptune e.g. 2: mars.com e.g. 3: 10.2.10.56)
11. “uVar_SMTPSendEmailsTo_FromError” A List of email address to email the error file. (Snikkhah@Live.ca ; YourEmail@YourBox.Com )
12. “uVar_SourceConStr_DataSource” A variable that will have the file name and folder (See the loop property).
13. “uVar_SourceFileExtention” This variable will define the file extension (e.g. *.csv).
14. “uVar_SourceFileNameRenamed” Before moving a file to the backup folder we will have to rename and set the right backup folder in this variable.

Defining Connection Manager
We need 4 connections, in this example I will not be setting the error and email section you can check my blog for that
1. CSV Flat File Connection Manager
The setting in this section depend on the format of the source CSV file, you may have a semicolon (;) as a row delimiter or a tab as a column delimiter, every format has its own settings. In my case the settings are...





2. FF_DST----Error Output file for debugging

3. SQLServerDestinationConnection


4. SMTP Connection Manager For Error messages

Adding Object (controls) in the Control Flow

Now we can add the objects as seen in the “Over all View
1. SCR---- Making the "ToBeProcessed Folder"


In this section the customer can redirect the source folder by entering the folder full path name in the “uVar_FolderNameOfFilesToBeProcessed” variable, if not the system will pick up the default name “ToBeProcessed”.

2. SCR---- Making the backup folder string from BackupFolderName Variable (Dynamic Backup folder yyyy-mm-dd-hh-mm-ss)




Same settings again the customer can define the backup folder in the variable “uVar_BackupMainFolderPath”, if not the Package will pick the default folder name “BackUpFolder”. The other thing that happens in this script is that the current backup folder name string is been made, e.g. C:\SSIS\LoopingThroughCSVFiles\BackUp\2010-04\2010-04-10-141000.

3. FSYS----Creating the backup folder from BackupFolderName


This will make the backup folder from the “uVar_BackupActiveFolder” which will be C:\SSIS\LoopingThroughCSVFiles\BackUp\2010-04\2010-04-10-141000.

4. SCR--- Make the correct path and file name for the Error Text File




What I am doing is making a file name string that has, the date and time, ETL Name, etc... This will be needed it an error accurse

5. SQL----Truncate the destination Table


6. FLC - Looping through each CSV file - Finding the Data Source


By using the variable “uVar_FolderNameOfFilesToBeProcessed” the loop will be loop through the folder and sub folder to find *.csv files “name and folder” and each file folder and name will be saved in the “uVar_SourceConStr_DataSource” e.g. C:\SSIS\LoopingThroughCSVFiles\ToBeProcessed\AnyCSVFileName.csv

7. DFT---- MAIN ETL (E and T Only)


8. FF_SRC---- Connecting to CSV file to extract Data

Setting the right connection object for the Flat File data source object


9. OLE_DST----SQL SERVER Destination table

Setting the destination table in SQL server for the OLE destination object.


10. SCR--- Rename file names with the format of "yyyy-mm-dd-hhmmssff"


Making a string that will rename the current file that has been processed and save it in the current backup folder


11. FSYS---- Moving processed files to the "Backup Active Folder"

Moving the processed file to that final backup folder

12. End of Loop

Testing Scenarios

Before testing make sure that you have set the variables according to the snap shot of the variables that I have and check the “Criteria”.

Scenario 1:
1- Copy CSV files to the “ToBeProcessed” folder
2- Run the package
3- Check the destination table
4- Check the backup folder and find the current date time folder and see if the file have been moved from the “ToBeProcessed” folder to the new current backup folder




Wednesday, April 28, 2010

Emailing work log notification to individual users using SSIS

Emailing work log notification to individual users by using SSIS
Another project that I had worked on was to send emails with attached excel work log files to individual users; I used the same package with some modification to send excel Management MIS reports to CEOs & managers, the Excel file contained pie charts and etc...

Criteria
1. A sample Excel (97 – 2003) file with one sheet named “Report”
2. In the “Report” sheet the field names that need to be reported to the user must be on the first row, e.g.


3. The list of the fields will be used in the SSIS package; in this post I will be only using one field (UserID).
4. The Excel file name must be renamed to “SampleExcelFile.xls”
5. Make a main root folder e.g. “C:\SSIS\Email to list with attached Excel File”
6. Make two other sub folders 1- “Sample” 2- “ToBeEmailed”
7. Copy the Excel Sample File in the “Sample” folder


Note(s)
1. The attached file format will be in *.xls format (Excel (97 – 2003))

Over all View



Defining some variables in the package
One of the most important part of a SSIS package are the variables, because some variables will be defined in the SSIS Configuration File(s) and/or Table(s), so that the package can be used dynamically.
Example: The server name on your computer is different that the server name on production server so basically you can have the “server name variable” defined in the SSIS configuration XML file or table so that it can be changes when deploying the package from one server to another.



1. “uVar_BackupActiveFolder” Each time the package runs it makes a backup folder known as the current backup folder which will be used to save the Excel backup files and/or maybe the Error text file.
2. “uVar_BackupMainFolderPath” Some customers prefer to have the backup folder in another server so basically we allow them to redirect the folders location by using this variable. If this variable is empty it will make a default backup folder name (See backup scripts).
3. “uVar_BackupUserFolder” once the variable “uVar_BackupActiveFolder” is defined, the package will make subfolders with the name of the users, so that it will be easier to trace what was reported on that day and for whom.
4. “uVar_DestinationFileNameRenamed” To move the excel file to the backup folder we need to rename it.
5. “uVar_ListOfUsers” A SSIS variable to save the list of users that needs to be notified.
6. “uVar_MainFolderPath” This is the path where you will be having your main ETL folder root; it can contain the Package, configuration files, backup, sample folder, ToBeProcessed folder and etc... (C:\SSIS\ETLName)
7. “uVar_Source_SQLQueryString” A SQL statement that provides the work log of each user, The TOP statement is been used for design time only, the script string will change in the package loop (check object=> “SCR--- Make the WHERE condition of the Source table”).
8. “uVar_SourceSampleExcelFile” the location of the Excel sample file.
9. “uVar_ToBeEmailedFile” The file that will be emailed individually
10. “uVar_ToBeEmailedFolder” The folder that has the file that needs to be emailed.
11. “uVar_UserEmail” The email of the current user in the Loop.
12. “uVar_UserID” The current user ID in the Loop.
13. “uVar_UserName” The current user name in the Loop.

Defining Connection Manager
I am going to define only one connection in this example I am sure you can set the rest of them yourself.

1. Excel connection manager



Set the “DelayValidation” to TRUE and enter the right the connection string e.g.
Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\SSIS\Email to list with attached Excel File\Sample\SampleExcelFile.xls;Extended Properties="EXCEL 8.0;HDR=YES";

2. SMTP Connection Manager

3. SQLServerSourceConnection

You can add the Error handling features in the package.

Adding Object (controls) in the Control Flow
Now we can add the objects as seen in the “Over all View
1. SCR---- Making the backup folder string from BackupFolderName Variable (Dynamic Backup folder yyyy-mm-dd-hh-mm-ss)


Redirect the backup folder in the “uVar_BackupMainFolderPath” variable; if the variable is empty the system will pick up the default folder name “BackUpFolder” in the main folder path of “uVar_MainFolderPath”. E.g.
C:\SSIS\Email ..................Excel File\BackUpFolder\2010-04\2010-04-27-103108.



2. FSYS----Creating the backup folder from BackupFolderName
Making the backup active folder from the “uVar_BackupActiveFolder” variable. C:\SSIS\Email ..................Excel File\BackUpFolder\2010-04\2010-04-27-103108.



3. FSYS----Creating the ToBeEmailed Folder
Creating the folder. C:\SSIS\Email ..................Excel File\ ToBeEmailed.



4. SCR--- Make ToBeEmailedFile String "Folder and File Name"
A “variable pointer” that points to the sample excel file.





5. DFT --- Find the list of users to Email to
We now need to get the list of users that need to be notified about their daily work list report, mainly we need the fields of the UserID, UserName and email, the number of fields and the order of the fields are very important (See: Foreach Loop Container).
The DFT looks like...



Retrieving the list of users from the OLE DB Source object.



Mapping fields.



Saving the list of users in a SSIS variable object, the loop in the package will be looping through this variable user by user



6. Foreach Loop Container
What the Loop section dose, is to loop though the object variable “uVar_ListOfUsers” (that has the list of users that needs to be notified) user by user, make an Excel file for each user, make a backup folders and move the excel file to the right backup folder for each user and finally email to the user with the right excel file attached.
Loop settings...



Now we can see why the order and the number of fields in the SELECT statement are critical because of the fields mapping in the loop.



7. FSYS -- Copy a Template Empty Excel file to the ToBeEmailed Folder
In each loop the package (for each user in the notification list) copies a sample of the Excel sample file (SampleExcelFile.xls) to the “uVar_ToBeEmailedFolder”, the file must have a sheet named “Report”.



8. SCR--- Make the WHERE condition of the Source table



Now we have to make a query string that has the current user work log only, in other words the excel file that will be emailed to each user must have his/her work list report only, so we must filter the report table user by user by using the UserID.

Note: In this select statement you must have all the field names that will be in the excel file as header fields, in my sample I will be only using one field (UserID)
The string will be saved in the “uVar_Source_SQLQueryString” variable that will be used by the “OLE DB Source” that is in the “DFT --- Main Task”.



9. DFT --- Main Task
Saving the work log list for the current user into the excel file.



The OLE Source is pointing to the WorkList table that is been filtered by the current UserID.
In the variable “uVar_Source_SQLQueryString” I had defined a query like “SELECT TOP 1 UserID .....” and I am sure that raised few questions like, why using TOP? Why hard coding a UserID? The answers are in the next step, while developing a package you need to define the fields and you need to be able to map them in the OLE DB source editor, and that’s why.



Saving the work list to the excel file in the “Report” sheet.



10. SCR--- Make Folderstring Like "Backup Folder and UserID"



C:\SSIS\Email .........File \BackUpFolder\2010-04\2010-04-27-103108\FName1 LName1



11. FSYS---- Make Backup folder in the name of the User
C:\SSIS\Email .........File \BackUpFolder\2010-04\2010-04-27-103108\FName1 LName1



12. SCR--- Rename file names with the format of "yyyy-mm-dd-hhmmssff"



The string will look like
C:\SSIS\Email ...... File\BackUpFolder\2010-04\2010-04-27-103108\FName1 LName1\FName1 LName1 2010-04-27-10311137.xls



13. FSYS---- Moving processed files to the "Backup Active Folder"



Moving the file from ....
C:\SSIS\Email ..........File\ToBeEmailed\ SampleExcelFile.xls
To .....
C:\SSIS\Email ...... File\BackUpFolder\2010-04\2010-04-27-103108\FName1 LName1\FName1 LName1 2010-04-27-10311137.xls

14. Send Mail Task


Testing Scenarios
Before testing make sure that you have set the variables according to the snap shot of the variables that I have and check the “Criteria”.
Once you run the package you should be able to see.