In this example, I loop through a collection of Excel files in a directory and load the data from each Excel file into one TARGIT InMemory table.
I use two data drivers for this example. FileList to create a table with the Excel files and then the Excel data driver to load each Excel file.
The Excel data driver is linked to a single Excel file. Therefore I have to create and close the Excel driver for each file in the loop.
The directory with the source Excel files is defines in the variable "filepath" in line 7, so you can change the path and run the script.
LOAD ASSEMBLY 'TARGIT.FileList.dll' //server
LOAD ASSEMBLY 'TARGIT.Excel.dll' //server
DECLARE @filepath AS STRING
DECLARE @fullpath AS STRING
SET @filepath = 'C:\Transactions\Excel'
DATASOURCE [Filelist] = DOTNET CONNECTION 'TARGIT.FileList.FileListConnection' 'loadasstring=false;'
IMPORT [Excelfiles] = [Filelist].{SELECT * FROM [@@filepath] WHERE filename='*.xlsx' AND directory='current'}
WHILE Excelfiles.EOF=FALSE /* Loop over all Excel files */
SET @fullpath = Excelfiles.fullpath
PRINT 'Loading the file ' + @fullpath
DATASOURCE [Excelfile] = DOTNET CONNECTION 'TARGIT.Excel.ExcelConnection' 'ignoreemptyvalues=false;datasource=' + @fullpath + ';detectionrowscount=100;skiplines=0;hasheaders=True;loadasstring=true'
IMPORT [STAGE_Excel_Person] = Excelfile.{SELECT * FROM []}
CLOSE Excelfile
MoveNext Excelfiles
LOOP
// Remove list of Excelfiles from final database
Drop Excelfiles
SAVE
Alternative script - Adding filename to the data:
In this example, I add an extra step "STAGE_Excel_Person_001" where I load the data from Excel.
In the loop, I've added another variable "filename" using TARGIT.Excel.filename property.
Then I combine the data from "STAGE_Excel_Person_001" table with the variable "filename" - and drops the "STAGE_Excel_Person_001" table.
LOAD ASSEMBLY 'TARGIT.FileList.dll' //server
LOAD ASSEMBLY 'TARGIT.Excel.dll' //server
DECLARE @filepath AS STRING
DECLARE @filename AS STRING
DECLARE @fullpath AS STRING
SET @filepath = 'C:\Transactions\Excel'
DATASOURCE [Filelist] = DOTNET CONNECTION 'TARGIT.FileList.FileListConnection' 'loadasstring=false;'
IMPORT [Excelfiles] = [Filelist].{SELECT * FROM [@@filepath] WHERE filename='*.xlsx' AND directory='current'}
WHILE Excelfiles.EOF=FALSE /* Loop over all Excel files */
SET @fullpath = Excelfiles.fullpath
SET @filename = Excelfiles.filename
PRINT 'Loading the file ' + @filename
DATASOURCE [Excelfile] = DOTNET CONNECTION 'TARGIT.Excel.ExcelConnection' 'ignoreemptyvalues=false;datasource=' + @fullpath + ';detectionrowscount=100;skiplines=0;hasheaders=True;loadasstring=true'
IMPORT [STAGE_Excel_Person_001] = Excelfile.{
SELECT * FROM []
}
CLOSE Excelfile
IMPORT [STAGE_Excel_Person] = ME.{
SELECT '@@filename' AS [SourceFilename], * FROM STAGE_Excel_Person_001
}
DROP [STAGE_Excel_Person_001]
MoveNext Excelfiles
LOOP
// Remove list of Excelfiles from final database
Drop Excelfiles
SAVE
Please note:
Each Excel file is expected to have the same data structure. You could consider to add try catch to the loop to avoid the script from stopping, if a file has a different data structure.
Data is loaded as strings, so we expect you to cast data with SAFECAST afterwards, but that is not the focus in this article.
I've added "SAVE" in the end of the script, so it can run on it's own. But you might want to leave SAVE out, if you copy this script into an existing ETL flow.
Demo data:
Below this article you will find "excel_example_data.zip" containing two small Excel files (file1.xlsx and file2.xlsx) that you can use for testing.
Comments
Nice post Jesper.
Just created a new In Memory Project, pasted in the script and changed the "filepath".
Runs out-of-the-box and my 21 files are imported into one table. Thanks :-)
Jesper da Silva Endelt - If u can show how to loop through a collection of Json-files, that would be much appreciated :)
Hi Jesper
is there any way you can get the filename in the table as well. Very handy when you need to figure out where a line is originating from. Also if you have several spretsheets with same data but different month ie.
Hi Claus,
Great question.
I tested adding a variable directly to the Excel load script, but the Excel driver does not support this.
So I added another step to the original script using a temp table ([STAGE_Excel_Person_001]). I've also adjusted the variables, now using the Excel.filename property.
The script is also added to the main article above, but here you have the adjusted script:
LOAD ASSEMBLY 'TARGIT.FileList.dll' //server
LOAD ASSEMBLY 'TARGIT.Excel.dll' //server
DECLARE @filepath AS STRING
DECLARE @filename AS STRING
DECLARE @fullpath AS STRING
SET @filepath = 'C:\Transactions\Excel'
DATASOURCE [Filelist] = DOTNET CONNECTION 'TARGIT.FileList.FileListConnection' 'loadasstring=false;'
IMPORT [Excelfiles] = [Filelist].{SELECT * FROM [@@filepath] WHERE filename='*.xlsx' AND directory='current'}
WHILE Excelfiles.EOF=FALSE /* Loop over all Excel files */
SET @fullpath = Excelfiles.fullpath
SET @filename = Excelfiles.filename
PRINT 'Loading the file ' + @filename
DATASOURCE [Excelfile] = DOTNET CONNECTION 'TARGIT.Excel.ExcelConnection' 'ignoreemptyvalues=false;datasource=' + @fullpath + ';detectionrowscount=100;skiplines=0;hasheaders=True;loadasstring=true'
IMPORT [STAGE_Excel_Person_001] = Excelfile.{
SELECT * FROM []
}
CLOSE Excelfile
IMPORT [STAGE_Excel_Person] = ME.{
SELECT '@@filename' AS [SourceFilename], * FROM STAGE_Excel_Person_001
}
DROP [STAGE_Excel_Person_001]
MoveNext Excelfiles
LOOP
// Remove list of Excelfiles from final database
Drop Excelfiles
SAVE
Hi Jesper
Thanks for the answer, it works perfect :-)
Regards Claus
Great. Thank you for the update.
Please sign in to leave a comment.