Using TARGIT ETL Studio, Python and ChatGPT to learn about your TARGIT Installation

This example was motivated by a number of factors that came together and ended up in a small example of what you can do, which hopefully can inspire you out there.

Actually, I started out investigating if there was an easy way to traverse through TARGIT folders and extract some information about the analysis and reports available there.

  • Initially, I wanted to extract information about triggers found in TARGIT analysis, but my focus switched to version numbers, use of user dimensions, and connections used in the analysis.
  •  I soon found out that I wanted to use Python for that purpose, and spent a little time learning the basics of Python and putting a script together.

However, it became very time-consuming to manually create and alter the script every time I wanted to improve or change it.

  • That led me to check if ChatGPT was as good as rumor has it, in creating code ready for execution.

So, my desire to "hack" the TARGIT files led me to Python, which led me to ChatGPT.

The whole journey didn't take that long, by the way...

 

The ChatGPT part started out with a simple question/order from me to ChatGPT:

create a python script that will traverse through a filesystem with the base path C:\Programdata\TARGIT\ANTServer\VFS and examine all files for lines containing the word trigger. The python script must create a text file that hold all these lines. The text file should be called trigger_collection.txt.  

 

ChatGPT provided a script in a few seconds that I copied and tested right away.

The script worked immediately when I executed it, and by the way, I'm skipping the part about how to get a Python environment up and running. It's pretty easy and very well documented.

 

After a few iterations, where I added more and more text to my order - I ended up giving ChatGPT this (changed) order:

make a python program that traverses thorugh a file system with the base path c:\programdata\targit\ANTServer\VFS\Global and creates a CSV file with the following information: Filename (including path), Filename (excluding path and extension) as the 2 first columns. The third column in the csv file should have the Header Version and be based on a text search within each document for this string: <version suite=" when this string is met the python script must pick up the subsequent characters until in encounter another " - this is a text field by the way. The fourth column in the csv file should have the header Connection and be based on a text search within each document for the first occurence of this string: <metadata database=" when this string is met the python script must pick up the subsequent characters until in encounter another " The fifth column in the csv file should have the header Brugerdimension and be based on a text search within each document for the first occurence of this string: <metadata userdimension when this string is met the python script must write JA in this column, otherwise NEJ The seventh column should have the header Type and contain the extension of the filename only

 

Again, ChatGPT provided me with Python code in seconds, and when I executed the script, it gave me a nice CSV file - see below for a screenshot of the top of this file:


It was very easy to get this far - now I just needed a way to schedule my Python script from within TARGIT and utilize the data within TARGIT.

I'm aware that Data Discovery holds two different Python plugins that could probably be utilized. Still, I wanted to use the TARGIT InMemory platform since it's better suited for more complex data models. I'd like to keep my options open about further developing this procedure.

So, I created a small InMemory script using ETL Studio. 

It consists of a script task that executes the Python script and an import task that imports the resulting CSV file.

My whole control flow in ETL Studio looks like this:

 

So now I'm able to examine my TARGIT file structure and determine the following:

  1. In which the TARGIT version of my various documents has been saved,
  2. What the primary data source connection of the analysis is, and finally
  3. whether the document holds any shared user dimensions.

These are just meant as examples of what you can do with this method, but these 3 bits of information can be pretty helpful to those trying to get an overview of their TARGIT installation.

I've attached the Python Script (traverse.py) and the InMemory project (call of python script and import of resulting CSV file.improj) for those interested.

Was this article helpful?
1 out of 1 found this helpful

Comments

2 comments
  • I've attached another sample python script (logins.py) - also generated with ChatGPT by the way - that will convert an XML file to JSON, which can then be imported into InMemory (or Data Discovery).

    I've tried a couple of use cases - one is logins.xml which is an XML file in TARGIT that contains information on all users, who at some point has logged into TARGIT.

    Another that I have tried is roles.xml  - which holds information on role setup in TARGIT Management.

    The idea was again to "harvest" some information and bring it into TARGIT where you can use all the known features to create an easy overview of the content.

    Note: If you use the outcome of the script through json - you need to set the depth in the properties of the connector to json.

    Attached: python script (logins.py) and inmemory project (call of python script to convert xml to json and then import to inmemory.improj) to test yourself.

    0
  • I just added a little more to the initial python script (traverse.py). It adds an extra column to the csv file which holds information on who edited the document last time. (This information is currently not a part of TARGIT Insights).

    The (improved python script is attached under the name (fileinformation.py)

    0

Please sign in to leave a comment.