What is an ETL and why should PLM care?
Don't start a PLM project without knowing what an ETL is
Having worked for 10 years in PLM as a user, integrator and software editor, I like to take a very broad approach to PLM. I share my thoughts about the various functional bricks of PLM but also about the technological stack supporting PLM.
Posted by Yoann Maingon
Your PLM project will not install a new isolated island. If it does, then you haven’t understood the whole digitalisation process and the digital thread concept, which apply not only to PLM but to your whole organisation. Therefore, you need to understand up front how your system will communicate with the rest of the company’s tools. You will also have to define how you interact with the outside world. The ETL is part of an ecosystem of tools that helps you with this.
Don’t start a PLM project without knowing what an ETL is!
This is, I believe, the simplest TLA (three-letter acronym) I have seen so far. It says exactly what it does: Extract, Transform and Load data.
The main strength of an ETL in the Extract phase is that it lets you retrieve data from as many sources as possible. The sources can be diverse, from applications that expose an API to a simple text file stored in a folder.
Types of systems you may query:
The goal of an ETL is to offer as many connectors as possible. Talend's open source model has let the community build a lot of integrations.
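To make the Extract phase concrete, here is a minimal sketch in plain Python rather than in an ETL tool. It assumes two hypothetical sources, a CSV export from a legacy system and a JSON payload such as an API might return, and brings both into the same list-of-records shape. The field names and sample data are invented for illustration.

```python
import csv
import io
import json


def extract_csv(text):
    """Read delimited text (e.g. a file dropped in a folder) into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def extract_json(text):
    """Read a JSON payload (e.g. the body of an API response) into a list of dicts."""
    return json.loads(text)


# Hypothetical sample data standing in for a legacy export and an API response.
LEGACY_CSV = "part_no,descr,rev\nP-001,Bracket,A\nP-002,Housing,B\n"
API_JSON = '[{"part_no": "P-003", "descr": "Cover", "rev": "A"}]'

# Both sources end up in one uniform list of records.
records = extract_csv(LEGACY_CSV) + extract_json(API_JSON)
```

A real ETL connector does the same job for each source type: whatever the origin, the rest of the flow only sees uniform records.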
Transform is where it becomes much trickier. Transformation requires a lot of different capabilities, like mapping fields, converting flows into arrays of data or into objects, filtering data, joining tables, aggregating data, etc. When you have exhausted the tools your ETL provides, most ETLs let you add some custom code so that you are not limited.
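The transformations listed above can be sketched in a few lines of plain Python. This is only an illustration of filtering, field mapping, joining and aggregating, not how any particular ETL implements them. The field names, the `FIELD_MAP` mapping and the sample data are all hypothetical.

```python
from collections import defaultdict

# Hypothetical mapping from legacy column names to target PLM property names.
FIELD_MAP = {"part_no": "item_number", "descr": "name", "rev": "revision"}


def map_fields(row, field_map):
    """Rename the keys of one record according to field_map, dropping unmapped keys."""
    return {target: row[source] for source, target in field_map.items() if source in row}


def join_on(left, right, key):
    """Inner join two lists of dicts on a shared key (right side assumed unique per key)."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


def count_by(rows, key):
    """Aggregate: count the records for each value of key."""
    counts = defaultdict(int)
    for row in rows:
        counts[row[key]] += 1
    return dict(counts)


parts = [
    {"part_no": "P-001", "descr": "Bracket", "rev": "A", "status": "released"},
    {"part_no": "P-002", "descr": "Housing", "rev": "B", "status": "obsolete"},
    {"part_no": "P-003", "descr": "Cover", "rev": "A", "status": "released"},
]
suppliers = [
    {"item_number": "P-001", "supplier": "Acme"},
    {"item_number": "P-003", "supplier": "Globex"},
]

released = [p for p in parts if p["status"] == "released"]  # filter
mapped = [map_fields(p, FIELD_MAP) for p in released]        # map fields
joined = join_on(mapped, suppliers, "item_number")           # join
per_revision = count_by(joined, "revision")                  # aggregate
```

In a graphical ETL each of these steps would typically be one component on the canvas. The custom-code escape hatch is for the cases no component covers.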
Finally, the LOAD process has the same technical goal as the EXTRACT process: load the prepared data into as many target systems as possible.
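As a sketch of the Load phase, the snippet below writes prepared records into a SQLite database using Python's standard `sqlite3` module. The `item` table, its columns and the sample rows are assumptions made for the example. A real PLM target would more likely be reached through its API or a dedicated connector.

```python
import sqlite3


def load_items(connection, rows):
    """Insert or update prepared records in a hypothetical 'item' table.

    Returns the number of rows in the table after loading.
    """
    connection.execute(
        "CREATE TABLE IF NOT EXISTS item ("
        "item_number TEXT PRIMARY KEY, name TEXT, revision TEXT)"
    )
    with connection:  # commits on success, rolls back on error
        connection.executemany(
            "INSERT OR REPLACE INTO item (item_number, name, revision) "
            "VALUES (:item_number, :name, :revision)",
            rows,
        )
    return connection.execute("SELECT COUNT(*) FROM item").fetchone()[0]


prepared = [
    {"item_number": "P-001", "name": "Bracket", "revision": "A"},
    {"item_number": "P-003", "name": "Cover", "revision": "A"},
]
target = sqlite3.connect(":memory:")  # in-memory database, for the example only
loaded = load_items(target, prepared)
```

Because the statement is `INSERT OR REPLACE` keyed on `item_number`, running the same load twice leaves the table unchanged, which is convenient when a flow has to be replayed.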
The #1 scenario for ETL is migration. I have managed a few migrations perfectly with an ETL. Usually the graphical UI and the versioning of your ETL setup make it possible to explain how the migration flow works without getting too technical.
I have used ETLs several times to make sure legacy systems could still be integrated with the new solution being delivered. This is usually where we work the most with extracting/inserting data in databases or even playing with files. ETLs often have this cool feature which lets you watch a folder for any change, so whenever a new file appears it can trigger an ETL flow.
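The folder-watching trigger can be approximated by simple polling. The sketch below is an assumption about how such a feature could work, not how Talend or any other ETL actually implements it. The `read_flow` function is a hypothetical stand-in for a real ETL flow.

```python
import os
import tempfile


def new_files(folder, seen):
    """Return the file names in folder that are not yet in seen, and record them."""
    current = set(os.listdir(folder))
    added = sorted(current - seen)
    seen.update(added)
    return added


def poll_once(folder, seen, flow):
    """Run flow(path) for every file that appeared since the previous poll."""
    return [flow(os.path.join(folder, name)) for name in new_files(folder, seen)]


def read_flow(path):
    """Hypothetical ETL flow: here it only reads the file and returns its contents."""
    with open(path, encoding="utf-8") as handle:
        return handle.read()


with tempfile.TemporaryDirectory() as inbox:
    seen = set()
    first = poll_once(inbox, seen, read_flow)    # nothing there yet
    with open(os.path.join(inbox, "bom.csv"), "w", encoding="utf-8") as f:
        f.write("part_no,qty\nP-001,4\n")
    second = poll_once(inbox, seen, read_flow)   # picks up bom.csv
    third = poll_once(inbox, seen, read_flow)    # bom.csv was already processed
```

In practice `poll_once` would run on a schedule, and a production setup would also need to handle files that are still being written when the poll happens.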
The long-term use case for an ETL is the connection with a larger enterprise system like an ESB (Enterprise Service Bus), which I will describe in a future blog post. The goal is to have a central system which manages the different data sources and connects triggers and data on a single bus. The connection between this bus and any other system would be handled by an ETL, which makes it possible to standardize the data on the bus as much as possible.
The one risk with an ETL is to start creating too many one-to-one connections. At some point they become complicated to maintain. Depending on the context, one-to-one connections might suit you very well because you need to keep these integrations independent in their evolution. But the bigger the system becomes, the more you will need to look for a better-organized system using an ESB.
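A quick back-of-the-envelope calculation shows why one-to-one connections stop scaling. It assumes the worst case, where every system has to exchange data with every other system, and compares that with one connector per system to a shared bus. Real landscapes are rarely fully connected, so treat the numbers as an upper bound.

```python
def point_to_point_links(systems):
    """Number of pairwise integrations if every system talks to every other one."""
    return systems * (systems - 1) // 2


def bus_links(systems):
    """Number of integrations if every system only connects to a shared bus."""
    return systems


for n in (3, 5, 10):
    print(n, "systems:", point_to_point_links(n), "one-to-one links vs", bus_links(n), "bus connectors")
```

With 3 systems both approaches need 3 integrations, but with 10 systems the fully connected case needs 45 against 10 for the bus.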
I have found everything I wanted in Talend. I haven’t tried others, except CloverETL a few years back.
Here is a great video introducing ETL: