What is an ETL and why PLM should care?

Don't start a PLM project without knowing what an ETL is


Yoann Maingon
@yoannmaingon

Your PLM project should not become a new isolated island. If it does, you haven’t understood the digitalisation process and the digital thread concept, which apply not only to PLM but to your whole organisation. Therefore, you need to understand up front how your system will communicate with the rest of the company’s tools. You will also have to define how you interact with the outside world. The ETL is part of an ecosystem of tools that helps you do this.

Don’t start a PLM project without knowing what an ETL is!

ETL stands for Extract, Transform, Load

This is, I believe, the simplest TLA (three-letter acronym) I have seen so far. It says exactly what it does: Extract, Transform and Load data.
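To make the three steps concrete, here is a minimal Python sketch of an ETL pass as three composable functions. It is not how Talend or any specific tool is implemented; `run_etl`, the `PN` source column and the toy data are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Optional

Row = Dict[str, str]  # one record, e.g. a part or a document


def run_etl(
    extract: Callable[[], Iterable[Row]],
    transform: Callable[[Row], Optional[Row]],
    load: Callable[[List[Row]], int],
) -> int:
    """Run one Extract -> Transform -> Load pass; return the number of rows loaded."""
    prepared: List[Row] = []
    for row in extract():              # Extract: pull raw rows from the source
        result = transform(row)        # Transform: reshape, or return None to drop the row
        if result is not None:
            prepared.append(result)
    return load(prepared)              # Load: push the prepared rows to the target


# Usage with a toy in-memory source and target (hypothetical data).
target: List[Row] = []


def load_into_target(rows: List[Row]) -> int:
    target.extend(rows)
    return len(rows)


loaded = run_etl(
    extract=lambda: [{"PN": "P-100"}, {"PN": ""}],
    transform=lambda r: {"part_number": r["PN"]} if r["PN"] else None,
    load=load_into_target,
)
```

Keeping the three phases as separate functions means any source can be swapped without touching the transformation or the target, which is the idea behind an ETL's connector catalogue.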

Extract

The main strength of an ETL in the Extract phase is that it lets you retrieve data from as many sources as possible. The sources can range from applications exposing an API to a simple text file stored in a folder.

Types of systems you may query:

  • Business applications
  • Web services
  • Databases
  • Files

Talend native list of application connectors

Talend input file types

The goal of an ETL is to offer as many connectors as possible. Talend’s open-source model has let the community build a large number of integrations.
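The value of connectors is that very different sources end up producing the same shape of data. Below is a minimal Python sketch, assuming two hypothetical connectors (a CSV file and a SQLite database standing in for a legacy system), both yielding plain dictionaries; it does not reproduce any Talend component.

```python
import csv
import io
import sqlite3
from typing import Any, Dict, Iterator


def extract_csv(text: str) -> Iterator[Dict[str, str]]:
    """File connector: one dict per CSV line, keyed by the header row."""
    yield from csv.DictReader(io.StringIO(text))


def extract_sqlite(conn: sqlite3.Connection, query: str) -> Iterator[Dict[str, Any]]:
    """Database connector: one dict per result row, keyed by column name."""
    cursor = conn.execute(query)
    columns = [col[0] for col in cursor.description]
    for values in cursor:
        yield dict(zip(columns, values))


# Two different sources, one common output shape (hypothetical sample data).
legacy_db = sqlite3.connect(":memory:")
legacy_db.execute("CREATE TABLE parts (part_number TEXT, description TEXT)")
legacy_db.execute("INSERT INTO parts VALUES ('P-200', 'Shaft')")

rows = list(extract_csv("part_number,description\nP-100,Bracket\n"))
rows += list(extract_sqlite(legacy_db, "SELECT part_number, description FROM parts"))
```

Downstream steps only ever see `rows`, so adding a new source means writing one more connector rather than changing the rest of the flow.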

Transform

Transform is where things become much trickier. Transformation requires many different capabilities: mapping fields, converting a flow into arrays of data or into objects, filtering data, joining tables, aggregating data, etc. When you have exhausted the built-in tools your ETL provides, most ETLs let you add custom code so you are not limited.

Talend toolset for transforming data
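The transformation capabilities listed above can be illustrated in a few lines of Python. This is a sketch under assumptions, not Talend's toolset: the legacy column names in `FIELD_MAP`, the `OBSOLETE` status value and the cost table are all hypothetical. It maps fields, filters obsolete lines, joins BOM lines to unit costs and aggregates a cost per parent assembly.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical mapping from legacy column names to target PLM attribute names.
FIELD_MAP = {"PN": "part_number", "QTY": "quantity", "PARENT": "parent", "STATUS": "status"}


def map_fields(raw: Dict[str, str]) -> Dict[str, str]:
    """Mapping: rename known columns and drop the ones the target does not need."""
    return {FIELD_MAP[key]: value for key, value in raw.items() if key in FIELD_MAP}


def rollup_bom_cost(bom_lines: List[Dict[str, str]], unit_costs: Dict[str, float]) -> Dict[str, float]:
    """Map, filter, join and aggregate BOM lines into a total cost per parent assembly."""
    totals: Dict[str, float] = defaultdict(float)
    for raw in bom_lines:
        line = map_fields(raw)                                   # 1. mapping
        if line.get("status") == "OBSOLETE":                     # 2. filtering
            continue
        cost = unit_costs.get(line["part_number"])               # 3. join on part number
        if cost is None:
            continue  # a real flow would route this row to a reject output
        totals[line["parent"]] += int(line["quantity"]) * cost   # 4. aggregation
    return dict(totals)
```

For example, two released lines under `ASM-1` (4 × 0.5 and 1 × 12.0) roll up to 14.0, while an obsolete line under the same parent is ignored.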

Load

Finally, the Load phase has the same technical goal as the Extract phase: loading the prepared data into as many target systems as possible.
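A minimal Python sketch of loading the same prepared rows into two targets, assuming a hypothetical `parts` table in SQLite and a JSON export. Using an insert-or-replace keyed on the part number makes the load safe to re-run, which matters when a flow is executed several times.

```python
import io
import json
import sqlite3
from typing import Dict, List, TextIO

Row = Dict[str, str]


def load_sqlite(conn: sqlite3.Connection, rows: List[Row]) -> int:
    """Database target: insert-or-replace keyed on part_number, so re-running is safe."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS parts (part_number TEXT PRIMARY KEY, description TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO parts (part_number, description) "
        "VALUES (:part_number, :description)",
        rows,
    )
    conn.commit()
    return len(rows)


def load_json(stream: TextIO, rows: List[Row]) -> int:
    """File target: write the same rows as a JSON document."""
    json.dump(rows, stream, indent=2)
    return len(rows)


# The same prepared rows go to two targets (hypothetical sample data).
prepared = [{"part_number": "P-100", "description": "Bracket"}]
plm_db = sqlite3.connect(":memory:")
load_sqlite(plm_db, prepared)
load_sqlite(plm_db, prepared)          # second run replaces the row, no duplicate
export = io.StringIO()
load_json(export, prepared)
```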

How does it fit in your PLM environment?

Migration

The #1 scenario for an ETL is migration. I have managed a few migrations perfectly with an ETL. Usually, the graphical UI and the versioning of your ETL setup make it possible to explain how the migration flow works without getting too technical.
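One way to make a migration flow easy to explain is a reconciliation count: every row read from the legacy source is either loaded or rejected with a reason. Below is a minimal Python sketch of that idea; `migrate`, `MigrationReport` and the validation rule are hypothetical and not features of a specific ETL tool.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Optional

Row = Dict[str, str]


@dataclass
class MigrationReport:
    read: int = 0
    loaded: int = 0
    rejected: List[Row] = field(default_factory=list)

    @property
    def balanced(self) -> bool:
        """Every source row must end up either loaded or explicitly rejected."""
        return self.read == self.loaded + len(self.rejected)


def migrate(
    source: Iterable[Row],
    validate: Callable[[Row], Optional[str]],
    target: List[Row],
) -> MigrationReport:
    """Copy valid rows to target; keep invalid ones with the reason they failed."""
    report = MigrationReport()
    for row in source:
        report.read += 1
        error = validate(row)
        if error is None:
            target.append(row)
            report.loaded += 1
        else:
            report.rejected.append({**row, "reason": error})
    return report


# Hypothetical rule: every legacy row needs a part number.
def require_part_number(row: Row) -> Optional[str]:
    return None if row.get("part_number") else "missing part number"


new_plm: List[Row] = []
report = migrate([{"part_number": "P-1"}, {"part_number": ""}], require_part_number, new_plm)
```

The rejected rows, each carrying its reason, give business users something concrete to fix in the source data before the next migration run.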

Keeping in touch with legacy systems and files

I have used ETLs several times to make sure legacy systems could still be integrated with the new solution being deployed. This is usually where we work the most on extracting and inserting data in databases, or even handling files. ETLs often have a handy feature that watches a folder for changes, so whenever a new file appears it can trigger an ETL flow.

Talend’s tools for triggering a flow on a change
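ETL tools provide folder watching out of the box; the following is only a standard-library Python sketch of the underlying idea, assuming a simple polling approach. `poll_folder` is a hypothetical helper that detects new file names (not modifications to existing files) and calls a trigger once per new file.

```python
import os
import tempfile
from typing import Callable, Set


def poll_folder(folder: str, seen: Set[str], on_new_file: Callable[[str], None]) -> Set[str]:
    """One polling pass: trigger the flow once for each file not seen on the previous pass."""
    current = {
        name for name in os.listdir(folder)
        if os.path.isfile(os.path.join(folder, name))
    }
    for name in sorted(current - seen):
        on_new_file(os.path.join(folder, name))   # e.g. start the ETL flow for this file
    return current  # becomes `seen` for the next pass


# In production this would run in a loop, e.g. every few seconds with time.sleep().
triggered = []
with tempfile.TemporaryDirectory() as inbox:
    seen = poll_folder(inbox, set(), triggered.append)    # empty folder: nothing fires
    with open(os.path.join(inbox, "bom_export.csv"), "w") as fh:
        fh.write("part_number\nP-100\n")
    seen = poll_folder(inbox, seen, triggered.append)     # new file: the flow fires once
    seen = poll_folder(inbox, seen, triggered.append)     # unchanged folder: nothing fires
```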

Connecting authoring tools and central data

The long-term use case for an ETL is the connection with a larger enterprise system such as an ESB (Enterprise Service Bus), which I will describe in a future blog post. The goal is to have a central system that manages the different data sources and connects triggers and data on a single bus. The connection between this bus and each other system would be handled by an ETL, standardizing the data on the bus as much as possible.
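The division of roles described above, a bus carrying one canonical format and an ETL-style adapter per system, can be sketched in Python. This is a toy in-process publish/subscribe bus, not an ESB product; the topic name, the CAD/ERP payload field names and the canonical format are all hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Message = Dict[str, str]
Handler = Callable[[Message], None]


class Bus:
    """Minimal in-process publish/subscribe bus, standing in for an ESB."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: Message) -> None:
        for handler in self._subscribers[topic]:
            handler(message)


# ETL-style adapters: each one translates a system-specific payload
# (hypothetical field names) into the single canonical format used on the bus.
def from_cad(payload: Dict[str, str]) -> Message:
    return {"part_number": payload["PartNo"], "revision": payload["Rev"]}


def from_erp(payload: Dict[str, str]) -> Message:
    return {"part_number": payload["item_code"], "revision": payload["rev_level"]}


bus = Bus()
received: List[Message] = []
bus.subscribe("part.released", received.append)   # e.g. the PLM side listening
bus.publish("part.released", from_cad({"PartNo": "P-1", "Rev": "B"}))
bus.publish("part.released", from_erp({"item_code": "P-2", "rev_level": "A"}))
```

Subscribers never see the CAD or ERP field names; only the adapters know them, which is what keeps the data on the bus standardized.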

The Risk!

The main risk with ETLs is creating too many one-to-one (point-to-point) connections, which at some point become hard to maintain. Depending on the context, this may suit you very well if you need these integrations to evolve independently. But the bigger the system grows, the more you will need a better-organized architecture built around an ESB.
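The maintenance problem can be quantified under a worst-case assumption: if every one of n systems exchanges data with every other one, dedicated point-to-point integrations grow as n(n-1)/2, whereas a bus needs only one adapter per system, i.e. n. A quick Python check:

```python
def point_to_point_links(n: int) -> int:
    """Worst case: every pair of systems gets its own dedicated integration."""
    return n * (n - 1) // 2


def bus_adapters(n: int) -> int:
    """With a bus, each system needs only one adapter to the bus."""
    return n


for n in (3, 5, 10, 20):
    print(f"{n:>2} systems: {point_to_point_links(n):>3} point-to-point vs {bus_adapters(n):>2} bus adapters")
```

With three systems the two approaches cost the same (3 integrations each), which is why point-to-point ETL flows are often fine at first; at 10 systems it is 45 versus 10, and at 20 systems 190 versus 20.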

Sample use-cases

  • Retrieving part and labor costs from various ERPs and injecting them into your change management process, to give the cost engineer the right cost-saving information.
  • Replacing a piece of software that read files produced by another manufacturing application, using an ETL to transform the input file into web-service calls to the new system.
  • Migrating data from Excel files and an Access database to populate a newly deployed PLM solution.
  • Synchronizing an engineering BOM from one PLM to another.
  • Synchronizing a Problem Report listing in a PLM from a bug tracker such as Mantis, Jira or others.
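For the last use case, the core of a synchronization flow is deciding what to create, update or retire on the PLM side. Below is a minimal Python sketch, assuming both sides have already been extracted into dictionaries keyed by issue ID; `plan_sync` is a hypothetical helper and does not call any Mantis or Jira API.

```python
from typing import Dict, List, Tuple

Issue = Dict[str, str]


def plan_sync(
    tracker: Dict[str, Issue], plm: Dict[str, Issue]
) -> Tuple[List[str], List[str], List[str]]:
    """Compare issues keyed by ID; return (to_create, to_update, to_retire) for the PLM."""
    to_create = sorted(tracker.keys() - plm.keys())            # new in the tracker
    to_update = sorted(
        key for key in tracker.keys() & plm.keys()             # present on both sides...
        if tracker[key] != plm[key]                            # ...but changed
    )
    to_retire = sorted(plm.keys() - tracker.keys())            # gone from the tracker
    return to_create, to_update, to_retire
```

Computing the plan first and applying it afterwards makes each run idempotent: running the flow twice in a row produces an empty plan the second time.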

Some existing solutions you can download:

I have found everything I needed in Talend. I haven’t tried others, except CloverETL a few years back.

Here is a great video introducing ETL.