Graph and Cypher for a typical PLM Question

Yoann Maingon
@yoannmaingon

At Ganister, one of our customers reached out to us last week with a question about their data. They are one of our launch customers and they know the power of the graph. They don’t yet have anyone who knows the Cypher language to query their own graph database, but they know it is very efficient at turning a question into valuable data.

The Context

The company builds systems that contain a lot of electronics and can be integrated into vehicles, buildings, etc. Their products are organized by Families and Systems.

They have about 7300 parts, 124 systems and 35 families. The average depth of a bill of materials is 7 levels, and each system contains 300 to 600 part occurrences.

The Question

“Hey Ganister (I believe it could become something real soon with all these personal assistant technologies!), can you tell me which parts used by family XXX are not used by any other family?”

The Design

Let’s represent a subset of the data.

The basic process is:

  • Start from Family XXX
  • Find all the parts it uses
  • Remove the parts that are also related to other Families

We end up with the 5 parts on the left side of the graph.

The Cypher

The first idea was to find all the parts connected to the family and add a WHERE clause to filter out the ones that have relationships with Families other than XXX.

MATCH (n1:family{_ref:'XXX'})-[:programSystem|systemPart|consumes*]->(p1:part),(n2:family)
WHERE NOT (n2)-[:programSystem|systemPart|consumes*]->(p1:part) AND n2._ref<>'XXX'
RETURN p1

It looked good and quite simple but it was not very efficient. Efficiency is usually related to the number of database hits required to find the correct answers.
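
One thing to watch with the first query: every part is paired with every family n2, so the NOT pattern is checked per pair, and a part can pass the filter as soon as a single other family does not reach it. Below is a minimal sketch of how “no other family uses this part” could be stated directly. It assumes Neo4j 4.0 or later for the EXISTS { } subquery syntax, and it is an illustrative variant, not a query that was run or profiled for this article.

```cypher
// Illustrative variant only: not one of the two queries profiled in this article.
// Collect each part reachable from family XXX once.
MATCH (:family {_ref: 'XXX'})-[:programSystem|systemPart|consumes*]->(p:part)
WITH DISTINCT p
// Keep the part only if no family other than XXX can reach it.
WHERE NOT EXISTS {
  MATCH (other:family)-[:programSystem|systemPart|consumes*]->(p)
  WHERE other._ref <> 'XXX'
}
RETURN p._ref AS ref
```

How this form performs on the customer’s data is not something measured here; PROFILE would be the way to find out.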

Then we believed it would be easier to list parts from family XXX, parts from other families and diff the two sets of data.

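// Part references reachable from family XXX
MATCH (n1:family{_ref:'XXX'})-[:programSystem|systemPart|consumes*]->(p1:part)
WITH collect(DISTINCT p1._ref) as P1
// Part references reachable from every other family
MATCH (n2:family)-[:programSystem|systemPart|consumes*]->(p2:part)
WHERE n2._ref <>'XXX'
WITH P1,collect(DISTINCT p2._ref) as P2
// Keep what is in P1 but not in P2 (requires the APOC library)
UNWIND apoc.coll.subtract(P1, P2) as res
RETURN res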
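
To see the set difference in isolation, here is a minimal sketch using literal lists instead of the graph. The part references 'P-001' to 'P-005' are made up for illustration, and the snippet assumes the APOC library is installed.

```cypher
// Hypothetical part references, for illustration only.
WITH ['P-001', 'P-002', 'P-003', 'P-004'] AS P1,  // stands in for parts reached from family XXX
     ['P-002', 'P-004', 'P-005'] AS P2            // stands in for parts reached from other families
// apoc.coll.subtract keeps the elements of P1 that do not appear in P2.
UNWIND apoc.coll.subtract(P1, P2) AS res
RETURN res
```

This should return P-001 and P-003, the references that appear only in the first list. The order of the rows is not something to rely on.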

The Query Planner

As mentioned, our first approach was not very efficient. The first hint that a query is inefficient is plain human feeling: “Hmm, I don’t think it should take this much time!” Beyond that, Neo4j provides a nice way to understand the efficiency of a query: prefix it with the keyword PROFILE.
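
As a sketch of what that looks like, here is the set-difference query from the previous section with the keyword added in front; nothing else changes. The query still runs, and the result comes back together with an execution plan that lists rows and db hits per operator.

```cypher
// PROFILE executes the query and reports rows and db hits for each operator in the plan.
PROFILE
MATCH (n1:family {_ref: 'XXX'})-[:programSystem|systemPart|consumes*]->(p1:part)
WITH collect(DISTINCT p1._ref) AS P1
MATCH (n2:family)-[:programSystem|systemPart|consumes*]->(p2:part)
WHERE n2._ref <> 'XXX'
WITH P1, collect(DISTINCT p2._ref) AS P2
UNWIND apoc.coll.subtract(P1, P2) AS res
RETURN res
```

Using EXPLAIN in place of PROFILE shows the plan without running the query, but then no db hits are measured.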

The first Cypher query, which looked simpler, resulted in almost 48 million db hits.

The second query, which diffs two sets of results, generates only 700k db hits.

This is not a query we have to run many times, so we did not spend more time improving its performance.

The Result

The first great result was the customer’s satisfaction at getting such a precise result in a very short time. The query takes about 300ms on the first run and 200ms on subsequent runs, thanks to Neo4j’s caching. But the main success for us is showing that this type of question about a customer’s data can be answered much better by graph database technologies than by other types of databases.
