Graph and Cypher for a typical PLM Question
At Ganister, one of our customers reached out to us last week with a question about its data. This is one of our launching customer who knows the power of the graph. They...
At Ganister, one of our customers reached out to us last week with a question about its data. This is one of our launching customer who knows the power of the graph. They don’t have anyone yet who knows the cypher language to query their own graph database but they know it is very efficient when it comes to turning a question into valuable data.
The company is building systems. It contains a lot of electronic and has a lot of potential integrations on vehicles, buildings, etc. Their product is managed by Families and Systems.
They have about 7300 parts, 124 systems and 35 families. The average depth of a bill of material is 7 levels and it contains from 300 to 600 part occurrences per system.
“Hey Ganister (I believe it could become something real soon with all these personal assistant technologies !) can you tell me which part used by family XXX are not used by any other families?”
Let’s represent a set of a the data
The basic process is:
We end up with the 5 parts on the left side of the graph.
The first idea was to find all the parts connected to the family and add a where clause to filter the ones that have relationships with Families other than XXX.
MATCH (n1:family{_ref:'XXX'})-[:programSystem|systemPart|consumes*]->(p1:part),(n2:family)
WHERE NOT (n2)-[:programSystem|systemPart|consumes*]->(p1:part) AND n2._ref='XXX'
RETURN p1
It looked good and quite simple but it was not very efficient. Efficiency is usually related to the number of database hits required to find the correct answers.
Then we believed it would be easier to list parts from family XXX, parts from other families and diff the two sets of data.
MATCH (n1:family{_ref:'XXX'})-[:programSystem|systemPart|consumes*]->(p1:part)
WITH collect(DISTINCT p1._ref) as P1
MATCH (n2:family)-[:programSystem|systemPart|consumes*]->(p2:part)
WHERE n2._ref <>'XXX'
WITH P1,collect(DISTINCT p2._ref) as P2
UNWIND apoc.coll.subtract(P1, P2) as res
RETURN res
As mentioned we had a first way of doing it which wasn’t very efficient. The first way to figure out it is not efficient is based on the human feeling => “Hummm I don’t think it should take this much time !”. Then, Neo4j provides a nice way to understand the efficiency of the query by prefixing your query with the word PROFILE.
Here is the result of the first cypher query which looked simpler but resulted in almost 48 millions db hits.
The second query which is diffing two sets of results, generates only 700k db hits
This is not a query we have to run many times, therefore we did not spend time improving the performance.
The first great result was customer’s satisfaction to get such precise result within a very small amount of time. The query takes about 300ms to run at first and 200ms the next times because of cache mechanisms from Neo4J. But the main success for us is proving that these types of questions about a customer data can be answered by graph database technologies much better than other types of databases.
I wish you all a happy new year. My blog colleagues all came up with predictions for either the coming years or like oleg and Jos, for 2030. There are some very interesting...
Following up on my old article about ETL, another interesting piece of software for a PLM stack is the Enterprise Service Bus. Having a Service Bus in any company department...