
At Ganister, one of our customers reached out to us last week with a question about its data. This is one of our launch customers, and they know the power of the graph. They don’t yet have anyone who knows the Cypher language to query their own graph database, but they know it is very efficient when it comes to turning a question into valuable data.

The Context

The company builds systems that contain a lot of electronics and can be integrated into vehicles, buildings, etc. Their products are managed by Families and Systems.

They have about 7300 parts, 124 systems and 35 families. The average depth of a bill of materials is 7 levels, and each system contains from 300 to 600 part occurrences.

The Question

“Hey Ganister (I believe this could become something real soon with all these personal assistant technologies!), can you tell me which parts used by family XXX are not used by any other family?”

The Design

Let’s represent a subset of the data.

The basic process is:

Start from Family XXX
Look for all the parts

Remove parts that are related to other Families

We end up with the 5 parts on the left side of the graph.

The Cypher

The first idea was to find all the parts connected to the family and add a WHERE clause to filter out the ones that have relationships with Families other than XXX.

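// Parts reachable from family XXX, paired with every family n2
MATCH (n1:family {_ref: 'XXX'})-[:programSystem|systemPart|consumes*]->(p1:part), (n2:family)
// Keep a part when a family other than XXX does not reach it
WHERE NOT (n2)-[:programSystem|systemPart|consumes*]->(p1:part) AND n2._ref <> 'XXX'
RETURN p1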

It looked good and quite simple but it was not very efficient. Efficiency is usually related to the number of database hits required to find the correct answers.
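Note that in a query of this shape the WHERE clause is evaluated once per (part, family) pair, so it tests one family at a time rather than all the other families at once. A minimal sketch of another way to state “no other family uses this part” is an existential subquery. It assumes Neo4j 4.0 or later, where the EXISTS { ... } form is available, and it reuses the labels and relationship types from the query above. It has not been profiled on this data set, so no claim is made about its db hits.

```cypher
// Parts reachable from family XXX (deduplicated, since a part can be
// reached through several paths)
MATCH (:family {_ref: 'XXX'})-[:programSystem|systemPart|consumes*]->(p1:part)
WITH DISTINCT p1
// Keep a part only if no family other than XXX reaches it
WHERE NOT EXISTS {
  MATCH (n2:family)-[:programSystem|systemPart|consumes*]->(p1)
  WHERE n2._ref <> 'XXX'
}
RETURN p1
```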

We then figured it would be easier to list the parts from family XXX, list the parts from the other families, and diff the two sets of data.

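// Collect the references of all parts reachable from family XXX
MATCH (n1:family {_ref: 'XXX'})-[:programSystem|systemPart|consumes*]->(p1:part)
WITH collect(DISTINCT p1._ref) AS P1
// Collect the references of all parts reachable from any other family
MATCH (n2:family)-[:programSystem|systemPart|consumes*]->(p2:part)
WHERE n2._ref <> 'XXX'
WITH P1, collect(DISTINCT p2._ref) AS P2
// Diff the two lists with APOC and return one row per remaining reference
UNWIND apoc.coll.subtract(P1, P2) AS res
RETURN res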
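The diff step relies on the APOC library. If APOC is not installed, the same subtraction could be sketched with a plain Cypher list comprehension, as below. This is an illustrative variant rather than the query we ran for the customer; on large lists the `IN` check may well cost more than the APOC function.

```cypher
MATCH (:family {_ref: 'XXX'})-[:programSystem|systemPart|consumes*]->(p1:part)
WITH collect(DISTINCT p1._ref) AS P1
MATCH (n2:family)-[:programSystem|systemPart|consumes*]->(p2:part)
WHERE n2._ref <> 'XXX'
WITH P1, collect(DISTINCT p2._ref) AS P2
// Keep only the references from P1 that do not appear in P2
UNWIND [ref IN P1 WHERE NOT ref IN P2] AS res
RETURN res
```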

The Query Planner

As mentioned, our first approach wasn’t very efficient. The first hint that a query is inefficient is plain human feeling: “Hmm, I don’t think it should take this much time!” Beyond that, Neo4j provides a nice way to understand the efficiency of a query: prefix it with the keyword PROFILE.
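As a small illustration of the prefix, here is a sketch that profiles only the first step of the search (the parts reachable from family XXX). The query is a simplified example, not one of the two queries compared in this post. PROFILE executes the query and reports the db hits of each operator in the plan, while EXPLAIN shows the plan without running the query.

```cypher
// PROFILE runs the query and annotates each plan operator with rows and db hits
PROFILE
MATCH (:family {_ref: 'XXX'})-[:programSystem|systemPart|consumes*]->(p1:part)
RETURN count(DISTINCT p1) AS partCount
```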

Here is the result of the first Cypher query, which looked simpler but resulted in almost 48 million db hits.

The second query, which diffs two sets of results, generates only 700k db hits.

This is not a query we have to run many times, therefore we did not spend time improving the performance.

The Result

The first great result was the customer’s satisfaction at getting such a precise result in a very short amount of time. The query takes about 300ms on the first run and 200ms on subsequent runs, thanks to Neo4j’s caching mechanisms. But the main success for us is proving that this type of question about a customer’s data can be answered much better by graph database technologies than by other types of databases.
