neo4j-graph-algorithms Jaccard Similarity doesn't work with concurrency

Problem: When running the Jaccard similarity algorithm over a list of node and category entries, all similarities are 0 unless concurrency is set to 1.

Environment: Docker image running Neo4j 3.5.3 with graph algorithms 3.5.3.3. Memory is limited to 16 GB; CPUs are unbound (192 CPUs in the machine, shared with other processes).

Setup

MERGE (french:Cuisine {name:'French'})
MERGE (italian:Cuisine {name:'Italian'})
MERGE (indian:Cuisine {name:'Indian'})
MERGE (lebanese:Cuisine {name:'Lebanese'})
MERGE (portuguese:Cuisine {name:'Portuguese'})

MERGE (zhen:Person {name: "Zhen"})
MERGE (praveena:Person {name: "Praveena"})
MERGE (michael:Person {name: "Michael"})
MERGE (arya:Person {name: "Arya"})
MERGE (karin:Person {name: "Karin"})

MERGE (praveena)-[:LIKES]->(indian)
MERGE (praveena)-[:LIKES]->(portuguese)

MERGE (zhen)-[:LIKES]->(french)
MERGE (zhen)-[:LIKES]->(indian)

MERGE (michael)-[:LIKES]->(french)
MERGE (michael)-[:LIKES]->(italian)
MERGE (michael)-[:LIKES]->(indian)

MERGE (arya)-[:LIKES]->(lebanese)
MERGE (arya)-[:LIKES]->(italian)
MERGE (arya)-[:LIKES]->(portuguese)

MERGE (karin)-[:LIKES]->(lebanese)
MERGE (karin)-[:LIKES]->(italian)

Queries

MATCH (b:Person)-[v:LIKES]->(c:Cuisine)
WITH {item:id(b), categories: collect(id(c))} as vacatureData limit 50000
WITH collect(vacatureData) as data

CALL algo.similarity.jaccard(data, {concurrency:1, similarityCutoff:0.1})
YIELD nodes, similarityPairs, min, max, mean, p25, p50, p75, p90, p95
RETURN nodes, similarityPairs, min, max, mean, p25, p50, p75, p90, p95

This results in:

╒═══════╤═════════════════╤═══════════════════╤══════════════════╤═══════════════════╤══════════════════╤══════════════════╤═══════════════════╤══════════════════╤══════════════════╕
│"nodes"│"similarityPairs"│"min"              │"max"             │"mean"             │"p25"             │"p50"             │"p75"              │"p90"             │"p95"             │
╞═══════╪═════════════════╪═══════════════════╪══════════════════╪═══════════════════╪══════════════════╪══════════════════╪═══════════════════╪══════════════════╪══════════════════╡
│5      │7                │0.19999980926513672│0.6666669845581055│0.37380967821393696│0.2500009536743164│0.2500009536743164│0.33333301544189453│0.6666669845581055│0.6666669845581055│
└───────┴─────────────────┴───────────────────┴──────────────────┴───────────────────┴──────────────────┴──────────────────┴───────────────────┴──────────────────┴──────────────────┘
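The concurrency:1 numbers can be checked by hand from the Setup data. With J(A,B) = |A ∩ B| / |A ∪ B|, three of the ten person pairs share no cuisine (Praveena and Karin, Zhen and Arya, Zhen and Karin) and are dropped by the 0.1 cutoff, leaving seven scores:

```latex
J(A,B) = \frac{|A \cap B|}{|A \cup B|}

\begin{aligned}
J(\text{Zhen},\text{Michael}) &= J(\text{Arya},\text{Karin}) = \tfrac{2}{3} \\
J(\text{Praveena},\text{Zhen}) &= \tfrac{1}{3} \\
J(\text{Praveena},\text{Michael}) &= J(\text{Praveena},\text{Arya}) = J(\text{Michael},\text{Karin}) = \tfrac{1}{4} \\
J(\text{Michael},\text{Arya}) &= \tfrac{1}{5} \\
\text{mean} &= \frac{2\cdot\tfrac{2}{3} + \tfrac{1}{3} + 3\cdot\tfrac{1}{4} + \tfrac{1}{5}}{7}
             = \frac{157/60}{7} = \frac{157}{420} \approx 0.3738
\end{aligned}
```

This agrees with the first result (7 pairs, min 0.2, max 0.667, mean 0.3738), so the concurrency:1 output is the correct one.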

Removing the concurrency limit:

MATCH (b:Person)-[v:LIKES]->(c:Cuisine)
WITH {item:id(b), categories: collect(id(c))} as vacatureData limit 50000
WITH collect(vacatureData) as data

CALL algo.similarity.jaccard(data, {similarityCutoff:0.1})
YIELD nodes, similarityPairs, min, max, mean, p25, p50, p75, p90, p95
RETURN nodes, similarityPairs, min, max, mean, p25, p50, p75, p90, p95

This results in:

╒═══════╤═════════════════╤═════╤═════╤══════╤═════╤═════╤═════╤═════╤═════╕
│"nodes"│"similarityPairs"│"min"│"max"│"mean"│"p25"│"p50"│"p75"│"p90"│"p95"│
╞═══════╪═════════════════╪═════╪═════╪══════╪═════╪═════╪═════╪═════╪═════╡
│5      │0                │0.0  │0.0  │0.0   │0.0  │0.0  │0.0  │0.0  │0.0  │
└───────┴─────────────────┴─────┴─────┴──────┴─────┴─────┴─────┴─────┴─────┘

Setting concurrency to any value other than 1 gives the second (all-zero) result. We see the same behaviour in our Jaccard computation over 300k nodes.
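Pairwise Jaccard scores are independent of one another, so splitting the work across threads should never change the result. A minimal Python sketch of that property, assuming plain Python as a stand-in for the library's Java procedure (this is not its implementation): it scores the Setup data once sequentially and once on a thread pool and checks the two runs match. The names LIKES, similarity_pairs and summary are hypothetical, and the inclusive cutoff is an assumption that does not matter here because no score equals 0.1.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations
from statistics import mean

# Person -> liked cuisines, copied from the Setup statements above.
LIKES = {
    "Praveena": {"Indian", "Portuguese"},
    "Zhen": {"French", "Indian"},
    "Michael": {"French", "Italian", "Indian"},
    "Arya": {"Lebanese", "Italian", "Portuguese"},
    "Karin": {"Lebanese", "Italian"},
}


def jaccard(a, b):
    """Return |A intersect B| / |A union B| (0.0 if both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def score_pair(pair):
    """Score one unordered pair of (name, categories) entries."""
    (name1, cats1), (name2, cats2) = pair
    return name1, name2, jaccard(cats1, cats2)


def similarity_pairs(likes, cutoff=0.1, workers=1):
    """Score all unordered pairs, keeping those at or above the cutoff.

    With workers > 1 the pairs are scored on a thread pool; pool.map
    preserves input order, so the output must equal the 1-worker output.
    """
    pairs = list(combinations(sorted(likes.items()), 2))
    if workers == 1:
        scored = [score_pair(p) for p in pairs]
    else:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            scored = list(pool.map(score_pair, pairs))
    return [s for s in scored if s[2] >= cutoff]


def summary(pairs):
    """(similarityPairs, min, max, mean); all zero when nothing passes."""
    scores = [score for _, _, score in pairs]
    if not scores:
        return 0, 0.0, 0.0, 0.0
    return len(scores), min(scores), max(scores), mean(scores)


sequential = similarity_pairs(LIKES, workers=1)
parallel = similarity_pairs(LIKES, workers=8)
assert sequential == parallel, "result depends on concurrency"
print(summary(sequential))
```

Both runs report 7 pairs with min 0.2, max about 0.667 and mean about 0.3738, i.e. the concurrency:1 row; the all-zero row is what summary returns only when no pair survives.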

Apr 16 '19 13:04 JorenVdV

Hey,

I'll take a look at it. I've seen this happen sporadically but haven't been able to figure out exactly why, since, annoyingly, it doesn't happen every time.

For example, I just tested this on a Docker image and got the same results with concurrency 1 and concurrency > 1.

Cheers, Mark

Apr 17 '19 14:04 mneedham

Any resolution on this? I'm still not able to use more than one core with algo.similarity.jaccard. I'm running 3.5.8 EE.

Mar 09 '20 22:03 d-kilc

Please check out https://github.com/neo4j/graph-data-science: it has improved graph algorithms and is the successor to the graph algorithms library.

Mar 10 '20 08:03 tomasonjo
