neo4j-graph-algorithms icon indicating copy to clipboard operation
neo4j-graph-algorithms copied to clipboard

ArrayIndexOutOfBoundsException while running shortestPath() on a large graph

Open aalampally opened this issue 6 years ago • 5 comments

I am getting a weird error when using shortestPath algorithms in graph-algorithms. In the graph data I have Person node with primaryEmail as once of the attributes. This contains emails of multiple domains. e.g., [email protected], [email protected] etc..

When I run the following query MATCH (source:Person{primaryEmail:'[email protected]'}), (target:Person{primaryEmail:'[email protected]'}) CALL algo.shortestPath.stream(source, target, "cost",{ nodeQuery:'MATCH(n:Person) RETURN id(n) as id', relationshipQuery:'MATCH(n:Person)-[r:connected_to]-(m:Person) RETURN id(n) as source, id(m) as target, (1.0/(r.weight+1.0)) as weight', graph:'cypher'}) YIELD nodeId, cost MATCH (other:Person) WHERE id(other) = nodeId RETURN other.name, cost

It is giving me the following error: Neo.ClientError.Procedure.ProcedureCallFailed: Failed to invoke procedure 'algo.shortestPath.stream': Caused by: java.lang.ArrayIndexOutOfBoundsException: -1073739909

Just as a tweak, I tried to limit the nodes to a particular domain of email and it started running.The updated query is: MATCH (source:Person{primaryEmail:'[email protected]'}), (target:Person{primaryEmail:'[email protected]'}) CALL algo.shortestPath.stream(source, target, "cost",{ nodeQuery:'MATCH(n:Person) where n.primaryEmail ends with @xyz.com RETURN id(n) as id', relationshipQuery:'MATCH(n:Person)-[r:connected_to]-(m:Person) RETURN id(n) as source, id(m) as target, (1.0/(r.weight+1.0)) as weight', graph:'cypher'}) YIELD nodeId, cost MATCH (other:Person) WHERE id(other) = nodeId RETURN other.name, cost

Is the error because the virtual graph which is created by filtering the relationships, creating a graph which is very big and exceeding the memory? Any help is much appreciated!!

aalampally avatar Jun 11 '18 16:06 aalampally

Can you check how many records are returned by the node and relationship projection queries? If you replace the last lines with:

Return count (*)

That should do it

On Mon, Jun 11, 2018, 17:49 Anirudh Alampally [email protected] wrote:

I am getting a weird error when using shortestPath algorithms in graph-algorithms. In the graph data I have Person node with primaryEmail as once of the attributes. This contains emails of multiple domains. e.g., [email protected], [email protected] etc..

When I run the following query MATCH (source:Person{primaryEmail:'[email protected]'}), ( target:Person{primaryEmail:'[email protected]'}) CALL algo.shortestPath.stream(source, target, "cost",{ nodeQuery:'MATCH(n:Person) RETURN id(n) as id', relationshipQuery:'MATCH(n:Person)-[r:connected_to]-(m:Person) RETURN id(n) as source, id(m) as target, (1.0/(r.weight+1.0)) as weight', graph:'cypher'}) YIELD nodeId, cost MATCH (other:Person) WHERE id(other) = nodeId RETURN other.name, cost

It is giving me the following error: Neo.ClientError.Procedure.ProcedureCallFailed: Failed to invoke procedure 'algo.shortestPath.stream': Caused by: java.lang.ArrayIndexOutOfBoundsException: -1073739909

Just as a tweak, I tried to limit the nodes to a particular domain of email and it started running.The updated query is: MATCH (source:Person{primaryEmail:'[email protected]'}), ( target:Person{primaryEmail:'[email protected]'}) CALL algo.shortestPath.stream(source, target, "cost",{ nodeQuery:'MATCH(n:Person) where n.primaryEmail ends with @xyz.com RETURN id(n) as id', relationshipQuery:'MATCH(n:Person)-[r:connected_to]-(m:Person) RETURN id(n) as source, id(m) as target, (1.0/(r.weight+1.0)) as weight', graph:'cypher'}) YIELD nodeId, cost MATCH (other:Person) WHERE id(other) = nodeId RETURN other.name, cost

Is the error because the virtual graph which is created by filtering the relationships, creating a graph which is very big and exceeding the memory? Any help is much appreciated!!

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/neo4j-contrib/neo4j-graph-algorithms/issues/659, or mute the thread https://github.com/notifications/unsubscribe-auth/AAAzpMqtc1g9I203SV0rDQP8rTzXyB9kks5t7p-lgaJpZM4UjCiG .

mneedham avatar Jun 11 '18 17:06 mneedham

Hey @mneedham The no.of nodes after node filtering are: 77974 and the no.of relationships after relationship filtering are: 726748.

aalampally avatar Jun 11 '18 22:06 aalampally

I am assuming that creating a virtual graph is the issue here. So, to simplify the query, I created a relationship property "sssp_distance", which is essentially (1.0/(r.weight+1.0)). So, I could now just mention the weightProperty and relationship in which I have to look for.

My updated query would look like this:

MATCH (start:Person{primaryEmail:'[email protected]'}), (end:Person{primaryEmail:'[email protected]'}) CALL algo.shortestPath.stream(start, end, "sssp_distance", {relationshipQuery: "connected_to"}) YIELD nodeId, cost MATCH (other) WHERE id(other) = nodeId RETURN other, cost

Even after this simplification, I get the following error Neo.ClientError.Procedure.ProcedureCallFailed: Failed to invoke procedure 'algo.shortestPath.stream': Caused by: java.lang.ArrayIndexOutOfBoundsException: -2147482401

So, I tried APOC's dijkstra algorithm and it was able to give the shortest path using the following query: MATCH (start:Person{primaryEmail: '[email protected]'}), (end:Person {primaryEmail: '[email protected]'}) CALL apoc.algo.dijkstra(start, end, "connected_to", "sssp_distance", 1.0, 1) YIELD path, weight RETURN path, weight

The main reason, I chose graph-algorithms over APOC was that, I could create a virtual graph, this way I wouldn't have to create a new relationship property sssp_distance.

Please Let me know if I still have any workaround for this

aalampally avatar Jun 11 '18 22:06 aalampally

@aalampally seeing as you got the error when using the non Cypher projection approach it sounds like the error is happening somewhere else.

Are you able to narrow the error down on a smaller dataset that you'd be able to share with us so we can reproduce it and figure out what's going on?

mneedham avatar Jun 12 '18 13:06 mneedham

I have tried the same on a smaller dummy dataset CREATE (a:Loc{name:'A'}), (b:Loc{name:'B'}), (c:Loc{name:'C'}), (d:Loc{name:'D'}), (e:Loc{name:'E'}), (f:Loc{name:'F'}), (a)-[:ROAD {cost:50}]->(b), (a)-[:ROAD {cost:50}]->(c), (a)-[:ROAD {cost:100}]->(d), (a)-[:RAIL {cost:50}]->(d), (b)-[:ROAD {cost:40}]->(d), (c)-[:ROAD {cost:40}]->(d), (c)-[:ROAD {cost:80}]->(e), (d)-[:ROAD {cost:30}]->(e), (d)-[:ROAD {cost:80}]->(f), (e)-[:ROAD {cost:40}]->(f), (e)-[:RAIL {cost:20}]->(f);

It works fine both with Cypher projection and non-Cypher projection. I am assuming that if similar data is multiplied to ~0.7 million edges and 70k nodes (data size similar to what I have), the error could be reproduced.

aalampally avatar Jun 13 '18 10:06 aalampally