neo4j-graph-algorithms icon indicating copy to clipboard operation
neo4j-graph-algorithms copied to clipboard

algo.randomWalk.stream does not generate paths properly

Open fujitatetsu opened this issue 6 years ago • 4 comments

When algo.randomWalk.stream is called with 'mode' parameter set to 'node2vec' and 'return' and/or 'inOut' parameters not equal to 1.0, the returned paths do not seem to be generated properly by the probalistic distribution defined by the parameters.

I checked the source codes and found the probable reason. Two functions generating the next node have the opposite order of evaluation of incoming and outgoing edges. On the one hand the statement

graph.forEachRelationship(currentNodeId, Direction.BOTH, consumer);

in the function NodeWalker.Node2VecStrategy::buildProbabilityDistribution evaluates the incoming edges first and then the outgoing edges, but on the other hand

graph.getTarget(currentNodeId, neighbourIndex, Direction.BOTH);

in NodeWalker.Node2VecStrategy::getNextNode works in the opposite order, i.e. the outgoing edges first and then the incoming edges.

fujitatetsu avatar Jan 09 '19 15:01 fujitatetsu

Thanks a lot for your feedback. Would you mind changing the getTarget function locally and retesting after a build to see if it's fixed?

if you want to we would be happy to take your pull request, otherwise we can also apply the change.

jexp avatar Jan 09 '19 22:01 jexp

I will change the code, test, and if no problem occurs, make a pull request.

I have a question. Which branch should I use as the base branch?

fujitatetsu avatar Jan 11 '19 04:01 fujitatetsu

3.4 is fine. We would cherry-pick it over. Thanks !!

jexp avatar Jan 12 '19 00:01 jexp

I tried some cypher queries for tests and noticed some behavior which seems unnatural to me. When graph:'heavy' is set, algo.randomWalk.stream treats the graph as undirected, but when graph:'cypher' is set, the graph is considered as directed.

For example, I defined the graph as follows:

MERGE (n1:N { name: 'A' })
MERGE (n2:N { name: 'B' })
MERGE (n1)-[:R]->(n2)

And for the query

CALL algo.randomWalk.stream(NULL, 1, 1000)
YIELD nodeIds
WITH nodeIds, COUNT(nodeIds) AS cnt
RETURN EXTRACT(n in nodeIds | algo.getNodeById(n).name) AS names, cnt

I got

names cnt
["B", "A"] 500
["A", "B"] 500

(with graph-algorithms-algo-3.4.8.0.jar before my fix). But for

CALL algo.randomWalk.stream(NULL, 1, 1000, {
  nodeQuery: "MATCH (p:N) RETURN id(p) as id",
  relationshipQuery: "MATCH (p1:N)-[:R]->(p2:N) RETURN id(p1) as source, id(p2) as target",
  graph: "cypher"
})
YIELD nodeIds
WITH nodeIds, COUNT(nodeIds) AS cnt
RETURN EXTRACT(n in nodeIds | algo.getNodeById(n).name) AS names, cnt

the result became

names cnt
["B"] 500
["A", "B"] 500

Is this what you really intend?

fujitatetsu avatar Jan 14 '19 17:01 fujitatetsu