neo4j-graph-algorithms
neo4j-graph-algorithms copied to clipboard
algo.randomWalk.stream does not generate paths properly
When algo.randomWalk.stream is called with 'mode' parameter set to 'node2vec' and 'return' and/or 'inOut' parameters not equal to 1.0, the returned paths do not seem to be generated properly by the probalistic distribution defined by the parameters.
I checked the source codes and found the probable reason. Two functions generating the next node have the opposite order of evaluation of incoming and outgoing edges. On the one hand the statement
graph.forEachRelationship(currentNodeId, Direction.BOTH, consumer);
in the function NodeWalker.Node2VecStrategy::buildProbabilityDistribution evaluates the incoming edges first and then the outgoing edges, but on the other hand
graph.getTarget(currentNodeId, neighbourIndex, Direction.BOTH);
in NodeWalker.Node2VecStrategy::getNextNode works in the opposite order, i.e. the outgoing edges first and then the incoming edges.
Thanks a lot for your feedback. Would you mind changing the getTarget function locally and retesting after a build to see if it's fixed?
if you want to we would be happy to take your pull request, otherwise we can also apply the change.
I will change the code, test, and if no problem occurs, make a pull request.
I have a question. Which branch should I use as the base branch?
3.4 is fine. We would cherry-pick it over. Thanks !!
I tried some cypher queries for tests and noticed some behavior which seems unnatural to me. When graph:'heavy' is set, algo.randomWalk.stream treats the graph as undirected, but when graph:'cypher' is set, the graph is considered as directed.
For example, I defined the graph as follows:
MERGE (n1:N { name: 'A' })
MERGE (n2:N { name: 'B' })
MERGE (n1)-[:R]->(n2)
And for the query
CALL algo.randomWalk.stream(NULL, 1, 1000)
YIELD nodeIds
WITH nodeIds, COUNT(nodeIds) AS cnt
RETURN EXTRACT(n in nodeIds | algo.getNodeById(n).name) AS names, cnt
I got
names | cnt |
---|---|
["B", "A"] | 500 |
["A", "B"] | 500 |
(with graph-algorithms-algo-3.4.8.0.jar before my fix). But for
CALL algo.randomWalk.stream(NULL, 1, 1000, {
nodeQuery: "MATCH (p:N) RETURN id(p) as id",
relationshipQuery: "MATCH (p1:N)-[:R]->(p2:N) RETURN id(p1) as source, id(p2) as target",
graph: "cypher"
})
YIELD nodeIds
WITH nodeIds, COUNT(nodeIds) AS cnt
RETURN EXTRACT(n in nodeIds | algo.getNodeById(n).name) AS names, cnt
the result became
names | cnt |
---|---|
["B"] | 500 |
["A", "B"] | 500 |
Is this what you really intend?