mage icon indicating copy to clipboard operation
mage copied to clipboard

[BUG] labelFilter for subgraph_all doesn't work as documented

Open thomas-tran-de opened this issue 8 months ago • 6 comments

Describe the bug In the documentation, examples for subgraph_all are given. Unfortunately, they don't work as described.

To Reproduce Run the following query:

CREATE (w:Wolf)-[ca:CATCHES]->(d:Dog), (c:Cat), (m:Mouse), (h:Human);
MATCH (w:Wolf), (d:Dog), (c:Cat), (m:Mouse), (h:Human)
WITH w, d, c, m, h
CREATE (d)-[:CATCHES]->(c)
CREATE (c)-[:CATCHES]->(m)
CREATE (d)-[:FRIENDS_WITH]->(m)
CREATE (h)-[:OWNS]->(d)
CREATE (h)-[:HUNTS]->(w)
CREATE (h)-[:HATES]->(m);
MATCH (dog:Dog)
CALL path.subgraph_all(dog, {
    labelFilter: ["/Mouse"]
})
YIELD nodes AS nodes, rels AS rels
RETURN nodes, rels

This returns only a single node, Mouse:

{
  "nodes": [
    { "id": 13, "labels": ["Mouse"], "properties": {}, "type": "node" }
  ],
  "rels": []
}

Expected behavior According to the documentation, this filtering will return 3 paths: Dog->Cat->Mouse, Dog<-Human->Mouse, Dog->Mouse.

Additional context I tried this in the newest Memgraph version 3.2.1.

thomas-tran-de avatar May 13 '25 08:05 thomas-tran-de

Thank you @thomas-tran-de for reporting the issue. I managed to reproduce it and can confirm that the same behavior happens to me as well, and it is opposite to what's described.

Are you blocked by this currently?

katarinasupe avatar May 13 '25 12:05 katarinasupe

Thanks for the quick reply @katarinasupe. No, I'm not blocked by this. I came up with a workaround for my specific use case which isn't as elegant but leads to the same result. I'll switch back to my initial idea when path.subgraph_all is fixed.

thomas-tran-de avatar May 13 '25 12:05 thomas-tran-de

Hi @katarinasupe, this just got more important to me.

I went through the label filter examples in the documentation and none of them work.

Could the issue be AreLabelsValid (Link)? It looks too restrictive. Whenever a termination or end node is used, only those will be returned. This can be illustrated by specifying all intermediate nodes as end nodes:

Starting Node Label Filter Paths Returned Paths Expected Workaround Filter
Dog ["/Mouse"] Mouse Dog->Cat->Mouse, Dog<-Human->Mouse, Dog->Mouse [">Dog", "/Mouse", ">Cat", ">Human"]
Dog ["/Mouse", "Cat"] None Dog->Mouse, Dog->Cat->Mouse [">Dog", "/Mouse", ">Cat"]
Dog ["/Mouse", "-Cat", "-Human"] Mouse Dog->Mouse [">Dog", "/Mouse"]
Cat [">Dog", "+Human", "+Wolf"] None Cat<-Dog, Cat<-Dog<-Human->Wolf->Dog [">Cat", ">Dog", ">Human", ">Wolf"]

Maybe !config_.label_bools_status.termination_activated && !config_.label_bools_status.end_node_activated && Whitelisted(label_bools.whitelisted) just needs to be replaced with Whitelisted(label_bools.whitelisted)? As a user, I expect the whitelist to work independent of termination and end nodes.

thomas-tran-de avatar May 22 '25 09:05 thomas-tran-de

Two more issues I just encountered:

  1. The default for filterStartNode is true contrary to what the documentation says.
  2. Even when setting filterStartNode to false, the graph stops expanding at the initial node if it's not explicitly whitelisted. This issue is likely because ContinueExpanding doesn't handle filterStartNode at all. It is called in BFS for all nodes.

To reproduce, use the example graph from the documentation.

Query Result Expected
MATCH (dog:Dog)
CALL path.subgraph_all(dog, {labelFilter: ["Cat"]})
YIELD nodes, rels
RETURN nodes, rels
nothing Dog->Cat
MATCH (dog:Dog)
CALL path.subgraph_all(dog, {labelFilter: ["Cat"], filterStartNode: false})
YIELD nodes, rels
RETURN nodes, rels
Dog Dog->Cat
MATCH (dog:Dog)
CALL path.subgraph_all(dog, {labelFilter: ["Cat", "Dog"]})
YIELD nodes, rels
RETURN nodes, rels
Dog->Cat Dog->Cat
MATCH (dog:Dog)
CALL path.subgraph_all(dog, {labelFilter: ["Cat", "-Dog"]})
YIELD nodes, rels
RETURN nodes, rels
nothing Dog->Cat
MATCH (dog:Dog)
CALL path.subgraph_all(dog, {labelFilter: ["Cat", "-Dog"], filterStartNode: false})
YIELD nodes, rels
RETURN nodes, rels
Dog Dog->Cat

This is now blocking me from further development. Do you know when someone has time to look into this, @katarinasupe?

thomas-tran-de avatar May 23 '25 12:05 thomas-tran-de

Hi @thomas-tran-de, I've forwarded the info and the team is aware. This is currently not a priority and they will get back to you as soon as they can take a deeper look.

katarinasupe avatar May 23 '25 13:05 katarinasupe

I did some more investigating. Now that I had a closer look at the code, I realized that the documentation primarily focuses on path.expand, not path.subgraph_*. Here are the key things that threw me off:

  • labelFilter is used to check which paths to traverse but it isn't strictly related to which nodes are returned. If using end or termination nodes, only these will be returned even if other labels are part of the whitelist.
  • The documentation states that "expansion will only go through nodes with whitelisted labels" for end nodes. This is also true for termination nodes.
  • The first node will always be returned if filterStartNode == false. This is not related to expanding the graph though. I found this very confusing and that's why I created a pull request.
  • If relationshipFilter contains an edge going both directions, i.e., <EDGE>, it will always return the corresponding child node regardless of labelFilter.
  • Like labelFilter, relationshipFilter will only determine how to find relevant paths. The returned relationships for subgraph_all are just all relationships between the returned nodes.

thomas-tran-de avatar May 28 '25 07:05 thomas-tran-de