kgtk
Paths generation filtering edges by qualifiers
Is your feature request related to a problem? Please describe.
I'm trying to reproduce the Cypher query below using kgtk:
WITH 2010 AS since MATCH rels= (p1:Person) - [:FRIENDS_WITH*1..2]->(p2:Person) WHERE ALL(k1 in relationships(rels) WHERE k1.date_of_start > since) RETURN rels;
Describe the solution you'd like
Since Kypher does not support path-pattern queries, I think the kgtk paths command should have an optional parameter that applies a filter to the edges of each path.
Describe alternatives you've considered
I first filtered the input graph by edge label, then used kgtk paths to generate all possible paths between two nodes:
kgtk query -i $GRAPH --match '()-[:FRIENDS_WITH]->()' -o $FRIENDS
kgtk paths --max_hops 2 --path-file pairs.tsv --path-mode NONE --path-source source --path-target target -i $FRIENDS -o $PATHS --statistics-only
However, I was not able to filter the result based on edge qualifiers.
I then changed my approach: filter the edges by both label and qualifier first, and only then run the paths command. This worked for my use case.
kgtk query -i $GRAPH --match '(n1)-[pathEdgex:FRIENDS_WITH]->(n2), (pathEdgex)-[:date_of_start]->(vx)' --where 'vx > "^2010-01-01"' --return 'pathEdgex, n1, pathEdgex.label , n2' -o query_path.tsv
kgtk paths --max_hops 2 --path-file pairs.tsv --path-mode NONE --path-source source --path-target target -i query_path.tsv -o $PATHS --statistics-only
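The filter-then-enumerate idea behind the two commands above can be sketched in plain Python. This is only an illustration under stated assumptions, not how kgtk implements either command: the edge list mirrors the friends.tsv sample shown later in this thread, the helper names `filter_edges` and `paths_up_to` are hypothetical, dates are compared as strings just like in the `--where` clause, and unlike kgtk paths the sketch enumerates paths from every node instead of reading source/target pairs from a file.

```python
from collections import defaultdict

# Toy edges in KGTK column order (id, node1, label, node2).
# Qualifier edges use the id of the qualified edge as node1.
EDGES = [
    ("e1", "Fred", "FRIENDS_WITH", "Joe"),
    ("e2", "Joe", "FRIENDS_WITH", "Susi"),
    ("e3", "Susi", "FRIENDS_WITH", "Tom"),
    ("e4", "Fred", "FRIENDS_WITH", "Tom"),
    ("e5", "Joe", "FRIENDS_WITH", "Otto"),
    ("e6", "e1", "date_of_start", "^2011-01-01"),
    ("e7", "e2", "date_of_start", "^2011-01-01"),
    ("e8", "e3", "date_of_start", "^2009-01-01"),
    ("e9", "e4", "date_of_start", "^2011-01-01"),
    ("e10", "e5", "date_of_start", "^2009-01-01"),
]


def filter_edges(edges, label, qualifier, threshold):
    """Keep edges with `label` whose `qualifier` value is > threshold (string comparison)."""
    values = {n1: n2 for (_eid, n1, lab, n2) in edges if lab == qualifier}
    return [e for e in edges if e[2] == label and values.get(e[0], "") > threshold]


def paths_up_to(edges, max_hops):
    """Enumerate simple paths of 1..max_hops edges, each as a list of edge ids."""
    outgoing = defaultdict(list)
    for eid, n1, _lab, n2 in edges:
        outgoing[n1].append((eid, n2))

    results = []

    def extend(node, visited, path):
        for eid, nxt in outgoing[node]:
            if nxt in visited:  # skip cycles
                continue
            new_path = path + [eid]
            results.append(new_path)
            if len(new_path) < max_hops:
                extend(nxt, visited | {nxt}, new_path)

    for start in list(outgoing):  # snapshot: the defaultdict may grow during traversal
        extend(start, {start}, [])
    return results


recent = filter_edges(EDGES, "FRIENDS_WITH", "date_of_start", "^2010-01-01")
for path in paths_up_to(recent, max_hops=2):
    print(" -> ".join(path))
```

Filtering before enumerating is what makes the qualifier condition hold for every edge on a path, which is the role of the ALL(...) predicate in the Cypher query.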
Thanks Veronica for answering your own question, and sorry for my delayed response. Path expressions are still on the to-do list, unfortunately, but they will eventually become available in Kypher. Here is another way to think about this: since you only have one- or two-step paths, you can formulate the second step as an optional match. Below I am using a new --multi feature that hasn't been released yet, but you might be able to make this work even without it. For example:
> kgtk query -i friends.tsv
id node1 label node2
e1 Fred FRIENDS_WITH Joe
e2 Joe FRIENDS_WITH Susi
e3 Susi FRIENDS_WITH Tom
e4 Fred FRIENDS_WITH Tom
e5 Joe FRIENDS_WITH Otto
e6 e1 date_of_start ^2011-01-01
e7 e2 date_of_start ^2011-01-01
e8 e3 date_of_start ^2009-01-01
e9 e4 date_of_start ^2011-01-01
e10 e5 date_of_start ^2009-01-01
This uses the --multi feature to output the second edge as its own row if it does not contain NULLs (should become available soon):
> kgtk query -i friends.tsv \
--match '(n1)-[pathEdgex:FRIENDS_WITH]->(n2), \
(pathEdgex)-[:date_of_start]->(vx)' \
--where 'vx > "^2010-01-01"' \
--opt '(n2)-[pathEdgex2:FRIENDS_WITH]->(n3), \
(pathEdgex2)-[:date_of_start]->(vx2)' \
--where 'vx2 > "^2010-01-01"' \
--return 'pathEdgex, n1, pathEdgex.label , n2, pathEdgex2, n1, pathEdgex2.label, n3' \
--multi 2
id node1 label node2
e1 Fred FRIENDS_WITH Joe
e2 Fred FRIENDS_WITH Susi
e2 Joe FRIENDS_WITH Susi
e4 Fred FRIENDS_WITH Tom
But even without that, you might be able to convert the following result for your purposes:
> kgtk query -i friends.tsv \
--match '(n1)-[pathEdgex:FRIENDS_WITH]->(n2), \
(pathEdgex)-[:date_of_start]->(vx)' \
--where 'vx > "^2010-01-01"' \
--opt '(n2)-[pathEdgex2:FRIENDS_WITH]->(n3), \
(pathEdgex2)-[:date_of_start]->(vx2)' \
--where 'vx2 > "^2010-01-01"' \
--return 'pathEdgex, n1, pathEdgex.label , n2, pathEdgex2, n1, pathEdgex2.label, n3'
id  node1  label         node2  id  node1  label         node2
e1  Fred   FRIENDS_WITH  Joe    e2  Fred   FRIENDS_WITH  Susi
e2  Joe    FRIENDS_WITH  Susi       Joe
e4  Fred   FRIENDS_WITH  Tom        Fred
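One possible way to do that conversion is sketched below in Python. It is a rough illustration, not part of kgtk: it assumes the wide result is tab-separated with missing optional values written as empty fields, and the helper name `split_wide_rows` is hypothetical. Each row is cut into groups of four columns, and a group is kept only if none of its fields is empty.

```python
import csv
import io

# The wide result from above, assuming tab separators and empty fields for NULLs.
WIDE_TSV = (
    "id\tnode1\tlabel\tnode2\tid\tnode1\tlabel\tnode2\n"
    "e1\tFred\tFRIENDS_WITH\tJoe\te2\tFred\tFRIENDS_WITH\tSusi\n"
    "e2\tJoe\tFRIENDS_WITH\tSusi\t\tJoe\t\t\n"
    "e4\tFred\tFRIENDS_WITH\tTom\t\tFred\t\t\n"
)


def split_wide_rows(tsv_text, width=4):
    """Split each wide row into `width`-column edges, dropping edges with empty fields."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)
    edges = []
    for row in reader:
        for start in range(0, len(row), width):
            chunk = row[start:start + width]
            # Keep only complete edges; an unmatched optional clause leaves blanks.
            if len(chunk) == width and all(chunk):
                edges.append(tuple(chunk))
    return header[:width], edges


header, edges = split_wide_rows(WIDE_TSV)
print("\t".join(header))
for edge in edges:
    print("\t".join(edge))
```

On the sample rows this yields the same four edges as the --multi 2 output shown earlier.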