
Paths generation filtering edges by qualifiers

Open versant2612 opened this issue 3 years ago • 2 comments

Is your feature request related to a problem? Please describe.

I'm trying to reproduce the Cypher query below using kgtk:

WITH 2010 AS since
MATCH rels = (p1:Person)-[:FRIENDS_WITH*1..2]->(p2:Person)
WHERE ALL(k1 IN relationships(rels) WHERE k1.date_of_start > since)
RETURN rels;

Describe the solution you'd like

Since Kypher does not support path pattern queries, I thought the kgtk paths command should have an optional parameter that applies a filter to the edges of each path.

Describe alternatives you've considered

I first filtered the input graph by edge label. Then I used kgtk paths to generate all possible paths between two nodes.

kgtk query -i $GRAPH --match '()-[:FRIENDS_WITH]->()' -o $FRIENDS

kgtk paths --max_hops 2 --path-file pairs.tsv --path-mode NONE --path-source source --path-target target -i $FRIENDS -o $PATHS --statistics-only

But I was not able to filter the result based on edge qualifiers.

versant2612, Jan 28 '22 04:01

I changed my approach: I now filter the edges on both labels and properties first, and then run the paths command on the result. It worked for my use case.

kgtk query -i $GRAPH --match '(n1)-[pathEdgex:FRIENDS_WITH]->(n2), (pathEdgex)-[:date_of_start]->(vx)' --where 'vx > "^2010-01-01"' --return 'pathEdgex, n1, pathEdgex.label , n2' -o query_path.tsv

kgtk paths --max_hops 2 --path-file pairs.tsv --path-mode NONE --path-source source --path-target target -i query_path.tsv -o $PATHS --statistics-only
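To make the logic of this two-step workaround explicit, here is a minimal Python sketch of the same idea: keep only the edges whose qualifier passes the filter, then enumerate paths of up to two hops over what remains. It is not how kgtk implements either command. The function names `filter_edges` and `paths_up_to` are hypothetical, the edge data mirrors the sample graph posted later in this thread, and the rule that a path may not reuse an edge is an assumption borrowed from Cypher's relationship-uniqueness behavior.

```python
# Edges in KGTK column order: (id, node1, label, node2).
# Qualifier edges use the id of the qualified edge as their node1.
EDGES = [
    ("e1", "Fred", "FRIENDS_WITH", "Joe"),
    ("e2", "Joe", "FRIENDS_WITH", "Susi"),
    ("e3", "Susi", "FRIENDS_WITH", "Tom"),
    ("e4", "Fred", "FRIENDS_WITH", "Tom"),
    ("e5", "Joe", "FRIENDS_WITH", "Otto"),
    ("e6", "e1", "date_of_start", "^2011-01-01"),
    ("e7", "e2", "date_of_start", "^2011-01-01"),
    ("e8", "e3", "date_of_start", "^2009-01-01"),
    ("e9", "e4", "date_of_start", "^2011-01-01"),
    ("e10", "e5", "date_of_start", "^2009-01-01"),
]


def filter_edges(edges, label, qualifier, threshold):
    """Keep `label` edges whose `qualifier` value sorts after `threshold`.

    Values are compared as plain strings, which works for the
    fixed-width ^YYYY-MM-DD dates used here. Edges without the
    qualifier are dropped.
    """
    qualifier_value = {n1: n2 for _, n1, lab, n2 in edges if lab == qualifier}
    return [
        e for e in edges
        if e[2] == label and qualifier_value.get(e[0], "") > threshold
    ]


def paths_up_to(edges, max_hops):
    """Enumerate directed paths of 1..max_hops edges, never reusing an edge."""
    outgoing = {}
    for edge in edges:
        outgoing.setdefault(edge[1], []).append(edge)

    paths = []

    def extend(path):
        paths.append(path)
        if len(path) == max_hops:
            return
        # Continue from the node2 of the last edge on the path.
        for edge in outgoing.get(path[-1][3], []):
            if edge not in path:
                extend(path + [edge])

    for edge in edges:
        extend([edge])
    return paths


kept = filter_edges(EDGES, "FRIENDS_WITH", "date_of_start", "^2010-01-01")
for path in paths_up_to(kept, 2):
    print(" -> ".join(edge_id for edge_id, _, _, _ in path))
```

Because the filter runs before path enumeration, every edge on every generated path satisfies the date condition, which is what the `ALL(...)` clause in the Cypher query asks for.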

versant2612, Feb 06 '22 16:02

Thanks, Veronica, for answering your own question, and sorry for my delayed response. Unfortunately, path expressions are still on the to-do list, but they will eventually become available in Kypher. Here is another way to think about this: since you only have one- or two-step paths, you can formulate the second step as an optional match. Below I am using a new --multi edge feature that hasn't been released yet, but you might be able to make this work for you even without it. For example:

> kgtk query -i friends.tsv
id	node1	label	node2
e1	Fred	FRIENDS_WITH	Joe
e2	Joe	FRIENDS_WITH	Susi
e3	Susi	FRIENDS_WITH	Tom
e4	Fred	FRIENDS_WITH	Tom
e5	Joe	FRIENDS_WITH	Otto
e6	e1	date_of_start	^2011-01-01
e7	e2	date_of_start	^2011-01-01
e8	e3	date_of_start	^2009-01-01
e9	e4	date_of_start	^2011-01-01
e10	e5	date_of_start	^2009-01-01

This uses the --multi feature to output the second edge if it does not contain NULLs (should become available soon):

> kgtk query -i friends.tsv \
     --match '(n1)-[pathEdgex:FRIENDS_WITH]->(n2), \
              (pathEdgex)-[:date_of_start]->(vx)' \
     --where 'vx > "^2010-01-01"' \
     --opt   '(n2)-[pathEdgex2:FRIENDS_WITH]->(n3), \
              (pathEdgex2)-[:date_of_start]->(vx2)' \
     --where 'vx2 > "^2010-01-01"' \
     --return 'pathEdgex, n1, pathEdgex.label , n2, pathEdgex2, n1, pathEdgex2.label, n3' \
     --multi 2
id	node1	label	node2
e1	Fred	FRIENDS_WITH	Joe
e2	Fred	FRIENDS_WITH	Susi
e2	Joe	FRIENDS_WITH	Susi
e4	Fred	FRIENDS_WITH	Tom

But even without that, you might be able to convert the following result for your purposes:

> kgtk query -i friends.tsv \
     --match '(n1)-[pathEdgex:FRIENDS_WITH]->(n2), \
              (pathEdgex)-[:date_of_start]->(vx)' \
     --where 'vx > "^2010-01-01"' \
     --opt   '(n2)-[pathEdgex2:FRIENDS_WITH]->(n3), \
              (pathEdgex2)-[:date_of_start]->(vx2)' \
     --where 'vx2 > "^2010-01-01"' \
     --return 'pathEdgex, n1, pathEdgex.label , n2, pathEdgex2, n1, pathEdgex2.label, n3'
id	node1	label	node2	id	node1	label	node2
e1	Fred	FRIENDS_WITH	Joe	e2	Fred	FRIENDS_WITH	Susi
e2	Joe	FRIENDS_WITH	Susi		Joe		
e4	Fred	FRIENDS_WITH	Tom		Fred		
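One way to do that conversion is a small post-processing step outside kgtk. The Python sketch below splits each eight-column row into two four-column edges and drops the second edge whenever any of its cells is empty, which mimics the behavior described for --multi 2 above. The function name `split_wide_rows` is hypothetical, and the rows are assumed to have already been read from the tab-separated output (for example with `csv.reader(f, delimiter="\t")`), with the header line skipped.

```python
def split_wide_rows(rows, width=4):
    """Split rows of 2*width cells into edges of `width` cells each.

    The first edge of a row is always kept. The second edge is kept
    only if none of its cells is empty, i.e. the optional match
    actually produced a complete edge.
    """
    edges = []
    for row in rows:
        # Pad short rows, in case trailing empty cells were stripped.
        cells = list(row) + [""] * (2 * width - len(row))
        first, second = cells[:width], cells[width:2 * width]
        edges.append(tuple(first))
        if all(second):
            edges.append(tuple(second))
    return edges


# The three result rows shown above, without the header line.
WIDE_ROWS = [
    ["e1", "Fred", "FRIENDS_WITH", "Joe", "e2", "Fred", "FRIENDS_WITH", "Susi"],
    ["e2", "Joe", "FRIENDS_WITH", "Susi", "", "Joe", "", ""],
    ["e4", "Fred", "FRIENDS_WITH", "Tom", "", "Fred", "", ""],
]

print("\t".join(["id", "node1", "label", "node2"]))
for edge in split_wide_rows(WIDE_ROWS):
    print("\t".join(edge))
```

On these three rows the sketch yields the same four edges as the --multi 2 output shown earlier.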

chalypso, Feb 07 '22 17:02