pygraphistry

[FEA] disjunction operator ("or") in chain

Open lmeyerov opened this issue 3 years ago • 4 comments

Is your feature request related to a problem? Please describe.

In some use cases, we want to go beyond purely linear chains: allow multiple types of nodes at a step, and potentially different path lengths depending on which type matched.

For example:

  • person -> any alert -> devices
  • person -> fancy alert 123 -> other stuff

Describe the solution you'd like

Several levels of disjunction: value, element, and path.

It should backtrack Neo4j-style, not evaluate eagerly Gremlin-style.

It should not break vectorization or the overall algorithmic pattern.

Describe alternatives you've considered

Some examples showing different levels of computational strength:

# `or_` stands in for the proposed operator: bare `or` is a Python keyword and cannot be called
g.chain([path1, path2, ...])  # path-level disjunction; alternatively g.chain(path1).union(g.chain(path2))
g.chain([n({'type': or_(['val1', 'val2'])})])  # disjunction over values; note the closed universe of options
g.chain([n(or_({'type': 'val1'}, {'type': 'val2'}))])  # disjunction over filter dicts
g.chain([or_(n(...), n(...))])  # disjunction at the entity level
g.chain([or_([n(), e(), n()], [n(), e(), n(), e(), n()])])  # nestable at the path level
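A minimal sketch of the first two levels (value and dict disjunction), assuming nodes are modeled as a list of dicts rather than a dataframe. `one_of`, `any_of`, `matches`, and `mask` are hypothetical names standing in for the proposed `or`, not pygraphistry API. Each predicate yields one boolean per row, so in a pandas/cuDF implementation it would correspond to a vectorized mask (e.g. `Series.isin` for values, `|` between masks for dicts).

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Pred = Callable[[Row], bool]


def one_of(values: List[Any]) -> Callable[[Any], bool]:
    """Hypothetical value-level disjunction over a closed set of options."""
    allowed = set(values)
    return lambda v: v in allowed


def matches(row: Row, spec: Dict[str, Any]) -> bool:
    """A filter dict matches when every key matches; values are literals or predicates."""
    for key, want in spec.items():
        have = row.get(key)
        if callable(want):
            if not want(have):
                return False
        elif have != want:
            return False
    return True


def any_of(*specs: Dict[str, Any]) -> Pred:
    """Hypothetical dict-level disjunction: a row matches if it matches any filter dict."""
    return lambda row: any(matches(row, s) for s in specs)


def mask(rows: List[Row], pred: Pred) -> List[bool]:
    """One boolean per row, analogous to a dataframe mask."""
    return [pred(r) for r in rows]


nodes = [
    {"id": "a", "type": "val1"},
    {"id": "b", "type": "val2"},
    {"id": "c", "type": "val3"},
]

value_level = mask(nodes, lambda r: matches(r, {"type": one_of(["val1", "val2"])}))
dict_level = mask(nodes, any_of({"type": "val1"}, {"type": "val2"}))
print(value_level)  # [True, True, False]
print(dict_level)   # [True, True, False]
```

Here both forms select the same rows; the dict form is strictly more expressive, since each alternative can constrain different keys.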

There may be a wider class of operators of interest beyond disjunction, so it is unclear which others to include at each level.

Worth investigating Cypher vs. Gremlin (and perhaps Datalog) for the path level.

Also worth investigating the language of predicates, e.g., list and regex predicates in cuDF.

Additional context

This can be tricky to implement! Maybe split it into two issues:

  • Entity-level disjunctions: these can mostly be done at the hop() level, so they stay simple and efficient. The bigger question is the language of vectorizable entity/attribute predicates.
  • Path-level disjunctions, especially nested ones, which may be harder to linearize for optimized execution.
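A minimal sketch of path-level disjunction with the backtracking semantics requested in the issue, assuming a toy graph held in plain Python dicts/lists instead of dataframes. Each linear path is evaluated as a forward pass (per-step frontiers, as an eager traversal would compute) followed by a backward pass that drops nodes and edges with no completion, so a dead end such as `al3` never appears in the result; the disjunction is then the union of the per-path results. `run_path`, `run_paths_or`, `is_type`, and the sample graph are hypothetical illustrations, not pygraphistry's implementation.

```python
from typing import Any, Callable, Dict, List, Set, Tuple

NodeAttrs = Dict[str, Any]
Edge = Tuple[str, str]
NodePred = Callable[[NodeAttrs], bool]


def run_path(nodes: Dict[str, NodeAttrs], edges: List[Edge],
             path: List[NodePred]) -> Tuple[Set[str], Set[Edge]]:
    """Match a linear path of node predicates joined by forward hops.

    Forward pass: per-step frontiers, as an eager traversal would compute.
    Backward pass: drop nodes/edges that cannot reach the end of the path,
    so only elements on some complete match survive.
    """
    frontiers: List[Set[str]] = [{nid for nid, a in nodes.items() if path[0](a)}]
    step_edges: List[List[Edge]] = [[]]  # slot 0 unused: no hop before the first node
    for pred in path[1:]:
        prev = frontiers[-1]
        hop = [(s, d) for (s, d) in edges if s in prev and pred(nodes[d])]
        step_edges.append(hop)
        frontiers.append({d for (_, d) in hop})

    keep = frontiers[-1]
    kept_nodes: Set[str] = set(keep)
    kept_edges: Set[Edge] = set()
    for i in range(len(path) - 1, 0, -1):
        hop = [(s, d) for (s, d) in step_edges[i] if d in keep]
        kept_edges.update(hop)
        keep = {s for (s, _) in hop}
        kept_nodes.update(keep)
    return kept_nodes, kept_edges


def run_paths_or(nodes: Dict[str, NodeAttrs], edges: List[Edge],
                 paths: List[List[NodePred]]) -> Tuple[Set[str], Set[Edge]]:
    """Path-level disjunction: union of each path's matches."""
    all_nodes: Set[str] = set()
    all_edges: Set[Edge] = set()
    for path in paths:
        ns, es = run_path(nodes, edges, path)
        all_nodes |= ns
        all_edges |= es
    return all_nodes, all_edges


def is_type(t: str) -> NodePred:
    return lambda a: a.get("type") == t


nodes = {
    "p1": {"type": "person"}, "p2": {"type": "person"},
    "al1": {"type": "alert", "name": "generic"},
    "al2": {"type": "alert", "name": "fancy 123"},
    "al3": {"type": "alert", "name": "generic"},  # dead end: no outgoing edges
    "d1": {"type": "device"}, "s1": {"type": "stuff"}, "s2": {"type": "stuff"},
}
edges = [("p1", "al1"), ("al1", "d1"), ("p2", "al2"), ("p2", "al3"),
         ("al2", "s1"), ("s1", "s2")]

# person -> any alert -> device
path_any_alert = [is_type("person"), is_type("alert"), is_type("device")]
# person -> fancy alert 123 -> stuff -> stuff (a different path length)
path_fancy = [is_type("person"),
              lambda a: a.get("type") == "alert" and a.get("name") == "fancy 123",
              is_type("stuff"), is_type("stuff")]

nodes_out, edges_out = run_paths_or(nodes, edges, [path_any_alert, path_fancy])
print(sorted(nodes_out))  # ['al1', 'al2', 'd1', 'p1', 'p2', 's1', 's2']
```

Each pass is a per-hop filter/join, which should map onto vectorized dataframe operations. Running each path separately does not share work between paths with a common prefix, and nested path-level disjunctions would need more care, as noted above.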

lmeyerov • Sep 23 '22 19:09

Typical case:

  • multiple labels for a node/edge
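For the multiple-labels case, one option is for a node to match a disjunctive label filter when any of its labels is in the wanted set. A minimal sketch, assuming labels are stored as a list per row in a hypothetical `labels` column; `has_any_label` is a hypothetical helper. In pandas, a similar result could be obtained with `Series.explode` followed by `isin`.

```python
from typing import Any, Dict, Iterable, List


def has_any_label(row: Dict[str, Any], wanted: Iterable[str], col: str = "labels") -> bool:
    """True if the row's label list shares at least one label with `wanted`."""
    return not set(row.get(col) or []).isdisjoint(wanted)


nodes: List[Dict[str, Any]] = [
    {"id": "u1", "labels": ["person", "admin"]},
    {"id": "u2", "labels": ["person"]},
    {"id": "h1", "labels": ["device", "server"]},
    {"id": "x1", "labels": []},
]
wanted = {"admin", "server"}
matched = [r["id"] for r in nodes if has_any_label(r, wanted)]
print(matched)  # ['u1', 'h1']
```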

lmeyerov • Oct 11 '22 15:10

Optional enrichment from one path to another:

match p1=(m1)<-[]-(a)-[*]->(b)
with p1, a  // carry a forward so p2 joins on the same node
optional match p2=(m2)<-[]-(a)  // keep p1 rows even when no p2 exists
return p1, p2

The goal is to match some p1 and then, if there are enriching p2s, see those too.

This is not simply a conjunction/product: if there is no p2, we still want to see p1.
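The semantics described here match a left outer join (in Cypher terms, `OPTIONAL MATCH`): every p1 survives, and p2s sharing the same `a` are attached when they exist. A minimal sketch, assuming each match is a dict of bound node ids keyed by the pattern's variable names; `optional_enrich` is a hypothetical helper. On dataframes, the same step would be a `merge(..., how='left')`.

```python
from collections import defaultdict
from typing import Dict, List

Match = Dict[str, str]  # pattern variable name -> bound node id


def optional_enrich(p1s: List[Match], p2s: List[Match], key: str = "a") -> List[dict]:
    """Left-outer-join semantics: keep every p1; attach the p2s that share `key`."""
    by_key: Dict[str, List[Match]] = defaultdict(list)
    for p2 in p2s:
        by_key[p2[key]].append(p2)
    # .get() avoids inserting empty lists into the defaultdict for unmatched p1s
    return [{"p1": p1, "p2s": by_key.get(p1[key], [])} for p1 in p1s]


p1s = [{"m1": "m1a", "a": "a1", "b": "b1"},
       {"m1": "m1b", "a": "a2", "b": "b2"}]  # a2 has no enriching p2
p2s = [{"m2": "m2x", "a": "a1"}, {"m2": "m2y", "a": "a1"}]

result = optional_enrich(p1s, p2s)
print([len(r["p2s"]) for r in result])  # [2, 0]
```

An inner join (plain conjunction) would drop the `a2` match entirely, which is exactly the behavior to avoid.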

Currently this can be done partially by hand via concat and naming:

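import pandas as pd
from graphistry import n

g = ...
g1 = g.chain([ ..... n(name='interesting'), ...])
# Left-join g1's 'interesting' flag onto all of g's nodes; nodes absent from g1 get False
flags = g1._nodes[g1._nodes['interesting']][[g._node, 'interesting']]
g2a = g.nodes(g._nodes.merge(flags, on=g._node, how='left').fillna({'interesting': False}))
g2b = g2a.chain(.....)
# Union both results with pandas concat, dropping duplicate nodes and edges
enriched = g1.nodes(pd.concat([g1._nodes, g2b._nodes]).drop_duplicates(subset=[g._node])).edges(pd.concat([g1._edges, g2b._edges]).drop_duplicates())
enriched.plot()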

lmeyerov • Oct 11 '22 15:10

Can we have a show-and-tell session with the team about this?

silkspace • Oct 11 '22 16:10

uh sure

Maybe the starting point is what already works :) Next week we could do a team show-and-tell: UMAP (Thomas) + query (you) + this (me)?

Or maybe your/Daniel/Thomas/Tanmoy stuff, and this the week after?

lmeyerov • Oct 12 '22 01:10