[FEA] Optimize collapse
**Is your feature request related to a problem? Please describe.**
The current collapse code is slow on big graphs, e.g., taking seconds to minutes at 1K-100K edges.
**Describe the solution you'd like**
- closer to `O(n log n)` asymptotics (in # nodes + # edges), such as `O(n d^2)` where `d` is the longest collapsed path
- vectorized
- avoid the current re-running per key/value pair
- decompose into general primitives
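As a minimal pandas sketch of the per-key/value-pair point (function and column names here are hypothetical, not pygraphistry API): instead of re-running the node filter once per (column, value) pair, one vectorized pass can flag every matching node at once.

```python
import pandas as pd

def mark_matches(nodes: pd.DataFrame, kv_pairs: list) -> pd.Series:
    """Return a boolean Series: True where a node matches ANY (column, value) pair.

    One vectorized comparison per column, accumulated into a single mask,
    instead of re-running the whole collapse logic per key/value pair.
    """
    match = pd.Series(False, index=nodes.index)
    for col, val in kv_pairs:
        match |= nodes[col] == val  # vectorized column-wide comparison
    return match

nodes = pd.DataFrame({"type": ["a", "b", "a", "c"], "level": [1, 2, 1, 3]})
flags = mark_matches(nodes, [("type", "a"), ("level", 3)])
# flags -> [True, False, True, True]
```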
Some thoughts:
- do a marking phase:
  - identify nodes by collapse group id: same match + adjacent
  - handle all equivalence classes at the same time
    - e.g., a Pregel-style propagation algorithm, running to fixpoint (`d` iterations)
  - classify edges as remaining vs. reduction group (start equiv class -> end equiv class)
- do a separate collapsing (reductions) phase to aggregate matches:
  - collapse matched nodes + compute any desired aggregates
  - collapse edges + compute any desired aggregates
- do a separate rewiring phase
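The marking/collapsing/rewiring phases above might be sketched roughly as follows, assuming integer node ids in `src`/`dst` columns (`collapse_sketch` and all names are hypothetical, not pygraphistry's actual implementation). The marking phase is a min-label propagation run to fixpoint, which converges in at most `d` rounds for collapsed paths of length `d`:

```python
import pandas as pd

def collapse_sketch(edges: pd.DataFrame, matched: set):
    """Toy three-phase collapse: edges has integer 'src'/'dst' node ids,
    matched is the set of node ids selected for collapsing."""
    nodes = pd.Index(pd.unique(edges[["src", "dst"]].values.ravel()))

    # Phase 1: marking -- each matched node starts as its own group (its id);
    # propagate the min group id across matched-matched edges to fixpoint.
    group = pd.Series({n: n if n in matched else -1 for n in nodes})
    mm = edges[edges["src"].isin(matched) & edges["dst"].isin(matched)]
    while True:
        src_g = mm["src"].map(group)
        dst_g = mm["dst"].map(group)
        low = pd.concat([src_g, dst_g], axis=1).min(axis=1)
        proposals = pd.concat([
            pd.Series(low.values, index=mm["src"].values),
            pd.Series(low.values, index=mm["dst"].values),
        ]).groupby(level=0).min()
        better = proposals[proposals < group.reindex(proposals.index)]
        if better.empty:  # fixpoint reached
            break
        group.update(better)

    # Phases 2 + 3: collapse + rewire -- map every node to its canonical id
    # (its group id if matched, else itself), then drop self-loops and dupes.
    canon = pd.Series({n: (group[n] if n in matched else n) for n in nodes})
    rewired = edges.assign(src=edges["src"].map(canon), dst=edges["dst"].map(canon))
    rewired = rewired[rewired["src"] != rewired["dst"]].drop_duplicates()
    return canon, rewired

edges = pd.DataFrame({"src": [0, 1, 2], "dst": [1, 2, 3]})  # path 0-1-2-3
canon, rewired = collapse_sketch(edges, matched={1, 2, 3})
# nodes 1, 2, 3 collapse into one super-node; only edge 0 -> 1 survives
```

Aggregates (node/edge attribute reductions) would slot into phase 2 as ordinary `groupby(canon).agg(...)` calls, keeping each phase a standalone, swappable primitive.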
Ideally, implement the various phases as reusable & swappable standalone primitives. E.g., see https://github.com/graphistry/pygraphistry/issues/193 .
- @silkspace for visibility