pygraphistry icon indicating copy to clipboard operation
pygraphistry copied to clipboard

[FEA] Optimize collapse

Open lmeyerov opened this issue 2 years ago • 1 comments

Is your feature request related to a problem? Please describe.

Current collapse code is slow on big graphs, e.g., seconds/minutes 1K-100K edges

Describe the solution you'd like

  • closer to O(n logn) asymptotics (for # nodes + # edges), such as O(n d^2) where d is the longest collapsed path
  • vectorized
  • avoid current re-running per kv pair
  • decompose into general primitives

Some thoughts:

  • do marking phase:
    • identify nodes by collapse group id: same match + adjacent
    • handle all equiv classes at same time
    • ex: pregel-style propagation alg, running to fixpoint (d iterations)
    • classify edges as remaining vs reduction group (start equiv class -> end equiv class)
  • do a separate collapsing (reductions) phase to aggregate matches
    • collapse matched nodes, + compute any desired aggregates
    • collapse edges, + compute any desired aggregates
  • do a separate rewiring

Ideally do various phases as reusable & swappable standalone primitives. E.g., look at https://github.com/graphistry/pygraphistry/issues/193 .

lmeyerov avatar Apr 23 '22 15:04 lmeyerov

  • @silkspace for visibility

lmeyerov avatar Apr 23 '22 15:04 lmeyerov