pygraphistry icon indicating copy to clipboard operation
pygraphistry copied to clipboard

GFQL: cuDF wavefront executor w/ planner

Open lmeyerov opened this issue 1 month ago • 0 comments

Why

Ship a GPU-native GFQL executor that:

  • preserves existing local-only semantics (no WHERE), and
  • adds same-path multihop WHERE support, while remaining set-based (returning nodes/edges, not paths), exact, and fast.

Deliverables

  • Forward/backward/forward executor in cuDF that:
    • Implements existing GFQL local predicates for linear chains under set semantics (no paths returned)
    • Adds WHERE lowering for same-path predicates:
      • Inequalities: per-node min_/max_ summaries
      • Equality/!= small domain: per-node bitsets (bs0..bsK lanes)
      • Equality/!= large domain: bounded (node,value) state table with strict caps per step
  • Planner switches (“pay-as-you-go”):
    • When no WHERE clause is present: execute pure F/B/F without min/max/bitsets/state tables
    • When inequalities present: enable min/max only for referenced names
    • When equality/!= present: pick bitsets vs state tables per referenced name
    • Hybrid hop selection: choose sparse CSR/CSC gathers when |frontier| × avg_degree << |E|, else cuDF joins
  • Null/NaN policy implemented once (SQL semantics)
  • Unit tests aligned with the enumerator oracle (pandas + cuDF)

Implementation Sketch

  • Forward: join edges with frontier, push filters, aggregate min/max and OR bitsets via groupby('dst') (Numba apply_grouped)
  • Equality fallback: state-table propagation via merges + drop_duplicates()
  • Backward: reverse predicates, intersect with target frontier, prune
  • Checks: inequalities compare against min/max; equality/!= uses bit tests or joins against state table

Testing & Acceptance

  • Exact match vs enumerator on small graphs for pandas & cuDF
  • Metamorphic checks: isolated nodes no-op, tightening predicates shrinks results, AND commutes
  • Perf sanity on high-fanout synthetic chains; bitset OR kernels run on GPU

References

  • GraphFrames motif baseline for comparison
  • Kùzu factorized + sideways information passing ideas for SIP/adjacency layout
  • GPU Datalog/GraphBLAST for future kernelization intuition

lmeyerov avatar Nov 17 '25 08:11 lmeyerov