pygraphistry
GFQL: Path predicates & path mode
Why
Expose paths when users request them (pay-as-you-go) while keeping default set semantics fast.
Deliverables
- Syntax:
  MATCH PATH p = a->b->c WHERE … RETURN p
  WHERE uses the same same-path semantics over named steps as non-PATH GFQL queries
- Execution plan:
- Run the forward/backward/forward (F/B/F) passes (+ WHERE) to prune the subgraph (set semantics)
- Enumerate paths only on the pruned graph, using sparse gathers per step
- Factorized result container with lazy enumeration, row caps, and streaming
- Optional path predicates (length, simple/non-simple, scoring) that trigger path mode and are not available in non-PATH queries
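The execution plan above can be illustrated with a minimal pure-Python sketch. It is not the GFQL implementation: the function names `prune_fbf` and `enumerate_paths`, the per-step node predicates, and the `(src, dst)` edge tuples are all assumptions made for illustration. The point is the ordering: prune under set semantics first, then enumerate paths lazily on what survives, with an optional row cap.

```python
from itertools import islice

def prune_fbf(edges, step_preds):
    """Hypothetical forward/backward/forward pruning for a fixed-length chain.

    edges: iterable of (src, dst) pairs.
    step_preds: one node predicate per named step, e.g. [is_a, is_b, is_c].
    Returns one set of surviving edges per hop (set semantics, no paths yet).
    """
    edges = list(edges)
    nodes = {n for e in edges for n in e}
    # Forward pass: frontier[i] holds nodes reachable at step i that pass step i's predicate.
    frontier = [{n for n in nodes if step_preds[0](n)}]
    for pred in step_preds[1:]:
        prev = frontier[-1]
        frontier.append({d for s, d in edges if s in prev and pred(d)})
    # Backward pass: drop nodes that cannot reach a surviving node of the next step.
    for i in range(len(frontier) - 2, -1, -1):
        nxt = frontier[i + 1]
        frontier[i] &= {s for s, d in edges if d in nxt}
    # Final forward pass: keep only edges that connect surviving frontiers.
    return [
        {(s, d) for s, d in edges if s in frontier[i] and d in frontier[i + 1]}
        for i in range(len(frontier) - 1)
    ]

def enumerate_paths(hop_edges, row_cap=None):
    """Lazily yield paths (node tuples) from pruned per-hop edge sets."""
    adj = []
    for hop in hop_edges:
        out = {}
        for s, d in sorted(hop):  # sorted for a deterministic order
            out.setdefault(s, []).append(d)
        adj.append(out)

    def extend(path, i):
        if i == len(adj):
            yield tuple(path)
            return
        for d in adj[i].get(path[-1], []):
            yield from extend(path + [d], i + 1)

    starts = sorted(adj[0]) if adj else []
    gen = (p for s in starts for p in extend([s], 0))
    return islice(gen, row_cap)  # row_cap=None means no cap

# Toy graph: only a1->b1->c1 and a2->b1->c1 match the a->b->c pattern.
edges = [("a1", "b1"), ("a1", "b2"), ("b1", "c1"),
         ("b2", "x"), ("a2", "b1"), ("z", "c1")]
preds = [lambda n: n.startswith("a"),
         lambda n: n.startswith("b"),
         lambda n: n.startswith("c")]
hops = prune_fbf(edges, preds)
for path in enumerate_paths(hops, row_cap=10):
    print(path)
```

Because every edge kept by the pruning step lies on at least one complete match, the enumerator never explores a dead end, which is the pay-as-you-go property the plan is after.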
Optimization Modes & Switches
- Factorize outputs and enumerate on demand (Kùzu-style)
- Use sideways information passing to push semijoin filters and avoid scanning unrelated adjacency/properties
- Maintain CSR/CSC adjacency in device memory; align edge-property columns for sequential reads
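As a rough illustration of factorized output over CSR adjacency, the sketch below stores one CSR structure per hop instead of a flat path table, counts paths without materializing them, and enumerates on demand. The class name `FactorizedPaths`, its methods, and the integer node IDs are hypothetical; it runs on host-side Python lists, whereas the note above targets device memory.

```python
from itertools import islice

class FactorizedPaths:
    """Hypothetical container: per-hop CSR adjacency instead of a flat path table."""

    def __init__(self, num_nodes, hop_edges):
        self.num_nodes = num_nodes
        self.hops = [self._to_csr(num_nodes, list(e)) for e in hop_edges]

    @staticmethod
    def _to_csr(num_nodes, edges):
        # offsets[n]..offsets[n+1] delimits node n's out-neighbors in targets.
        offsets = [0] * (num_nodes + 1)
        for s, _ in edges:
            offsets[s + 1] += 1
        for i in range(num_nodes):
            offsets[i + 1] += offsets[i]
        targets = [0] * len(edges)
        cursor = offsets[:-1]  # slice copy: next free slot per source node
        for s, d in edges:
            targets[cursor[s]] = d
            cursor[s] += 1
        return offsets, targets

    def count(self):
        """Count paths by dynamic programming over hops, without enumerating."""
        ways = [1] * self.num_nodes  # ways[n]: path suffixes starting at n
        for offsets, targets in reversed(self.hops):
            ways = [
                sum(ways[targets[k]] for k in range(offsets[n], offsets[n + 1]))
                for n in range(self.num_nodes)
            ]
        return sum(ways)

    def __iter__(self):
        """Lazily expand the factorized form into flat paths."""
        def extend(path, i):
            if i == len(self.hops):
                yield tuple(path)
                return
            offsets, targets = self.hops[i]
            n = path[-1]
            for k in range(offsets[n], offsets[n + 1]):
                yield from extend(path + [targets[k]], i + 1)
        for start in range(self.num_nodes):
            yield from extend([start], 0)

    def head(self, row_cap):
        """Materialize at most row_cap paths."""
        return list(islice(iter(self), row_cap))

# Five stored edges represent six paths: 2 sources x 3 sinks through node 2.
fp = FactorizedPaths(6, [[(0, 2), (1, 2)], [(2, 3), (2, 4), (2, 5)]])
print(fp.count())
print(fp.head(2))
```

The stored size grows with the number of edges per hop, while the flat path count can grow multiplicatively, which is why enumeration is deferred until a consumer asks for rows.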
Acceptance
- Matches enumerator oracle on small graphs; respects row caps and streaming
- Benchmarks vs DataFrame path-table baseline, reporting intermediates, peak memory, wall time
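One way to set up the acceptance checks is sketched below, assuming a brute-force enumerator as the oracle and a small measurement helper. Both `oracle_paths` and `measure` are hypothetical names, and `tracemalloc` only tracks host-side Python allocations, so a real benchmark would need a separate tool for device memory.

```python
import itertools
import time
import tracemalloc

def oracle_paths(edges, step_preds):
    """Brute-force oracle: test every node combination against the chain pattern.

    Exponential in the number of steps, so only suitable for small graphs.
    """
    edge_set = set(edges)
    nodes = sorted({n for e in edge_set for n in e})
    candidates = [[n for n in nodes if pred(n)] for pred in step_preds]
    return {
        combo
        for combo in itertools.product(*candidates)
        if all((combo[i], combo[i + 1]) in edge_set
               for i in range(len(combo) - 1))
    }

def measure(fn, *args):
    """Run fn(*args) and report wall time and peak Python heap usage."""
    tracemalloc.start()
    t0 = time.perf_counter()
    result = fn(*args)
    wall = time.perf_counter() - t0
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, {"wall_s": wall, "peak_bytes": peak}

edges = [("a1", "b1"), ("a1", "b2"), ("b1", "c1"),
         ("b2", "x"), ("a2", "b1"), ("z", "c1")]
preds = [lambda n: n.startswith("a"),
         lambda n: n.startswith("b"),
         lambda n: n.startswith("c")]
expected, stats = measure(oracle_paths, edges, preds)
print(sorted(expected), stats)
```

A path-mode implementation under test would be run through the same `measure` call, with its output compared to `expected` as a set and its row-capped output checked to be a subset of the right size.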
References
- GraphFrames motif/path-table baseline for opt-in path mode
- Kùzu factorized enumeration + CIDR papers for lazy output strategies