pygraphistry
Add set/list predicates for GFQL (Cypher label set parity)
Summary
Add generic list/set predicates to GFQL to cover Cypher label-set semantics (e.g., a pattern meaning the edge must have labels abc AND xyz). Today GFQL only has scalar predicates, and the workaround is custom queries or precomputed columns.
Motivation
- Cypher allows multiple labels on a relationship/node. Translating such patterns to GFQL currently requires custom filter code or pre-baked flag columns.
- We need first-class predicates for list/set columns to avoid ad-hoc lambdas and to expose clean wire protocol representations.
Proposal
Add generic predicates for list/set membership:
- :
- :
- :
- : every element of the given set appears in the list column (multiplicity ignored)
- : any element of the given set appears in the list column
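To pin down the intended semantics, here is a minimal reference sketch in Python with pandas. The names ListContainsAll, ListContainsAny, list_contains_all and list_contains_any are hypothetical placeholders, not the GFQL API. Each row's list is compared as a set (duplicates and order ignored), and missing or non-list cells never match. It evaluates row by row, so it is a specification aid rather than the vectorized implementation the implementation note asks for.

```python
from dataclasses import dataclass
from typing import Any, FrozenSet, Iterable

import pandas as pd


def _as_set(xs: Any) -> set:
    # Normalize list-like cells (list/tuple/set) to a set; anything else
    # (None, NaN, scalars) is treated as an empty set and never matches.
    if isinstance(xs, (list, tuple, set, frozenset)):
        return set(xs)
    return set()


@dataclass(frozen=True)
class ListContainsAll:
    """Hypothetical AST node: every value must appear in the row's list."""
    values: FrozenSet[str]

    def __call__(self, col: pd.Series) -> pd.Series:
        # Row-wise reference semantics (not vectorized).
        return col.map(lambda xs: self.values <= _as_set(xs)).astype(bool)


@dataclass(frozen=True)
class ListContainsAny:
    """Hypothetical AST node: at least one value must appear in the row's list."""
    values: FrozenSet[str]

    def __call__(self, col: pd.Series) -> pd.Series:
        return col.map(lambda xs: not self.values.isdisjoint(_as_set(xs))).astype(bool)


def list_contains_all(values: Iterable[str]) -> ListContainsAll:
    # Hypothetical constructor: freeze the values so the node is hashable.
    return ListContainsAll(frozenset(values))


def list_contains_any(values: Iterable[str]) -> ListContainsAny:
    return ListContainsAny(frozenset(values))
```

For example, list_contains_all(["abc", "xyz"]) matches a row whose labels are ["xyz", "abc", "abc"] but not one whose labels are ["abc"]. Freezing the values keeps the AST nodes immutable and comparable, which is convenient for serialization and caching.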
Wire protocol (JSON) analogues:
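One possible JSON shape for these predicates, sketched in Python. The "type" tags, the "values" field name and the ListPredicate class are assumptions for illustration, not the existing GFQL wire format. Values are sorted on encode so the same logical set always serializes identically, and decoded into a frozenset so set semantics survive the round trip.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, FrozenSet

# Hypothetical wire-protocol type tags for the proposed predicates.
KNOWN_KINDS = {"ListContainsAll", "ListContainsAny"}


@dataclass(frozen=True)
class ListPredicate:
    kind: str               # one of KNOWN_KINDS
    values: FrozenSet[str]  # compared with set semantics

    def to_json(self) -> Dict[str, Any]:
        # Sort so the same logical set always produces the same JSON.
        return {"type": self.kind, "values": sorted(self.values)}

    @staticmethod
    def from_json(obj: Dict[str, Any]) -> "ListPredicate":
        kind = obj.get("type")
        if kind not in KNOWN_KINDS:
            raise ValueError(f"unknown list predicate type: {kind!r}")
        return ListPredicate(kind, frozenset(obj["values"]))


p = ListPredicate("ListContainsAll", frozenset({"xyz", "abc"}))
print(json.dumps(p.to_json()))  # {"type": "ListContainsAll", "values": ["abc", "xyz"]}
```

Rejecting unknown tags on decode keeps a malformed or newer payload from being silently evaluated as some other predicate.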
Examples
- Cypher → GFQL: or if strict.
- Cypher → .
- Filter paths to edges that include any of {http, grpc} in a protocols column: .
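The protocols example can be sketched with plain pandas on an edge table. The edges frame, its src/dst/protocols columns and the contains_any helper are hypothetical. The helper explodes each list to one row per element, tests membership with isin, and ORs the results back per original row, so it avoids Python-level loops over rows. It assumes the edge index is unique.

```python
from typing import Iterable

import pandas as pd


def contains_any(col: pd.Series, values: Iterable[str]) -> pd.Series:
    """Hypothetical helper: True where the row's list shares any element with `values`."""
    exploded = col.explode()               # one row per element; empty lists become NaN
    matched = exploded.isin(list(values))  # missing values never match
    # Collapse back to one boolean per original row (requires a unique index).
    return matched.groupby(level=0).any().reindex(col.index, fill_value=False)


edges = pd.DataFrame({
    "src": ["a", "b", "c", "d"],
    "dst": ["b", "c", "d", "a"],
    "protocols": [["http", "tcp"], ["udp"], ["grpc"], []],
})

# Keep edges that carry http or grpc.
kept = edges[contains_any(edges["protocols"], {"http", "grpc"})]
print(kept[["src", "dst"]])
```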
Scope
- Add predicate constructors (Python API) + AST types
- Wire protocol support and docs (spec + quick reference)
- Optional: list-aware engine implementation for pandas/cuDF columns (lists/sets of strings)
Open questions
- Should respect multiplicity/order or treat as set? (proposal: treat as set for now)
- Should we normalize list-like column types (tuples/lists/sets) before evaluating?
- Do we want a shorthand for label columns (e.g., ), or just the generic predicates?
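To make the multiplicity question concrete, the sketch below contrasts the proposed set semantics with a multiset alternative for an all-of check. Both function names are hypothetical, and order is ignored in both.

```python
from collections import Counter
from typing import Iterable, Sequence


def contains_all_set(xs: Sequence[str], required: Iterable[str]) -> bool:
    # Set semantics (the current proposal): duplicates are ignored.
    return set(required) <= set(xs)


def contains_all_multiset(xs: Sequence[str], required: Iterable[str]) -> bool:
    # Multiset semantics (the alternative): each required value must occur
    # at least as many times in xs as it does in `required`.
    have, need = Counter(xs), Counter(required)
    return all(have[k] >= n for k, n in need.items())
```

The two only disagree when the requested values contain duplicates: requiring ["abc", "abc"] against a row ["abc", "xyz"] passes under set semantics but fails under multiset semantics.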
Implementation note: choose predicates that stay vectorized for pandas/cuDF (avoid Python loops). For list/set columns, aim for cuDF list built-ins and their pandas equivalents, and avoid row-wise Python in pandas, ideally via groupby-based or similar vectorized ops, so both engines stay columnar.
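A minimal pandas sketch of a vectorized all-of check in the spirit of this note: explode the list column, keep elements that are in the requested set, count distinct matches per original row, and compare that count with the size of the set. Counting distinct values is what makes multiplicity irrelevant. The function name is hypothetical, it assumes the column has a unique index, and whether the same steps carry over unchanged to cuDF has not been verified here.

```python
from typing import Iterable

import pandas as pd


def contains_all_vectorized(col: pd.Series, required: Iterable[str]) -> pd.Series:
    """Hypothetical helper: True where the row's list contains every value in `required`."""
    req = set(required)
    if not req:
        # Vacuous truth: an empty requirement matches every row.
        return pd.Series(True, index=col.index)
    exploded = col.explode()                   # one row per element; empty lists become NaN
    hits = exploded[exploded.isin(list(req))]  # keep only requested values
    # Distinct matched values per original row; duplicates in a list count once.
    n_matched = hits.groupby(level=0).nunique()
    # Rows with no hits are missing from n_matched, so fill them with 0.
    return n_matched.reindex(col.index, fill_value=0) == len(req)
```

Usage: contains_all_vectorized(edges_df["labels"], ["abc", "xyz"]) would give the boolean mask for the "labels abc AND xyz" case, where edges_df and its labels column are again hypothetical.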