
Add set/list predicates for GFQL (Cypher label set parity)

Open lmeyerov opened this issue 1 month ago • 1 comments

Summary

Add generic list/set predicates to GFQL to cover Cypher label-set semantics (e.g., requiring that an edge has labels abc AND xyz). Today we only have scalar predicates, and the workaround is custom queries or precomputed columns.

Motivation

  • Cypher allows multiple labels on a relationship/node. Translating to GFQL currently requires custom queries or pre-baked flags.
  • We need first-class predicates for list/set columns to avoid ad-hoc lambdas and to expose clean wire protocol representations.

Proposal

Add generic predicates for list/set membership:

  • :
  • :
  • :
  • : every requested element appears in the list column (multiplicity ignored)
  • : at least one requested element appears in the list column
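The "every element" and "any element" semantics above can be pinned down with a small reference sketch in plain pandas. The function names `contains_all` and `contains_any` are hypothetical placeholders, not GFQL API, and the row-wise `map` is used only to make the intended set semantics explicit; it is not the vectorized implementation the engine would want.

```python
import pandas as pd

# Hypothetical helper names; they only illustrate the proposed semantics.
def contains_all(column: pd.Series, values) -> pd.Series:
    """True where every element of `values` appears in the row's list.

    Multiplicity and order are ignored (set semantics).
    """
    wanted = set(values)
    return column.map(lambda cell: wanted.issubset(set(cell)))


def contains_any(column: pd.Series, values) -> pd.Series:
    """True where at least one element of `values` appears in the row's list."""
    wanted = set(values)
    return column.map(lambda cell: not wanted.isdisjoint(set(cell)))


# Toy label column: one list of labels per edge.
labels = pd.Series([["abc", "xyz"], ["abc"], ["xyz", "abc", "abc"], []])

all_mask = contains_all(labels, ["abc", "xyz"])   # edge must have abc AND xyz
any_mask = contains_any(labels, ["xyz", "http"])  # edge must have xyz OR http
```

Under these definitions an empty `values` makes `contains_all` true for every row and `contains_any` false for every row, which is one possible convention rather than a settled decision.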

Wire protocol (JSON) analogues:

Examples

  • Cypher → GFQL: or if strict.
  • Cypher → .
  • Filter paths to edges that include any of {http, grpc} in a protocols column: .

Scope

  • Add predicate constructors (Python API) + AST types
  • Wire protocol support and docs (spec + quick reference)
  • Optional: list-aware engine implementation for pandas/cuDF columns (lists/sets of strings)

Open questions

  • Should respect multiplicity/order or treat as set? (proposal: treat as set for now)
  • Should we normalize list-like column types (tuples/lists/sets) before evaluating?
  • Do we want a shorthand for label columns, or just the generic predicates?
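For the normalization question, one possible answer is sketched below: coerce tuples, lists, and sets to deduplicated, sorted lists before any predicate runs. The helper name `normalize_list_like` is hypothetical, and treating nulls or other non-list cells as empty lists is an assumption made for illustration, not a decision from this issue. Sorting also assumes the elements are mutually comparable (e.g., all strings).

```python
import pandas as pd

# Hypothetical helper; the null-handling rule below is an assumption.
def normalize_list_like(column: pd.Series) -> pd.Series:
    """Coerce list-like cells to sorted, deduplicated lists (set semantics)."""
    def to_list(cell):
        if isinstance(cell, (list, tuple, set, frozenset)):
            return sorted(set(cell))
        # Assumption: None / NaN / scalars are treated as "no labels".
        return []
    return column.map(to_list)


mixed = pd.Series([("b", "a"), {"a"}, ["a", "a", "c"], None])
normalized = normalize_list_like(mixed)
```

A row-wise pass like this is fine for a one-time ingest step, but it would not satisfy a requirement to stay columnar at query time.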

lmeyerov, Nov 30 '25 05:11

Implementation note: choose predicates that stay vectorized for pandas/cuDF (avoid Python loops). For list/set columns, prefer built-in cuDF list operations or their equivalents, and avoid row-wise Python in pandas, ideally via groupby-based or similar columnar ops, so both engines stay columnar.
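A minimal sketch of what a columnar evaluation could look like in pandas, assuming an explode-then-groupby strategy (this choice, and the function names, are illustrative assumptions rather than the planned implementation). It also assumes the column has a unique index so that exploded rows can be grouped back to their source row. cuDF provides its own `explode` and list operations, but parity with this code has not been checked here.

```python
import pandas as pd

# Hypothetical names; assumes `column` has a unique index.
def contains_any_vectorized(column: pd.Series, values) -> pd.Series:
    """True where the row's list shares at least one element with `values`."""
    exploded = column.explode()                # one row per element, index repeated
    hits = exploded.isin(set(values))          # elementwise membership test
    per_row = hits.groupby(level=0).any()      # collapse back to one flag per row
    return per_row.reindex(column.index, fill_value=False)


def contains_all_vectorized(column: pd.Series, values) -> pd.Series:
    """True where every element of `values` appears in the row's list."""
    wanted = set(values)
    exploded = column.explode()
    matched = exploded[exploded.isin(wanted)]  # keep only requested elements
    distinct = matched.groupby(level=0).nunique()  # distinct matches per row
    # Rows with no matches drop out of the groupby; restore them as 0.
    return distinct.reindex(column.index, fill_value=0) == len(wanted)


labels = pd.Series([["abc", "xyz"], ["abc"], ["xyz", "abc", "abc"], []])

any_mask = contains_any_vectorized(labels, ["http", "xyz"])
all_mask = contains_all_vectorized(labels, ["abc", "xyz"])
```

Counting distinct matches rather than total matches is what makes multiplicity irrelevant: a row holding `abc` three times still contributes only one distinct match.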

lmeyerov, Nov 30 '25 05:11