Research: GFQL Query Language Interoperability Strategy
Objective
Research and plan a strategy for GFQL interoperability with other graph query languages, with a focus on enabling future embedded implementations (e.g., Rust port).
Background
GFQL currently operates as a standalone graph query language embedded in Python. To maximize adoption and enable future architectural evolution (embedded Rust runtime, cross-language support), we should evaluate interoperability with established graph query standards and languages.
Query Languages to Evaluate
1. Cypher (Neo4j, Memgraph, Amazon Neptune)
- Text-based: Parse Cypher strings and translate to GFQL AST
- BOLT protocol: Native binary protocol support for Neo4j
- Questions:
- Translation fidelity: What Cypher features map cleanly to GFQL?
- Protocol choice: Text parsing vs BOLT wire protocol?
- Performance implications: Client-side translation vs server-side?
- Bidirectional: GFQL → Cypher compilation for remote execution?
2. GQL (ISO/IEC 39075 Standard)
- Status: International standard for graph querying, published as ISO/IEC 39075:2024 (April 2024)
- Adoption: TigerGraph and other vendors
- Questions:
- Syntax overlap with Cypher?
- Standard compliance benefits?
- Feature gaps vs GFQL?
3. GSQL (TigerGraph)
- Characteristics: Procedural, pattern-matching focused
- Questions:
- Semantic alignment with GFQL's functional composition?
- Translation complexity?
- Value proposition for TigerGraph users?
4. Gremlin (Apache TinkerPop, Neptune, Cosmos)
- Characteristics: Imperative traversal language
- Integration: Already have Gremlin connector (graphistry.from_gremlin())
- Questions:
- Current connector limitations?
- Gremlin → GFQL AST translation?
- Bytecode support?
Research Questions
Translation Architecture
Option A: Text → GFQL AST
# Parse external query language strings
cypher_query = "MATCH (n:Person)-[:KNOWS]->(m) RETURN n, m"
gfql_ast = graphistry.from_cypher(cypher_query)
g.gfql(gfql_ast)
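To make Option A more concrete, here is a minimal sketch of a text-to-AST translator for a single linear MATCH pattern. The function name `cypher_to_ast` and the dict-based AST shape are assumptions made for illustration; they are not the existing GFQL wire format or a graphistry API. Labels and relationship types are mapped to a `type` filter, mirroring the `n({'type': 'Person'})` convention used in Option C.

```python
import re

# Hypothetical, simplified AST: a list of dicts, one per pattern step.
# This is an illustration only, not the official GFQL wire format.
NODE_RE = re.compile(r"\(\s*(\w*)\s*(?::\s*(\w+))?\s*\)")
EDGE_RE = re.compile(r"(<-|-)\[\s*(\w*)\s*(?::\s*(\w+))?\s*\](->|-)")
QUERY_RE = re.compile(r"\s*MATCH\s+(.*?)\s+RETURN\s+.+", re.IGNORECASE | re.DOTALL)


def cypher_to_ast(query):
    """Translate one linear 'MATCH <pattern> RETURN ...' query into AST steps."""
    found = QUERY_RE.fullmatch(query)
    if found is None:
        raise ValueError("only 'MATCH <pattern> RETURN <vars>' is supported")
    pattern = found.group(1)
    steps, pos, expect_node = [], 0, True
    while pos < len(pattern):
        # Nodes and edges must strictly alternate in a linear pattern.
        tok = (NODE_RE if expect_node else EDGE_RE).match(pattern, pos)
        if tok is None:
            raise ValueError(f"unsupported syntax at offset {pos}: {pattern[pos:]!r}")
        if expect_node:
            name, label = tok.groups()
            steps.append({"op": "node", "name": name or None,
                          "filter": {"type": label} if label else {}})
        else:
            left, name, rel_type, right = tok.groups()
            if left == "<-" and right == "->":
                raise ValueError("an edge cannot point in both directions")
            direction = ("reverse" if left == "<-"
                         else "forward" if right == "->" else "undirected")
            steps.append({"op": "edge", "name": name or None, "direction": direction,
                          "filter": {"type": rel_type} if rel_type else {}})
        pos, expect_node = tok.end(), not expect_node
    if expect_node:  # empty pattern, or pattern ended on an edge
        raise ValueError("pattern must start and end with a node")
    return steps
```

The sketch ignores the RETURN clause and rejects anything beyond a bare pattern (WHERE, property maps, variable-length paths), which is the kind of fidelity gap the research phase would need to catalogue. A real translator would likely use a proper grammar rather than regular expressions.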
Option B: Wire Protocol Native
# Use native binary protocols (e.g., BOLT for Cypher)
g.bolt_query("MATCH (n:Person)-[:KNOWS]->(m) RETURN n, m")
Option C: Bidirectional Compilation
# GFQL → external language for remote execution (to_cypher is a proposed API)
import graphistry
from graphistry import n, e_forward
gfql_query = [n({'type': 'Person'}), e_forward({'type': 'KNOWS'}), n()]
cypher_string = graphistry.to_cypher(gfql_query)
# Execute cypher_string on a remote Neo4j server
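The outbound direction in Option C could be sketched as follows. This is an assumption-laden illustration: `ast_to_cypher` and the dict-based step format (`op`, `direction`, `filter`) are hypothetical stand-ins, not graphistry APIs, and only a `type` filter is translated.

```python
# Hypothetical step format: {"op": "node"|"edge", "direction": ..., "filter": {...}}.
ARROWS = {"forward": ("-", "->"), "reverse": ("<-", "-"), "undirected": ("-", "-")}


def _label(step):
    """Render the ':Label' suffix, refusing anything this sketch cannot express."""
    flt = step.get("filter", {})
    extra = set(flt) - {"type"}
    if extra:
        raise ValueError(f"cannot translate filters on {sorted(extra)}")
    label = flt.get("type")
    # Conservative check so arbitrary strings are never spliced into the query.
    if label is not None and not (isinstance(label, str) and label.isidentifier()):
        raise ValueError(f"unsafe label or type: {label!r}")
    return f":{label}" if label else ""


def ast_to_cypher(steps):
    """Compile a linear node/edge chain into a Cypher MATCH ... RETURN string."""
    parts, names = [], []
    for step in steps:
        if step["op"] == "node":
            name = f"n{len(names)}"  # generated variable names: n0, n1, ...
            names.append(name)
            parts.append(f"({name}{_label(step)})")
        elif step["op"] == "edge":
            left, right = ARROWS[step["direction"]]
            parts.append(f"{left}[{_label(step)}]{right}")
        else:
            raise ValueError(f"unknown op: {step['op']!r}")
    if not names:
        raise ValueError("empty chain")
    return "MATCH " + "".join(parts) + " RETURN " + ", ".join(names)
```

Raising on any filter the compiler cannot express, rather than silently dropping it, keeps remote results from diverging from local GFQL execution.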
Embedded Runtime Considerations
Key Question: How do we design interop to support future embedded GFQL runtime (Rust, WebAssembly, etc.)?
Constraints:
- Translation layer should be thin - avoid heavy Python dependencies
- AST representation should be serializable (already JSON-capable)
- Wire protocols should be language-agnostic
- Parser/compiler infrastructure should be portable
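The serializability constraint can be illustrated with a small JSON round trip. The classes `NodeStep` and `EdgeStep`, the `kind` tag, and the `version` field are hypothetical names chosen for this sketch; the actual GFQL JSON schema may differ.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class NodeStep:
    filter: dict = field(default_factory=dict)


@dataclass
class EdgeStep:
    direction: str = "forward"
    filter: dict = field(default_factory=dict)
    hops: int = 1


# Tag <-> class tables; a Rust or JS runtime would keep an equivalent mapping.
_KINDS = {"node": NodeStep, "edge": EdgeStep}
_NAMES = {cls: kind for kind, cls in _KINDS.items()}


def dumps(chain):
    """Serialize a chain to versioned, language-neutral JSON."""
    payload = [{"kind": _NAMES[type(step)], **asdict(step)} for step in chain]
    return json.dumps({"version": 1, "chain": payload}, sort_keys=True)


def loads(text):
    """Rebuild the chain; reject documents from an unknown schema version."""
    doc = json.loads(text)
    if doc.get("version") != 1:
        raise ValueError(f"unsupported AST version: {doc.get('version')!r}")
    chain = []
    for raw in doc["chain"]:
        raw = dict(raw)
        cls = _KINDS[raw.pop("kind")]
        chain.append(cls(**raw))
    return chain
```

Because the payload contains only JSON primitives plus an explicit version, a non-Python runtime could consume it without sharing any Python code.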
Potential Architecture:
┌─────────────────────────────────────────────────┐
│ Application Layer (Python/JS/etc) │
├─────────────────────────────────────────────────┤
│ Query Language Parsers (Cypher, etc) │
│ ↓ (generates) │
│ GFQL AST (JSON-serializable) │
├─────────────────────────────────────────────────┤
│ GFQL Runtime (Rust/WASM - embedded) │
│ - AST execution │
│ - DataFrame operations │
│ - Optimization │
├─────────────────────────────────────────────────┤
│ DataFrame Backends │
│ pandas | polars | arrow | duckdb │
└─────────────────────────────────────────────────┘
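To show how the runtime layer could stay independent of the parsers, the following is a deliberately simplified executor that consumes a JSON-style chain and plain Python rows. It is a forward-only sketch that returns the node IDs reached at the end of the chain; it does not reproduce GFQL's actual semantics or its DataFrame backends, and the step format (`kind`, `direction`, `filter`) and column names (`id`, `src`, `dst`) are assumptions.

```python
def _matches(row, flt):
    """True when every filter key equals the row's value."""
    return all(row.get(key) == value for key, value in flt.items())


def run_chain(nodes, edges, chain):
    """nodes: dicts with 'id'; edges: dicts with 'src' and 'dst'.

    Returns the set of node IDs reachable at the end of the chain.
    """
    by_id = {row["id"]: row for row in nodes}
    frontier = set(by_id)  # start from every node
    for step in chain:
        flt = step.get("filter", {})
        if step["kind"] == "node":
            # Keep only frontier nodes that exist and satisfy the filter.
            frontier = {i for i in frontier if i in by_id and _matches(by_id[i], flt)}
        elif step["kind"] == "edge":
            reached = set()
            for edge in edges:
                if not _matches(edge, flt):
                    continue
                if step["direction"] in ("forward", "undirected") and edge["src"] in frontier:
                    reached.add(edge["dst"])
                if step["direction"] in ("reverse", "undirected") and edge["dst"] in frontier:
                    reached.add(edge["src"])
            frontier = reached
        else:
            raise ValueError(f"unknown step kind: {step['kind']!r}")
    return frontier
```

For example, with nodes `a` and `b` typed `Person` and an `a -> b` edge typed `KNOWS`, the chain Person, forward KNOWS, any node yields `{"b"}`. Any front-end that emits the same chain format could drive this loop, which is the separation the diagram describes.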
Specific Design Questions
- Parser Strategy:
- Build parsers in Python (existing ecosystem) or Rust (performance, portability)?
- Use existing parser libraries (e.g., pyparsing, pest in Rust)?
- What's the maintenance burden for multiple language grammars?
- Semantic Mapping:
- Which features don't translate cleanly?
- How to handle semantic mismatches (e.g., Cypher's OPTIONAL MATCH vs GFQL)?
- Error reporting for untranslatable queries?
- Performance:
- Client-side translation overhead?
- Should we support remote execution (push query to server)?
- Caching/memoization of translated queries?
- Standard Compliance:
- Should GFQL target ISO GQL compliance?
- Cypher has openCypher standard - align with it?
- Trade-offs of standard conformance vs GFQL's unique features?
- Rust Port Priorities:
- Core AST execution first, or parsers first?
- Which DataFrame backend for embedded Rust? (Arrow, Polars?)
- WebAssembly target for browser-based GFQL?
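Two of the questions above, error reporting for untranslatable queries and memoization of translated queries, can be sketched together. Everything here is hypothetical: `UntranslatableQuery`, `translate_cached`, and the placeholder `_translate` are illustrative names, and the only unsupported construct listed is OPTIONAL MATCH, taken from the example above rather than from a confirmed feature matrix.

```python
import re
from functools import lru_cache


class UntranslatableQuery(ValueError):
    """Structured error: which feature blocked translation, and where."""

    def __init__(self, feature, offset, query):
        self.feature, self.offset, self.query = feature, offset, query
        super().__init__(f"{feature} cannot be translated (offset {offset})")


# Placeholder list; a real one would come from the semantic mapping tables.
# Naive keyword matching: it would also fire inside string literals.
UNSUPPORTED = {"OPTIONAL MATCH": re.compile(r"\bOPTIONAL\s+MATCH\b", re.IGNORECASE)}


def _translate(query):
    """Stand-in for the real translator; returns an immutable placeholder."""
    return ("translated", query.strip())


@lru_cache(maxsize=256)
def translate_cached(query):
    """Check for unsupported features, then translate; results are memoized."""
    for feature, regex in UNSUPPORTED.items():
        hit = regex.search(query)
        if hit is not None:
            raise UntranslatableQuery(feature, hit.start(), query)
    return _translate(query)
```

Returning an immutable value matters because `lru_cache` hands the same object to every caller. Exceptions are not cached, so a failing query is re-checked on each call.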
Deliverables
Phase 1: Research (2-3 weeks)
- [ ] Survey existing translation tools (e.g., openCypher parsers)
- [ ] Document semantic mapping tables for each language
- [ ] Identify feature gaps and untranslatable patterns
- [ ] Benchmark translation performance overhead
- [ ] Prototype: Simple Cypher → GFQL translator for common patterns
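For the benchmarking item, a small timing harness along these lines might be enough to start with. The function name `benchmark_translation` and the reported statistics are assumptions for this sketch; any callable that maps a query string to a translated form can be plugged in.

```python
import statistics
import time


def benchmark_translation(translate, queries, repeats=200):
    """Time translate(query) per call; report median and p95 in microseconds."""
    samples = []
    for _ in range(repeats):
        for query in queries:
            start = time.perf_counter()
            translate(query)
            samples.append(time.perf_counter() - start)
    samples.sort()
    p95_index = int(0.95 * (len(samples) - 1))
    return {
        "calls": len(samples),
        "median_us": statistics.median(samples) * 1e6,
        "p95_us": samples[p95_index] * 1e6,
    }
```

Comparing these per-query figures against the execution time of the same query would show whether client-side translation overhead is material. Memoized translators should be measured with the cache cleared, since repeated queries otherwise only time the cache lookup.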
Phase 2: Architecture Design (1-2 weeks)
- [ ] Define translation layer architecture
- [ ] Design AST schema extensions (if needed)
- [ ] Plan for embedded runtime (Rust port strategy)
- [ ] Identify parser library candidates (Python + Rust)
- [ ] Define interop API surface
Phase 3: Prototype (3-4 weeks)
- [ ] Implement basic Cypher → GFQL translator
- [ ] Test with real-world Cypher queries
- [ ] Document translation fidelity
- [ ] Evaluate performance
- [ ] Gather user feedback
Phase 4: Embedded Runtime Exploration (Future)
- [ ] Rust AST execution prototype
- [ ] DataFrame backend selection (Polars?)
- [ ] WASM compilation feasibility
- [ ] Performance benchmarks vs Python
Success Criteria
- Clear understanding of translation fidelity for each language
- Documented semantic mapping tables
- Proof-of-concept translator for at least one language (Cypher)
- Architecture design that supports future Rust port
- Performance benchmarks showing acceptable overhead
Related Issues
- #722 - GFQL path support (Cypher has native path syntax)
- #755 - Mark mode (related to traversal semantics)
- #700 - Auto-generate JSON Schema for GFQL wire protocol
- #651 - GFQL remote predicates fail with 'id' column
- #696 - Multi-label node matching predicates
Open Questions
- Should we prioritize inbound translation (Cypher → GFQL) or outbound (GFQL → Cypher for remote execution)?
- Is BOLT protocol support worth the complexity vs text-based Cypher parsing?
- Should we target GQL standard compliance as a strategic goal?
- What's the timeline for Rust port? Should we design for it now or incrementally?
- Should embedded runtime use existing DataFrame libraries (Polars) or custom IR?
Next Steps
- Create feature comparison matrix (GFQL vs Cypher/GQL/GSQL/Gremlin)
- Document semantic equivalence mappings
- Survey existing parser tools
- Prototype Cypher → GFQL translator for common patterns
- Define roadmap based on findings
Priority: P2 - Strategic direction for GFQL evolution
Estimated Effort: 6-9 weeks for full research + prototype (sum of Phases 1-3; Phase 4 not included)