pygraphistry
pygraphistry copied to clipboard
Review validation design consistency across GFQL operations
Review validation design consistency across GFQL operations
Summary
There's an inconsistency in how validation is controlled across different GFQL execution paths:
- AST construction has
validate=Trueparameter (checks structure + internal column patterns) - Direct method calls (
g.get_degrees(),g.filter_nodes_by_dict()) ALWAYS validate with no opt-out - GFQL execution (
.gfql()) has no validation control parameter
Current Behavior
AST Objects - Controllable Validation ✅
# Can disable validation
n({'__gfql_internal__': 1}, validate=False) # No error
# Default validation enabled
n({'__gfql_internal__': 1}) # ValueError
Direct Method Calls - Always Validate ❌
# No way to disable validation
g.get_degrees(col='__gfql_degree__') # Always raises ValueError
g.filter_nodes_by_dict({'__gfql_col__': 1}) # Always raises ValueError
GFQL Call Operations - Always Validate ❌
# Validation happens inside the call, not controllable
g.gfql([
call('get_degrees', {'col': '__gfql_degree__'}) # Always raises ValueError
])
Design Question
Should validation be controllable consistently across all execution paths?
Option A: Add validate parameter everywhere
# Direct calls
g.get_degrees(col='degree', validate=True) # default
g.filter_nodes_by_dict(filter_dict, validate=True) # default
# GFQL execution - inherit from top level
g.gfql([n(), call('get_degrees')], validate=True) # default, flows down
# AST construction - already exists
n({'col': 1}, validate=True) # already works
Pros:
- Consistent API across all execution paths
- Users can opt out for advanced use cases
- Top-level gfql() validation flows down to all operations
Cons:
- Breaking API change (new parameters)
- More complex - validation state must flow through execution context
call()operations shouldn't allow individual validate override (should inherit from gfql)
Option B: Keep current behavior (always validate at runtime)
# AST construction - controllable (parse-time validation)
n({'col': 1}, validate=False) # Can disable
# Runtime operations - always validate (runtime validation)
g.get_degrees() # Always validates
g.filter_nodes_by_dict(...) # Always validates
Pros:
- Simpler - no parameter threading needed
- Clear separation: parse-time vs runtime validation
- No breaking changes
Cons:
- Inconsistent API (validate parameter only on AST objects)
- No way for advanced users to bypass runtime validation
- gfql() can't control validation of operations it calls
Option C: Document as two validation layers
# Layer 1: Parse-time validation (AST structure + patterns)
ast_obj = n({'__gfql_col__': 1}, validate=False) # controllable
# Layer 2: Runtime validation (schema checks, always on)
g.get_degrees() # always validates
g.filter_nodes_by_dict(...) # always validates
g.gfql([ast_obj]) # re-validates at runtime regardless of AST validate flag
Pros:
- No breaking changes
- Clear separation of concerns
- Simpler implementation
Cons:
- Still inconsistent
- Can't disable runtime validation
Affected Code
Files with validation:
graphistry/compute/ComputeMixin.py-get_degrees(),get_indegrees(),get_outdegrees(),get_topological_levels()graphistry/compute/filter_by_dict.py-filter_nodes_by_dict(),filter_edges_by_dict()graphistry/compute/ast.py- AST_validate_fields()methods
Validation functions:
graphistry/compute/gfql/identifiers.py:validate_column_name()- for output parametersvalidate_column_references()- for filter dict keys
Related Context
- PR #788 - Added internal column validation for
__gfql_*__pattern - These validations prevent filtering on temporary internal columns
- Internal columns won't work with
gfql_remote()(serialization issue)
Questions
- Should validation be consistently controllable across all execution paths?
- If yes, should
call()operations inherit validation from parent.gfql()call? - Should we distinguish between parse-time validation (AST structure) and runtime validation (schema checks)?
- Is there a use case for disabling runtime validation?
Recommendation
Probably Option C (document two layers) unless there's a strong use case for disabling runtime validation. The current behavior is safe and prevents confusing errors downstream.
If we go with Option A, the design should be:
gfql(query, validate=True)- controls validation for entire execution- Validation flag flows through
ExecutionContext - Individual operations inherit from context (no per-operation override)
call()operations respect inherited validation (can't override)
cc: @lmeyerov
basically we should allow both:
- checked runtime execution
- query-construction time validation, allowing fail-fast program validation + fast runtime