Review validation design consistency across GFQL operations

Open lmeyerov opened this issue 2 months ago • 1 comments

Review validation design consistency across GFQL operations

Summary

There's an inconsistency in how validation is controlled across different GFQL execution paths:

AST construction has validate=True parameter (checks structure + internal column patterns)
Direct method calls (g.get_degrees(), g.filter_nodes_by_dict()) ALWAYS validate with no opt-out
GFQL execution (.gfql()) has no validation control parameter

Current Behavior

AST Objects - Controllable Validation ✅

# Can disable validation
n({'__gfql_internal__': 1}, validate=False)  # No error

# Default validation enabled
n({'__gfql_internal__': 1})  # ValueError

Direct Method Calls - Always Validate ❌

# No way to disable validation
g.get_degrees(col='__gfql_degree__')  # Always raises ValueError
g.filter_nodes_by_dict({'__gfql_col__': 1})  # Always raises ValueError

GFQL Call Operations - Always Validate ❌

# Validation happens inside the call, not controllable
g.gfql([
    call('get_degrees', {'col': '__gfql_degree__'})  # Always raises ValueError
])

Design Question

Should validation be controllable consistently across all execution paths?

Option A: Add `validate` parameter everywhere

# Direct calls
g.get_degrees(col='degree', validate=True)  # default
g.filter_nodes_by_dict(filter_dict, validate=True)  # default

# GFQL execution - inherit from top level
g.gfql([n(), call('get_degrees')], validate=True)  # default, flows down

# AST construction - already exists
n({'col': 1}, validate=True)  # already works

Pros:

Consistent API across all execution paths
Users can opt out for advanced use cases
Top-level gfql() validation flows down to all operations

Cons:

Breaking API change (new parameters)
More complex - validation state must flow through execution context
call() operations shouldn't allow individual validate override (should inherit from gfql)

Option B: Keep current behavior (always validate at runtime)

# AST construction - controllable (parse-time validation)
n({'col': 1}, validate=False)  # Can disable

# Runtime operations - always validate (runtime validation)  
g.get_degrees()  # Always validates
g.filter_nodes_by_dict(...)  # Always validates

Pros:

Simpler - no parameter threading needed
Clear separation: parse-time vs runtime validation
No breaking changes

Cons:

Inconsistent API (validate parameter only on AST objects)
No way for advanced users to bypass runtime validation
gfql() can't control validation of operations it calls

Option C: Document as two validation layers

# Layer 1: Parse-time validation (AST structure + patterns)
ast_obj = n({'__gfql_col__': 1}, validate=False)  # controllable

# Layer 2: Runtime validation (schema checks, always on)
g.get_degrees()  # always validates
g.filter_nodes_by_dict(...)  # always validates
g.gfql([ast_obj])  # re-validates at runtime regardless of AST validate flag

Pros:

No breaking changes
Clear separation of concerns
Simpler implementation

Cons:

Still inconsistent
Can't disable runtime validation

Affected Code

Files with validation:

graphistry/compute/ComputeMixin.py - get_degrees(), get_indegrees(), get_outdegrees(), get_topological_levels()
graphistry/compute/filter_by_dict.py - filter_nodes_by_dict(), filter_edges_by_dict()
graphistry/compute/ast.py - AST _validate_fields() methods

Validation functions:

graphistry/compute/gfql/identifiers.py:
- validate_column_name() - for output parameters
- validate_column_references() - for filter dict keys

Related Context

PR #788 - Added internal column validation for __gfql_*__ pattern
These validations prevent filtering on temporary internal columns
Internal columns won't work with gfql_remote() (serialization issue)

Questions

Should validation be consistently controllable across all execution paths?
If yes, should call() operations inherit validation from parent .gfql() call?
Should we distinguish between parse-time validation (AST structure) and runtime validation (schema checks)?
Is there a use case for disabling runtime validation?

Recommendation

Probably Option C (document two layers) unless there's a strong use case for disabling runtime validation. The current behavior is safe and prevents confusing errors downstream.

If we go with Option A, the design should be:

gfql(query, validate=True) - controls validation for entire execution
Validation flag flows through ExecutionContext
Individual operations inherit from context (no per-operation override)
call() operations respect inherited validation (can't override)

cc: @lmeyerov

Oct 14 '25 04:10 lmeyerov

basically we should allow both:

checked runtime execution
query-construction time validation, allowing fail-fast program validation + fast runtime

Oct 14 '25 04:10 lmeyerov

Review validation design consistency across GFQL operations

Review validation design consistency across GFQL operations

Summary

Current Behavior

AST Objects - Controllable Validation ✅

Direct Method Calls - Always Validate ❌

GFQL Call Operations - Always Validate ❌

Design Question

Option A: Add validate parameter everywhere

Option B: Keep current behavior (always validate at runtime)

Option C: Document as two validation layers

Affected Code

Related Context

Questions

Recommendation

Option A: Add `validate` parameter everywhere