
Add Substrait import/export support for query plans

Open Copilot opened this issue 3 weeks ago • 0 comments

Thank you for opening a Pull Request!

We appreciate your contribution to Opteryx. Your time and effort make a difference, and we're excited to review your changes. To help ensure a smooth review process, please check the following:

Checklist for a Successful PR

  • [x] Start the conversation: If you haven't already, raise a bug/feature request or start a discussion. This ensures alignment on the change and approach.
  • [x] Run the tests: Confirm that all tests pass without errors.
  • [x] Maintain code coverage: If you've added or modified source code, ensure new tests are added to the test suite.
  • [x] Update documentation and tests (if applicable): If your changes impact functionality, make sure the relevant docs and test cases are updated.

Fixes: #<issue_number_goes_here>

Please replace <issue_number_goes_here> with the corresponding issue number.


Description

Implements bidirectional conversion between Opteryx logical plans and Substrait format, enabling interoperability with other query engines.

Implementation

New module: opteryx/planner/substrait/

  • exporter.py - Maps Opteryx logical plan nodes to Substrait relations (ReadRel, ProjectRel, FilterRel, JoinRel, AggregateRel, FetchRel, SortRel)
  • importer.py - Inverse mapping from Substrait relations to Opteryx logical plan
  • Expression handling for literals, field references, and functions
  • Type mapping between Orso and Substrait type systems
  • Supports both protobuf binary and JSON serialization
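The node-to-relation mapping described above can be pictured as a lookup table plus a recursive walk. This is a minimal sketch, not the PR's exporter: the `PlanNode` class, the node-type labels (`"scan"`, `"limit"`, and so on), and the dict output are hypothetical stand-ins. Only the relation names on the right-hand side come from the list above.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical node-type labels on the left; the relation names on the
# right are the Substrait relations listed in this PR.
NODE_TO_REL = {
    "scan": "ReadRel",
    "project": "ProjectRel",
    "filter": "FilterRel",
    "join": "JoinRel",
    "aggregate": "AggregateRel",
    "limit": "FetchRel",
    "order": "SortRel",
}


@dataclass
class PlanNode:
    """Toy logical plan node (assumed shape, not Opteryx's real class)."""

    node_type: str
    children: list[PlanNode] = field(default_factory=list)


def to_relation_tree(node: PlanNode) -> dict:
    """Recursively map a toy plan to a nested dict of relation names."""
    rel = NODE_TO_REL.get(node.node_type)
    if rel is None:
        # Unsupported nodes fail loudly rather than being silently dropped.
        raise NotImplementedError(f"no Substrait mapping for {node.node_type!r}")
    return {"rel": rel, "inputs": [to_relation_tree(c) for c in node.children]}
```

An importer would walk the same table in the opposite direction. Raising on unmapped node types keeps unsupported features, such as those listed under Limitations, visible to the caller.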

Optional dependency: Added substrait and protobuf to pyproject.toml as extras
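Because the dependencies are optional, a module like this would typically check for them at call time and fail with a clear message. The helper below is a sketch of that pattern under stated assumptions: `require_optional` is a hypothetical name, and the PR may guard its imports differently.

```python
import importlib


def require_optional(*module_names: str) -> None:
    """Raise ImportError naming every optional module that cannot be imported."""
    missing = []
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    if missing:
        raise ImportError(
            "Substrait support needs optional packages that are not installed: "
            + ", ".join(missing)
        )
```

A caller might invoke `require_optional("substrait", "google.protobuf")` at the top of an export or import function, so that users without the extras only see an error when they actually use the feature.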

Usage

from opteryx.planner.substrait import export_to_substrait, import_from_substrait

# Export Opteryx logical plan to Substrait
substrait_bytes = export_to_substrait(logical_plan, output_format="proto")
substrait_json = export_to_substrait(logical_plan, output_format="json")

# Import Substrait plan to Opteryx
logical_plan = import_from_substrait(substrait_bytes, input_format="proto")

# Round-trip conversion preserves plan structure
imported_plan = import_from_substrait(
    export_to_substrait(original_plan)
)
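
The round-trip claim above can be checked with a structural comparison. The sketch below shows the shape of such a test using toy stand-ins: `toy_export`, `toy_import`, and the nested-dict plan are assumptions for illustration, not the real Opteryx functions. The "proto" branch only encodes JSON text as bytes and is not real protobuf.

```python
import json


def toy_export(plan: dict, output_format: str = "proto"):
    """Serialize a toy plan; 'proto' returns bytes, 'json' returns str."""
    text = json.dumps(plan, sort_keys=True)
    if output_format == "json":
        return text
    if output_format == "proto":
        # Stand-in for a binary encoding; real protobuf bytes look different.
        return text.encode("utf-8")
    raise ValueError(f"unknown output_format: {output_format!r}")


def toy_import(data, input_format: str = "proto") -> dict:
    """Inverse of toy_export for the same format name."""
    if input_format == "proto":
        data = data.decode("utf-8")
    elif input_format != "json":
        raise ValueError(f"unknown input_format: {input_format!r}")
    return json.loads(data)


def round_trips(plan: dict, fmt: str) -> bool:
    """True if exporting then importing yields an equal plan."""
    return toy_import(toy_export(plan, output_format=fmt), input_format=fmt) == plan
```

Running the same check over both formats catches the case where one serialization path drops information that the other keeps.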

Testing & Documentation

  • Integration tests verify protobuf/JSON serialization without requiring compilation
  • Example script demonstrates creating, serializing, and converting plans
  • User documentation in docs/SUBSTRAIT.md
  • Developer documentation in module README

Limitations

  • Function mapping uses placeholder references (extension registry needed for full support)
  • Window functions, CTEs, and set operations not yet implemented
  • Type precision may vary for complex nested types
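
The first limitation relates to how Substrait identifies functions: a plan declares each function once as an extension, and expressions then refer to it by an integer anchor. The sketch below shows what a minimal registry for those anchors could look like. The class name, the example URI, and the dict layout are illustrative assumptions, not the PR's code or the Substrait message definitions.

```python
class FunctionRegistry:
    """Assigns a stable integer anchor to each (uri, function name) pair."""

    def __init__(self) -> None:
        self._anchors: dict[tuple[str, str], int] = {}

    def anchor_for(self, uri: str, name: str) -> int:
        """Return the existing anchor, or allocate the next one (starting at 1)."""
        key = (uri, name)
        if key not in self._anchors:
            self._anchors[key] = len(self._anchors) + 1
        return self._anchors[key]

    def declarations(self) -> list[dict]:
        """List the declarations a plan would need to carry alongside its relations."""
        return [
            {"uri": uri, "name": name, "function_anchor": anchor}
            for (uri, name), anchor in self._anchors.items()
        ]
```

With a registry like this, an exporter could replace a fixed placeholder with `registry.anchor_for(uri, name)` and emit `registry.declarations()` once per plan.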

Thank you for contributing to Opteryx! 🎉

Original prompt

This section details the original issue you should resolve.

<issue_title>Import and Export Substrait plans</issue_title> <issue_description>Substrait, as a common language for query plans, appears in a few query engines. It may be pragmatic to provide support for Substrait plans. I think this would sit close to the boundary between the Python-heavy planning phase and the Cython-heavy execution phase, although I expect some optimization of the Substrait plan would still be needed (e.g. based on cost).</issue_description>

Comments on the Issue (you are @copilot in this section)

  • Fixes mabel-dev/opteryx#2925


Copilot avatar Nov 15 '25 19:11 Copilot