Add Substrait import/export support for query plans
Thank you for opening a Pull Request!
We appreciate your contribution to Opteryx. Your time and effort make a difference, and we're excited to review your changes. To help ensure a smooth review process, please check the following:
Checklist for a Successful PR
- [x] Start the conversation: If you haven't already, raise a bug/feature request or start a discussion. This ensures alignment on the change and approach.
- [x] Run the tests: Confirm that all tests pass without errors.
- [x] Maintain code coverage: If you've added or modified source code, ensure new tests are added to the test suite.
- [x] Update documentation and tests (if applicable): If your changes impact functionality, make sure the relevant docs and test cases are updated.
Fixes: #<issue_number_goes_here>
Please replace <issue_number_goes_here> with the corresponding issue number.
Description
Implements bidirectional conversion between Opteryx logical plans and the Substrait format, enabling interoperability with other query engines.
Implementation
New module: opteryx/planner/substrait/
- exporter.py: Maps Opteryx logical plan nodes to Substrait relations (ReadRel, ProjectRel, FilterRel, JoinRel, AggregateRel, FetchRel, SortRel)
- importer.py: Inverse mapping from Substrait relations to Opteryx logical plans
- Expression handling for literals, field references, and functions
- Type mapping between Orso and Substrait type systems
- Supports both protobuf binary and JSON serialization
Optional dependency: Added substrait and protobuf to pyproject.toml as extras
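The exporter and importer are inverse recursive mappings over the plan tree, with the type mapping carried in each ReadRel's base schema. A minimal sketch of that shape, assuming a toy Node class (not Opteryx's logical plan classes), a hypothetical Orso-to-Substrait type table, and plain dicts laid out like Substrait's protobuf JSON. Only ReadRel, SortRel and FetchRel are covered, and field names should be checked against the Substrait version the PR pins:

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

# Hypothetical Orso -> Substrait type table; the Orso names are illustrative
# strings, not the real OrsoTypes enum. Nested types are deliberately absent.
ORSO_TO_SUBSTRAIT = {
    "INTEGER": "i64", "DOUBLE": "fp64", "VARCHAR": "string",
    "BOOLEAN": "bool", "DATE": "date", "BLOB": "binary",
}
SUBSTRAIT_TO_ORSO = {v: k for k, v in ORSO_TO_SUBSTRAIT.items()}


@dataclass
class Node:
    """Hypothetical minimal plan node; not Opteryx's logical plan classes."""
    kind: str                                              # "scan", "order" or "limit"
    table: Optional[str] = None                            # scan: table name
    schema: Dict[str, str] = field(default_factory=dict)   # scan: column -> Orso type
    order_by: List[int] = field(default_factory=list)      # order: column indices, ascending only
    limit: Optional[int] = None                            # limit: row count
    child: Optional["Node"] = None


def field_ref(index: int) -> Dict[str, Any]:
    # Substrait direct field reference into the input row
    return {"selection": {"directReference": {"structField": {"field": index}}, "rootReference": {}}}


def export_rel(node: Node) -> Dict[str, Any]:
    """Recursively map a node to a Substrait-JSON-shaped Rel dict."""
    if node.kind == "scan":
        types = [{ORSO_TO_SUBSTRAIT[t]: {"nullability": "NULLABILITY_NULLABLE"}} for t in node.schema.values()]
        return {"read": {"baseSchema": {"names": list(node.schema), "struct": {"types": types}},
                         "namedTable": {"names": [node.table]}}}
    if node.kind == "order":
        sorts = [{"expr": field_ref(i), "direction": "SORT_DIRECTION_ASC_NULLS_LAST"} for i in node.order_by]
        return {"sort": {"input": export_rel(node.child), "sorts": sorts}}
    if node.kind == "limit":
        # proto3 JSON renders int64 values as strings; offset is always 0 here
        return {"fetch": {"input": export_rel(node.child), "offset": "0", "count": str(node.limit)}}
    raise NotImplementedError(f"no Substrait mapping for {node.kind!r}")


def import_rel(rel: Dict[str, Any]) -> Node:
    """Inverse mapping from a Substrait-JSON-shaped Rel dict back to a Node."""
    if "read" in rel:
        schema = rel["read"]["baseSchema"]
        # each type is a one-key dict such as {"i64": {...}}; the key names the type
        orso = [SUBSTRAIT_TO_ORSO[next(iter(t))] for t in schema["struct"]["types"]]
        return Node("scan", table=rel["read"]["namedTable"]["names"][0],
                    schema=dict(zip(schema["names"], orso)))
    if "sort" in rel:
        # sort direction is not read back: this sketch only exports ascending keys
        keys = [s["expr"]["selection"]["directReference"]["structField"]["field"] for s in rel["sort"]["sorts"]]
        return Node("order", order_by=keys, child=import_rel(rel["sort"]["input"]))
    if "fetch" in rel:
        return Node("limit", limit=int(rel["fetch"]["count"]), child=import_rel(rel["fetch"]["input"]))
    raise NotImplementedError(f"unsupported Substrait relation: {list(rel)}")


# Round trip: scan -> order by column 1 -> limit 10, via JSON text
plan = Node("limit", limit=10, child=Node("order", order_by=[1],
            child=Node("scan", table="planets", schema={"id": "INTEGER", "name": "VARCHAR"})))
rel = export_rel(plan)
assert import_rel(json.loads(json.dumps(rel))) == plan
```

Deriving SUBSTRAIT_TO_ORSO from the forward table keeps both directions in sync, and unmapped node kinds or types raise instead of silently producing a partial plan. A fuller mapping would also have to respect Substrait's ProjectRel semantics (output is the input columns followed by the projected expressions, trimmed via an emit mapping), which this sketch sidesteps by omitting projection.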
Usage
```python
from opteryx.planner.substrait import export_to_substrait, import_from_substrait

# logical_plan and original_plan are existing Opteryx logical plans

# Export an Opteryx logical plan to Substrait
substrait_bytes = export_to_substrait(logical_plan, output_format="proto")
substrait_json = export_to_substrait(logical_plan, output_format="json")

# Import a Substrait plan into Opteryx
logical_plan = import_from_substrait(substrait_bytes, input_format="proto")

# Round-trip conversion preserves plan structure
imported_plan = import_from_substrait(
    export_to_substrait(original_plan)
)
```
Testing & Documentation
- Integration tests verify protobuf/JSON serialization without requiring compilation
- Example script demonstrates creating, serializing, and converting plans
- User documentation in docs/SUBSTRAIT.md
- Developer documentation in the module README
Limitations
- Function mapping uses placeholder references (extension registry needed for full support)
- Window functions, CTEs, and set operations not yet implemented
- Type precision may vary for complex nested types
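The first limitation, placeholder function references, is what a Substrait extension registry resolves: a plan declares extension YAML URIs and functions in top-level sections, and each ScalarFunction points at its declaration by an integer anchor. The sketch below, assuming plain dicts shaped like Substrait's protobuf JSON, shows one way such a registry could assign and deduplicate anchors. ExtensionRegistry, scalar_call and field_ref are hypothetical names, and the URIs and compound function names (gt:any_any, add:i64_i64) are assumptions modelled on the Substrait repository's extension files, to be checked against the spec version the PR pins:

```python
import json
from typing import Any, Dict, List, Tuple

# Assumed extension YAML locations, modelled on the Substrait repository layout.
COMPARISON = "https://github.com/substrait-io/substrait/blob/main/extensions/functions_comparison.yaml"
ARITHMETIC = "https://github.com/substrait-io/substrait/blob/main/extensions/functions_arithmetic.yaml"


class ExtensionRegistry:
    """Hypothetical registry handing out Substrait extension anchors.

    Each distinct YAML URI gets an extensionUriAnchor and each distinct
    (uri, compound function name) pair gets a functionAnchor, so repeated
    calls to the same function share one declaration.
    """

    def __init__(self) -> None:
        self._uris: Dict[str, int] = {}
        self._functions: Dict[Tuple[str, str], int] = {}

    def function_anchor(self, uri: str, name: str) -> int:
        # setdefault only assigns a new anchor the first time a key is seen
        self._uris.setdefault(uri, len(self._uris) + 1)
        return self._functions.setdefault((uri, name), len(self._functions) + 1)

    def declarations(self) -> Dict[str, List[Dict[str, Any]]]:
        """Top-level plan sections declaring every registered URI and function."""
        uris = [{"extensionUriAnchor": a, "uri": u} for u, a in self._uris.items()]
        functions = [
            {"extensionFunction": {"extensionUriReference": self._uris[u], "functionAnchor": a, "name": n}}
            for (u, n), a in self._functions.items()
        ]
        return {"extensionUris": uris, "extensions": functions}


def scalar_call(registry: ExtensionRegistry, uri: str, name: str,
                args: List[Dict[str, Any]], output_type: Dict[str, Any]) -> Dict[str, Any]:
    """Build a Substrait-JSON-shaped scalar function expression."""
    return {"scalarFunction": {
        "functionReference": registry.function_anchor(uri, name),
        "arguments": [{"value": a} for a in args],
        "outputType": output_type,
    }}


def field_ref(index: int) -> Dict[str, Any]:
    # Substrait direct field reference into the input row
    return {"selection": {"directReference": {"structField": {"field": index}}, "rootReference": {}}}


registry = ExtensionRegistry()
BOOL = {"bool": {"nullability": "NULLABILITY_NULLABLE"}}
I64 = {"i64": {"nullability": "NULLABILITY_NULLABLE"}}
five = {"literal": {"i64": "5"}}  # proto3 JSON encodes int64 as a string

gt = scalar_call(registry, COMPARISON, "gt:any_any", [field_ref(0), five], BOOL)
gt_again = scalar_call(registry, COMPARISON, "gt:any_any", [field_ref(1), five], BOOL)
add = scalar_call(registry, ARITHMETIC, "add:i64_i64", [field_ref(0), field_ref(1)], I64)
print(json.dumps(registry.declarations(), indent=2))
```

Both gt calls share one function anchor and one URI anchor, while add gets its own; the declarations() output is what a plan exporter would place alongside the relations so that other engines can resolve each functionReference.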
Thank you for contributing to Opteryx! 🎉
Original prompt
This section details the original issue you should resolve
<issue_title>Import and Export substrait plans</issue_title> <issue_description>Substrait as a common language for query plans appears in a few query engines. It may be pragmatic to provide support for Substrait plans. I think this would be close to the boundary between the Python-heavy planning phase and the Cython-heavy execution phase, although I expect some optimization of the Substrait plan would still be needed (e.g. based on cost).</issue_description>
Comments on the Issue (you are @copilot in this section)
- Fixes mabel-dev/opteryx#2925