Add Substrait roundtrip support for `RecursiveQuery` and recursive CTE scans
Which issue does this PR close?
- Closes #16274.
Rationale for this change
Substrait roundtrip mode currently fails for plans that include `RecursiveQuery`, resulting in `not_impl_err!("Unsupported plan type: RecursiveQuery")` during SQL logic tests. This prevents recursive CTE queries from being roundtripped through Substrait, causing multiple `cte.slt` cases to fail and reducing confidence in Substrait interoperability.
This PR adds end-to-end support for serializing and deserializing `RecursiveQuery` logical plans and their associated recursive work-table scans. This unblocks the failing SQL logic tests and improves parity between the DataFusion logical plan representation and its Substrait encoding.
What changes are included in this PR?
- New `logical_plan::recursive` helper module
  - Introduces `RECURSIVE_QUERY_TYPE_URL` and `RECURSIVE_SCAN_TYPE_URL` to tag recursive structures in Substrait extensions.
  - Defines small prost messages for:
    - `RecursiveQueryDetail { name, is_distinct }`, used to carry `RecursiveQuery` metadata.
    - `RecursiveScanDetail { name }`, used to identify recursive work-table scans.
  - Provides helpers to encode/decode these messages:
    - `encode_recursive_query_detail` / `decode_recursive_query_detail`.
    - `encode_recursive_scan_detail` / `decode_recursive_scan_detail`.
  - Adds validation so that empty names and malformed payloads are reported as Substrait errors.
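To illustrate the encode/decode contract of the helper module, here is a minimal std-only sketch that hand-rolls the protobuf wire format for a detail message. It is not the PR's code: the PR defines prost messages and reports Substrait errors, whereas this sketch assumes field numbers (`name` = 1, `is_distinct` = 2), uses `String` as the error type, and rejects unknown fields instead of skipping them.

```rust
/// Stand-in for the PR's `RecursiveQueryDetail` prost message.
/// Field numbers (name = 1, is_distinct = 2) are assumed for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub struct RecursiveQueryDetail {
    pub name: String,
    pub is_distinct: bool,
}

/// Append `value` as a protobuf base-128 varint.
fn put_varint(buf: &mut Vec<u8>, mut value: u64) {
    while value >= 0x80 {
        buf.push((value as u8 & 0x7f) | 0x80);
        value >>= 7;
    }
    buf.push(value as u8);
}

/// Read a varint starting at `*pos`, advancing `*pos` past it.
fn get_varint(bytes: &[u8], pos: &mut usize) -> Result<u64, String> {
    let mut value = 0u64;
    for shift in (0..64).step_by(7) {
        let byte = *bytes.get(*pos).ok_or("truncated varint")?;
        *pos += 1;
        value |= u64::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return Ok(value);
        }
    }
    Err("varint too long".to_string())
}

/// Encode the detail, refusing empty names up front.
pub fn encode_recursive_query_detail(detail: &RecursiveQueryDetail) -> Result<Vec<u8>, String> {
    if detail.name.is_empty() {
        return Err("recursive query name must not be empty".to_string());
    }
    let mut buf = Vec::new();
    buf.push(0x0a); // field 1, wire type 2 (length-delimited)
    put_varint(&mut buf, detail.name.len() as u64);
    buf.extend_from_slice(detail.name.as_bytes());
    if detail.is_distinct {
        buf.extend_from_slice(&[0x10, 0x01]); // field 2, wire type 0 (varint), true
    }
    Ok(buf)
}

/// Decode the detail, reporting malformed payloads and empty names as errors.
pub fn decode_recursive_query_detail(bytes: &[u8]) -> Result<RecursiveQueryDetail, String> {
    let mut detail = RecursiveQueryDetail { name: String::new(), is_distinct: false };
    let mut pos = 0;
    while pos < bytes.len() {
        let tag = get_varint(bytes, &mut pos)?;
        match tag {
            0x0a => {
                let len = get_varint(bytes, &mut pos)? as usize;
                let end = pos
                    .checked_add(len)
                    .filter(|&e| e <= bytes.len())
                    .ok_or("name runs past end of payload")?;
                detail.name = String::from_utf8(bytes[pos..end].to_vec())
                    .map_err(|e| format!("name is not valid UTF-8: {e}"))?;
                pos = end;
            }
            0x10 => detail.is_distinct = get_varint(bytes, &mut pos)? != 0,
            // Simplification: a real protobuf decoder would skip unknown fields.
            other => return Err(format!("unexpected tag {other:#x}")),
        }
    }
    if detail.name.is_empty() {
        return Err("decoded recursive query name is empty".to_string());
    }
    Ok(detail)
}
```

Encoding `{ name: "nodes", is_distinct: true }` and decoding the result should return the same value, while an empty name or a truncated payload yields an `Err` on the corresponding side.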
- Producer-side support for `RecursiveQuery`
  - Adds `logical_plan/producer/rel/recursive_query_rel.rs` implementing:
    - `from_recursive_query`, which serializes `LogicalPlan::RecursiveQuery` into `ExtensionMultiRel` with two inputs:
      - `inputs[0]`: `static_term`.
      - `inputs[1]`: `recursive_term`.
    - Encodes `{ name, is_distinct }` into the `detail` field using `RECURSIVE_QUERY_TYPE_URL`.
  - Wires this into the generic Substrait producer:
    - Extends `SubstraitProducer` with `handle_recursive_query`.
    - Updates `to_substrait_rel` to delegate `LogicalPlan::RecursiveQuery` to `handle_recursive_query` instead of returning `not_impl_err!`.
- Consumer-side support for `RecursiveQuery`
  - Adds `logical_plan/consumer/rel/recursive_query_rel.rs` implementing:
    - `from_recursive_query_rel`, which reconstructs `LogicalPlan::RecursiveQuery` from `ExtensionMultiRel`.
    - Validates that:
      - The extension has exactly two inputs.
      - The `detail` field is present and decodes successfully.
    - Rebuilds `RecursiveQuery { name, is_distinct, static_term, recursive_term }`.
  - Integrates with `DefaultSubstraitConsumer`:
    - Detects `RECURSIVE_QUERY_TYPE_URL` in `ExtensionMultiRel` and routes to `from_recursive_query_rel`.
    - Falls back to the existing extension handling for other type URLs.
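The producer and consumer halves described above can be sketched together with simplified stand-in types. Nothing here is the real DataFusion or Substrait API: `Plan` and `MultiRel` are hypothetical stand-ins for `LogicalPlan` and `ExtensionMultiRel`, the type URL string is a placeholder, the detail is kept as a plain tuple rather than an encoded protobuf payload, and child plans are copied as-is rather than converted to Substrait relations.

```rust
/// Placeholder value; the actual type URL string used by the PR is not shown here.
const RECURSIVE_QUERY_TYPE_URL: &str = "example.invalid/RecursiveQueryDetail";

/// Simplified stand-in for a logical plan.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Scan(String),
    RecursiveQuery {
        name: String,
        is_distinct: bool,
        static_term: Box<Plan>,
        recursive_term: Box<Plan>,
    },
}

/// Simplified stand-in for an extension relation with multiple inputs.
#[derive(Debug, Clone, PartialEq)]
struct MultiRel {
    inputs: Vec<Plan>,
    /// (type_url, (name, is_distinct)); in the PR this is an encoded detail message.
    detail: Option<(String, (String, bool))>,
}

/// Producer side: inputs[0] is the static term, inputs[1] the recursive term.
fn from_recursive_query(plan: &Plan) -> Result<MultiRel, String> {
    match plan {
        Plan::RecursiveQuery { name, is_distinct, static_term, recursive_term } => Ok(MultiRel {
            inputs: vec![(**static_term).clone(), (**recursive_term).clone()],
            detail: Some((RECURSIVE_QUERY_TYPE_URL.to_string(), (name.clone(), *is_distinct))),
        }),
        other => Err(format!("expected RecursiveQuery, got {other:?}")),
    }
}

/// Consumer side: validate the detail and input count, then rebuild the plan.
fn from_recursive_query_rel(rel: &MultiRel) -> Result<Plan, String> {
    let (type_url, (name, is_distinct)) = rel.detail.as_ref().ok_or("missing detail")?;
    if type_url.as_str() != RECURSIVE_QUERY_TYPE_URL {
        return Err(format!("unexpected type URL: {type_url}"));
    }
    let [static_term, recursive_term] = rel.inputs.as_slice() else {
        return Err(format!("expected exactly two inputs, got {}", rel.inputs.len()));
    };
    Ok(Plan::RecursiveQuery {
        name: name.clone(),
        is_distinct: *is_distinct,
        static_term: Box::new(static_term.clone()),
        recursive_term: Box::new(recursive_term.clone()),
    })
}
```

Fixing the input order (static term first, recursive term second) is what lets the consumer rebuild the plan from position alone, which is why the input-count check matters.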
- Support for recursive CTE work-table scans
  - Producer (TableScan → ReadRel)
    - Detects `CteWorkTable`-backed scans via `DefaultTableSource`.
    - Adds a helper `recursive_scan_name(&TableScan) -> Option<String>` that:
      - Downcasts to `DefaultTableSource` and then to `CteWorkTable`.
      - Confirms that the scan’s table name matches the work-table name.
    - When a `CteWorkTable` is detected, sets `ReadRel.advanced_extension` with:
      - `type_url = RECURSIVE_SCAN_TYPE_URL`.
      - `value = encode_recursive_scan_detail(name)`.
  - Consumer (ReadRel → LogicalPlan::TableScan)
    - Parses `ReadRel.advanced_extension` and, when `type_url == RECURSIVE_SCAN_TYPE_URL`, decodes the recursive scan detail.
    - Verifies the recursive scan name matches the `TableReference` table name and returns a Substrait error if they differ.
    - Uses a `CteWorkTable` wrapped in a `TableSource` instead of resolving a regular catalog table for recursive scans.
    - Falls back to the existing `resolve_table_ref` logic for non-recursive scans.
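The downcast-based detection on the producer side can be sketched with `std::any::Any`. The types below (`Provider`, `WorkTable`, `CatalogTable`, `Scan`) are hypothetical stand-ins for the DataFusion types named above, and the sketch uses a single downcast where the PR describes two (`DefaultTableSource`, then `CteWorkTable`).

```rust
use std::any::Any;
use std::sync::Arc;

/// Stand-in for a table provider that exposes itself as `Any` for downcasting.
trait Provider: Any {
    fn as_any(&self) -> &dyn Any;
}

/// Stand-in for the recursive CTE work table.
struct WorkTable {
    name: String,
}

impl Provider for WorkTable {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

/// Stand-in for an ordinary catalog table.
struct CatalogTable;

impl Provider for CatalogTable {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

/// Stand-in for a table scan: the referenced table name plus its provider.
struct Scan {
    table_name: String,
    provider: Arc<dyn Provider>,
}

/// Returns the work-table name only when the scan is backed by a work table
/// and the scan's table name matches the work-table name.
fn recursive_scan_name(scan: &Scan) -> Option<String> {
    let work_table = scan.provider.as_any().downcast_ref::<WorkTable>()?;
    (work_table.name == scan.table_name).then(|| work_table.name.clone())
}
```

A scan over a `WorkTable` named `nodes` with a matching table name yields `Some("nodes")`; a catalog table or a mismatched name yields `None`, so the caller would emit an ordinary read without the recursive-scan tag.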
- Refactoring in `ReadRel` consumer
  - Replaces the earlier `read_with_schema` helper with a more flexible `read_with_source` helper that accepts a `TableSource` (either a catalog table or a `CteWorkTable`).
  - Ensures that schema compatibility checks are still performed after building the scan plan.
- Tests
  - Adds unit tests for producer-side recursive scan encoding:
    - `from_table_scan_sets_advanced_extension_for_cte_work_table` ensures that:
      - A `TableScan` over a `CteWorkTable` uses `RECURSIVE_SCAN_TYPE_URL` in `advanced_extension`.
      - The encoded payload decodes back to the correct table name.
  - Adds roundtrip and error-coverage tests for recursive queries and scan details in `roundtrip_logical_plan.rs`:
    - `roundtrip_recursive_query` verifies that a `RecursiveQuery` roundtrips through Substrait and back, preserving the name and `is_distinct` flag.
    - `serialize_recursive_query_with_empty_name_errors` checks that encoding fails with a clear error when the `RecursiveQuery` name is empty.
    - `decode_recursive_query_detail_malformed_bytes_errors` ensures malformed bytes produce a descriptive Substrait error.
    - `decode_recursive_scan_detail_malformed_bytes_errors` similarly validates error handling for malformed recursive scan detail.
    - `roundtrip_recursive_query_distinct` confirms that `is_distinct = true` is preserved across the roundtrip.
    - `roundtrip_recursive_query_preserves_child_plans` does a structural sanity check that the main characteristics of the child plans (projections, filters, table scans) survive the roundtrip.
    - `roundtrip_recursive_query_with_work_table_scan_executes` runs a real recursive CTE (balances example) through Substrait roundtrip and executes it, asserting non-empty results.
Are these changes tested?
Before
❯ cargo test --test sqllogictests -- --substrait-round-trip cte.slt:175
Finished `test` profile [unoptimized + debuginfo] target(s) in 0.65s
Running bin/sqllogictests.rs (target/debug/deps/sqllogictests-79fc77e66b850bad)
Completed 1 test files in 0 seconds
External error: 1 errors in file ...
1. query failed: DataFusion error: This feature is not implemented: Unsupported plan type: RecursiveQuery....
After
❯ cargo test --test sqllogictests -- --substrait-round-trip cte.slt:175
Finished `test` profile [unoptimized + debuginfo] target(s) in 0.78s
Running bin/sqllogictests.rs (target/debug/deps/sqllogictests-79fc77e66b850bad)
Completed 1 test files in 0 seconds
Yes.
- New and updated tests have been added to cover:
  - `RecursiveQuery` Substrait serialization and deserialization.
  - Encoding/decoding of `RecursiveQueryDetail` and `RecursiveScanDetail`, including malformed-byte and empty-name error paths.
  - Correct tagging of `CteWorkTable` scans via `advanced_extension`.
  - End-to-end execution of a recursive CTE query after Substrait roundtrip.
- Existing Substrait roundtrip tests continue to run and provide regression coverage for non-recursive plans.
Are there any user-facing changes?
- Behavioral improvement (no breaking API changes):
  - Substrait roundtrip mode now supports logical plans that contain `RecursiveQuery` and recursive CTE work-table scans. Previously, such plans failed with `not_impl_err!("Unsupported plan type: RecursiveQuery")`.
  - Users who rely on Substrait serialization/deserialization for queries with recursive CTEs should now see these queries roundtrip and execute successfully.
- No public Rust API signatures are changed, and no configuration flags or SQL syntax are modified.
LLM-generated code disclosure
This PR includes LLM-generated code and comments. All LLM-generated content has been manually reviewed and tested.
hi @gabotechs Can you take a look?
Sure! will take a look soon