[SPARK-52462] [SQL] Enforce type coercion before children output deduplication in Union
What changes were proposed in this pull request?
Right now, the following query produces plans that are not consistent across different underlying table providers. Query:
SELECT col1, col2, col3, NULLIF('','') AS col4
FROM table
UNION ALL
SELECT col2, col2, null AS col3, col4
FROM table;
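This happens because the order in which analyzer rules run is not fixed. For example, third-party data sources such as Delta Lake may add custom analyzer rules that change it: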
- Sometimes: WidenSetOperationTypes -> ... -> ResolveReferences (deduplication of Union children outputs)
- Sometimes: ResolveReferences (deduplication of Union children outputs) -> ... -> WidenSetOperationTypes
In this PR I propose aligning the two orderings by enforcing that type coercion happens before deduplication.
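To make the ordering problem concrete, here is a toy Python model. It is not Spark code and not how the fix is implemented. The tuple node format, the functions `widen`, `dedup` and `dedup_after_coercion`, the column types of `table`, and the rule that widening re-aliases a cast column under a fresh expression ID are all assumptions for illustration. The real rules are WidenSetOperationTypes and the deduplication step in ResolveReferences, which operate on Catalyst plans. The model runs the rules repeatedly until the plan stops changing, similar in spirit to the analyzer's fixed-point batches.

```python
import itertools

# Toy expression nodes (hypothetical, NOT Spark's Catalyst classes):
#   ("attr", name, expr_id, type)    column reference
#   ("lit", type)                    literal, e.g. NULL
#   ("alias", child, name, expr_id)  named expression with its own ID
#   ("cast", child, type)
_fresh_ids = itertools.count(100)
RANK = {"null": 0, "int": 1, "bigint": 2, "string": 3}  # toy widening order

def out_type(e):
    if e[0] == "attr":
        return e[3]
    if e[0] == "lit":
        return e[1]
    return out_type(e[1]) if e[0] == "alias" else e[2]  # alias / cast

def out_name(e):  # top-level outputs are always attr or alias
    return e[1] if e[0] == "attr" else e[2]

def out_id(e):
    return e[2] if e[0] == "attr" else e[3]

def widen(children):
    """Toy type widening: cast each column to the widest type at its position."""
    targets = [max((out_type(c[i]) for c in children), key=RANK.__getitem__)
               for i in range(len(children[0]))]
    result = []
    for child in children:
        new_child = []
        for e, t in zip(child, targets):
            if out_type(e) != t:  # assumption: the cast is re-aliased under a fresh ID
                e = ("alias", ("cast", e, t), out_name(e), next(_fresh_ids))
            new_child.append(e)
        result.append(new_child)
    return result

def dedup(children):
    """Toy deduplication: re-alias any expression ID repeated within one child."""
    result = []
    for child in children:
        seen, new_child = set(), []
        for e in child:
            if out_id(e) in seen:
                e = ("alias", e, out_name(e), next(_fresh_ids))
            seen.add(out_id(e))
            new_child.append(e)
        result.append(new_child)
    return result

def coerced(children):
    """True once every column position has one type across all children."""
    return all(len({out_type(c[i]) for c in children}) == 1
               for i in range(len(children[0])))

def dedup_after_coercion(children):
    # Proposed ordering: deduplication is a no-op until type coercion is done.
    return dedup(children) if coerced(children) else children

def run(rules, children):
    """Apply the rules in order, repeatedly, until the plan stops changing."""
    while True:
        new = children
        for rule in rules:
            new = rule(new)
        if new == children:
            return children
        children = new

def shape(e):
    """Render an expression without expression IDs so plans can be compared."""
    if e[0] == "attr":
        return e[1]
    if e[0] == "lit":
        return "lit:" + e[1]
    if e[0] == "alias":
        return "alias(" + shape(e[1]) + ")"
    return "cast(" + shape(e[1]) + "," + e[2] + ")"

def plan_shape(children):
    return [[shape(e) for e in child] for child in children]

# Assumed column types for `table`; the PR does not state them.
col1, col2 = ("attr", "col1", 1, "bigint"), ("attr", "col2", 2, "int")
col3, col4 = ("attr", "col3", 3, "int"), ("attr", "col4", 4, "string")
union = [
    [col1, col2, col3, ("alias", ("lit", "string"), "col4", 10)],  # NULLIF('','') AS col4
    [col2, col2, ("alias", ("lit", "null"), "col3", 11), col4],      # null AS col3
]

widen_first = plan_shape(run([widen, dedup], union))
dedup_first = plan_shape(run([dedup, widen], union))
fixed_a = plan_shape(run([widen, dedup_after_coercion], union))
fixed_b = plan_shape(run([dedup_after_coercion, widen], union))
print(widen_first[1])  # ['alias(cast(col2,bigint))', 'col2', 'alias(cast(alias(lit:null),int))', 'col4']
print(dedup_first[1])  # ['alias(cast(col2,bigint))', 'alias(col2)', 'alias(cast(alias(lit:null),int))', 'col4']
print(fixed_a == fixed_b == widen_first)  # True
```

In this model the two unguarded orders differ only in the second col2: if widening runs first, casting the first copy already gives it a new ID, so deduplication has nothing left to do; if deduplication runs first, the second copy gets an extra alias. Gating deduplication on completed coercion makes both rule orders converge to the same plan, which is the kind of alignment this PR proposes.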
Why are the changes needed?
To make UNION produce consistent plans regardless of the underlying table provider.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Added tests + existing ones.
Was this patch authored or co-authored using generative AI tooling?
No.
@mihailoale-db can you say more about how your example query gets a different type coercion result with different rule order? Let's describe "not consistent" clearly here.
@cloud-fan Some third-party data sources may add custom analyzer rules that change the rule order here. Delta Lake is an example. Let me mention that in the description. Thanks!
@cloud-fan all the tests passed. PTAL when you have time. Thanks!
thanks, merging to master!