[WIP][SPARK-49816][SQL][FOLLOW-UP] Fix conflicting CTE ids
What changes were proposed in this pull request?
This is a follow-up PR that reverts https://github.com/apache/spark/pull/48284 in its first commit and offers a new way to deal with conflicting CTE ids. It generates new CTE ids for the conflicting ones and retains the reference counting logic that was in place before https://github.com/apache/spark/pull/48284. See the details in https://github.com/apache/spark/pull/48284#issuecomment-2391887954.
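To illustrate the idea of regenerating conflicting ids, here is a minimal sketch on a toy plan model. It is not the code in this PR and does not use Catalyst: the classes `Node`, `CTEDef`, `CTERef` and `WithCTE` and the functions `count_refs` and `dedup_cte_ids` are hypothetical stand-ins, only loosely modeled on the plan nodes discussed here. The sketch assumes that a conflict means a second CTE definition reusing an id that was already seen elsewhere in the plan.

```python
from collections import Counter
from dataclasses import dataclass, field
from itertools import count
from typing import List


# Toy plan nodes (hypothetical; not Spark's actual classes).
@dataclass
class Node:
    name: str
    children: List[object] = field(default_factory=list)


@dataclass
class CTERef:
    cte_id: int


@dataclass
class CTEDef:
    cte_id: int
    child: object


@dataclass
class WithCTE:
    defs: List[CTEDef]
    body: object


def count_refs(plan, counts=None):
    """Count references per CTE id across the whole plan."""
    counts = Counter() if counts is None else counts
    if isinstance(plan, CTERef):
        counts[plan.cte_id] += 1
    elif isinstance(plan, WithCTE):
        for d in plan.defs:
            count_refs(d.child, counts)
        count_refs(plan.body, counts)
    else:
        for c in plan.children:
            count_refs(c, counts)
    return counts


def all_ids(plan):
    """Collect every CTE id used by a definition or a reference."""
    if isinstance(plan, CTERef):
        return {plan.cte_id}
    if isinstance(plan, WithCTE):
        ids = all_ids(plan.body)
        for d in plan.defs:
            ids |= {d.cte_id} | all_ids(d.child)
        return ids
    ids = set()
    for c in plan.children:
        ids |= all_ids(c)
    return ids


def dedup_cte_ids(plan):
    """Return a copy of the plan in which every CTE definition has a unique id."""
    fresh = count(max(all_ids(plan), default=-1) + 1)  # ids not used anywhere yet
    seen = set()  # definition ids encountered so far

    def go(p, mapping):
        if isinstance(p, CTERef):
            # Follow the renaming that is in scope for this reference.
            return CTERef(mapping.get(p.cte_id, p.cte_id))
        if isinstance(p, WithCTE):
            scope = dict(mapping)
            new_defs = []
            for d in p.defs:
                child = go(d.child, scope)
                if d.cte_id in seen:
                    # Conflict: give this definition a fresh id and remap
                    # references in later definitions and in the body.
                    new_id = next(fresh)
                    scope[d.cte_id] = new_id
                else:
                    new_id = d.cte_id
                    seen.add(new_id)
                new_defs.append(CTEDef(new_id, child))
            return WithCTE(new_defs, go(p.body, scope))
        return Node(p.name, [go(c, mapping) for c in p.children])

    return go(plan, {})


# Two sibling WithCTE nodes that both define and reference id 0.
plan = Node("Union", [
    WithCTE([CTEDef(0, Node("Scan"))], CTERef(0)),
    WithCTE([CTEDef(0, Node("Scan"))], CTERef(0)),
])
fixed = dedup_cte_ids(plan)
```

In this toy plan, counting references by id before the rewrite merges the two unrelated definitions into one bucket; after the rewrite each definition has its own id and its own count, which is why a plain per-id reference count can be kept.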
Why are the changes needed?
The previous PR relied on the fact that nested, connected CTEs get flattened during resolution (see an example of why it worked in some cases: https://github.com/apache/spark/pull/48284#issuecomment-2386982325), so the remaining WithCTE nodes in the plan were assumed to be self-contained. If we can't rely on that fact, we need a more robust solution.
Does this PR introduce any user-facing change?
No, because currently some issues (https://github.com/apache/spark/pull/48284#issuecomment-2389011706) can come up only with manually assembled plans. (I think we should not consider an issue user-facing if it can't be reproduced with public APIs.)
How was this patch tested?
Existing and new UTs.
Was this patch authored or co-authored using generative AI tooling?
No.