spark [WIP][SPARK-49816][SQL][FOLLOW-UP] Fix conflicting CTE ids

Open peter-toth opened this issue 4 months ago • 1 comment

What changes were proposed in this pull request?

This is a follow-up PR that reverts https://github.com/apache/spark/pull/48284 in its first commit and offers a new way to deal with conflicting CTE ids. It generates new CTE ids for the conflicting ones and retains the reference counting logic that was in place before https://github.com/apache/spark/pull/48284. See the details in https://github.com/apache/spark/pull/48284#issuecomment-2391887954.
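To illustrate the idea of regenerating conflicting ids, here is a minimal, self-contained sketch on a toy plan model. It is not Spark code: the classes `CTERef`, `CTEDef`, `WithCTE` and `Node`, the function `dedup_cte_ids`, and the fresh-id counter starting at 100 are all hypothetical stand-ins chosen for the example. It only models id regeneration and leaves reference counting out.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, Iterator, List, Union


@dataclass
class CTERef:
    """Toy stand-in for a reference to a CTE definition (hypothetical)."""
    cte_id: int


@dataclass
class CTEDef:
    """Toy stand-in for a CTE definition (hypothetical)."""
    cte_id: int
    child: "Plan"


@dataclass
class WithCTE:
    """Toy stand-in for a node that scopes definitions over a plan."""
    defs: List[CTEDef]
    plan: "Plan"


@dataclass
class Node:
    """Any other operator in the toy plan."""
    name: str
    children: List["Plan"] = field(default_factory=list)


Plan = Union[CTERef, CTEDef, WithCTE, Node]


def dedup_cte_ids(plan: Plan, fresh_ids: Iterator[int]) -> Plan:
    """Give a fresh id to every definition whose id was already seen,
    and rewrite the references inside that definition's scope."""
    seen = set()

    def rewrite(p: Plan, renames: Dict[int, int]) -> Plan:
        if isinstance(p, CTERef):
            return CTERef(renames.get(p.cte_id, p.cte_id))
        if isinstance(p, Node):
            return Node(p.name, [rewrite(c, renames) for c in p.children])
        if isinstance(p, WithCTE):
            scoped = dict(renames)  # renames visible inside this WithCTE
            new_defs = []
            for d in p.defs:
                # A definition body may refer to earlier definitions.
                child = rewrite(d.child, scoped)
                if d.cte_id in seen:
                    new_id = next(fresh_ids)  # conflict: pick a new id
                    scoped[d.cte_id] = new_id
                else:
                    new_id = d.cte_id
                seen.add(new_id)
                new_defs.append(CTEDef(new_id, child))
            return WithCTE(new_defs, rewrite(p.plan, scoped))
        raise TypeError(f"unexpected plan node: {p!r}")

    return rewrite(plan, {})


# Two sibling WithCTE nodes that both define id 0.
plan = Node("Union", [
    WithCTE([CTEDef(0, Node("ScanA"))], CTERef(0)),
    WithCTE([CTEDef(0, Node("ScanB"))], CTERef(0)),
])
result = dedup_cte_ids(plan, count(100))
```

In this toy example the first `WithCTE` keeps id 0, while the second one gets the fresh id 100 and its reference is rewritten to match, so the two definitions can no longer be confused if the nodes are later merged.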

Why are the changes needed?

The previous PR relied on the fact that nested, connected CTEs get flattened during resolution (see an example of why it worked in some cases: https://github.com/apache/spark/pull/48284#issuecomment-2386982325), so the remaining WithCTE nodes in the plan are guaranteed to be self-contained. If we can't rely on that fact, we need a more robust solution.

Does this PR introduce any user-facing change?

No, because currently some issues (https://github.com/apache/spark/pull/48284#issuecomment-2389011706) can come up only with manually assembled plans. (I think we should not consider an issue user-facing if it can't be reproduced with public APIs.)

How was this patch tested?

Existing and new UTs.

Was this patch authored or co-authored using generative AI tooling?

No.

Oct 12 '24 08:10 peter-toth
