spark
spark copied to clipboard
[SPARK-39854][SQL] replaceWithAliases should keep the original children for Generate
What changes were proposed in this pull request?
The current implementation of the replaceWithAliases
method in NestedColumnAliasing
replaces the children of the current plan with mapped aliases. When the current plan is a Generate
-- which keeps a set of indices unrequiredChildIndex
for unsed child outputs -- and if the replaced child happens to be in this set, because the unrequiredChildIndex
is not updated for this Generate
plan, the original unrequiredChildIndex
will point to the new children, leading to exceptions/errors.
Why are the changes needed?
See the description above.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
A unit test is added to SparkPlanSuite
Can one of the admins verify this patch?
Any comments?
Hi @dongjoon-hyun , is it possible to find someone to view the change (it's a relatively small one :)? Thanks.
I'll take a look today or this weekend. Thanks @dongjoon-hyun
edit: don't find a time in the weekend. I will look at this soon.
Thank you!
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!
I removed State
tag. Could you rebase this PR to the master branch, @jiaji-wu ?
shall we change unrequiredChildIndex: Seq[Int]
to requiredChildren: Seq[Attribute]
? then column position is not an issue anymore.
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!
I think we just hit this after upgrading to Spark 3.4. Some change is now triggering the GeneratorNestedColumnAliasing where it wasn't happening before, and as a result we're getting corrupted generators because of the stale unrequiredChildIndex
value.
Thank you for reporting, @Kimahriman .
BTW, may I ask what is your previous successful Spark version, @Kimahriman ?
3.3.2
@Kimahriman feel free to pick up this if you have an idea about how to fix it.
I think I'm also hitting another possibly unrelated bug with the GeneratorNestedColumnAliasing
that I'm still trying to track down, and I still haven't figured out what changed between 3.3.2 and 3.4.0 that would cause GeneratorNestedColumnAliasing
to trigger now for the exact same plan when it wasn't in 3.3.2.