spark icon indicating copy to clipboard operation
spark copied to clipboard

[SPARK-39854][SQL] replaceWithAliases should keep the original children for Generate

Open jiaji-wu opened this issue 2 years ago • 5 comments

What changes were proposed in this pull request?

The current implementation of the replaceWithAliases method in NestedColumnAliasing replaces the children of the current plan with mapped aliases. When the current plan is a Generate -- which keeps a set of indices unrequiredChildIndex for unsed child outputs -- and if the replaced child happens to be in this set, because the unrequiredChildIndex is not updated for this Generate plan, the original unrequiredChildIndex will point to the new children, leading to exceptions/errors.

Why are the changes needed?

See the description above.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

A unit test is added to SparkPlanSuite

jiaji-wu avatar Jul 30 '22 18:07 jiaji-wu

Can one of the admins verify this patch?

AmplabJenkins avatar Jul 31 '22 08:07 AmplabJenkins

Any comments?

jiaji-wu avatar Aug 02 '22 13:08 jiaji-wu

Hi @dongjoon-hyun , is it possible to find someone to view the change (it's a relatively small one :)? Thanks.

jiaji-wu avatar Aug 06 '22 14:08 jiaji-wu

I'll take a look today or this weekend. Thanks @dongjoon-hyun

edit: don't find a time in the weekend. I will look at this soon.

viirya avatar Sep 17 '22 00:09 viirya

Thank you!

dongjoon-hyun avatar Sep 17 '22 00:09 dongjoon-hyun

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

github-actions[bot] avatar Dec 30 '22 00:12 github-actions[bot]

I removed State tag. Could you rebase this PR to the master branch, @jiaji-wu ?

dongjoon-hyun avatar Dec 30 '22 02:12 dongjoon-hyun

shall we change unrequiredChildIndex: Seq[Int] to requiredChildren: Seq[Attribute]? then column position is not an issue anymore.

cloud-fan avatar Dec 30 '22 12:12 cloud-fan

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

github-actions[bot] avatar Apr 10 '23 00:04 github-actions[bot]

I think we just hit this after upgrading to Spark 3.4. Some change is now triggering the GeneratorNestedColumnAliasing where it wasn't happening before, and as a result we're getting corrupted generators because of the stale unrequiredChildIndex value.

Kimahriman avatar May 22 '23 16:05 Kimahriman

Thank you for reporting, @Kimahriman .

dongjoon-hyun avatar May 22 '23 16:05 dongjoon-hyun

BTW, may I ask what is your previous successful Spark version, @Kimahriman ?

dongjoon-hyun avatar May 22 '23 16:05 dongjoon-hyun

3.3.2

Kimahriman avatar May 22 '23 16:05 Kimahriman

@Kimahriman feel free to pick up this if you have an idea about how to fix it.

cloud-fan avatar May 22 '23 16:05 cloud-fan

I think I'm also hitting another possibly unrelated bug with the GeneratorNestedColumnAliasing that I'm still trying to track down, and I still haven't figured out what changed between 3.3.2 and 3.4.0 that would cause GeneratorNestedColumnAliasing to trigger now for the exact same plan when it wasn't in 3.3.2.

Kimahriman avatar May 22 '23 17:05 Kimahriman