spark icon indicating copy to clipboard operation
spark copied to clipboard

[SPARK-39887][SQL] RemoveRedundantAliases should keep aliases that make the output of projection nodes unique

Open peter-toth opened this issue 2 years ago • 5 comments

What changes were proposed in this pull request?

Keep the output attributes of a Union node's first child in the RemoveRedundantAliases rule to avoid correctness issues.

Why are the changes needed?

To fix the result of the following query:

SELECT a, b AS a FROM (
  SELECT a, a AS b FROM (SELECT a FROM VALUES (1) AS t(a))
  UNION ALL
  SELECT a, b FROM (SELECT a, b FROM VALUES (1, 2) AS t(a, b))
)

Before this PR the query returns the incorrect result:

+---+---+
|  a|  a|
+---+---+
|  1|  1|
|  2|  2|
+---+---+

After this PR it returns the expected result:

+---+---+
|  a|  a|
+---+---+
|  1|  1|
|  1|  2|
+---+---+

Does this PR introduce any user-facing change?

Yes, fixes a correctness issue.

How was this patch tested?

Added new UTs.

peter-toth avatar Jul 28 '22 19:07 peter-toth

@cloud-fan, I've updates this PR that removeRedundantAliases will always keep aliases to avoid duplicate output attributes.

Remaining failures in sparkr/linters tests don't seem related to this change.

peter-toth avatar Aug 02 '22 20:08 peter-toth

Fixes AliasAwareOutputPartitioning and AliasAwareOutputOrdering as testing discovered that ...

Can we create a new PR for this?

cloud-fan avatar Aug 05 '22 16:08 cloud-fan

Fixes AliasAwareOutputPartitioning and AliasAwareOutputOrdering as testing discovered that ...

Can we create a new PR for this?

Sure, I can open a new one for this next week.

peter-toth avatar Aug 05 '22 17:08 peter-toth

cc @sigmod

cloud-fan avatar Aug 09 '22 15:08 cloud-fan

@peter-toth do you know which Spark version starts to have this issue?

cloud-fan avatar Aug 09 '22 15:08 cloud-fan

@peter-toth do you know which Spark version starts to have this issue?

This rule is quite old so I just ran a quick check on the latest 2.4 branch and even that is affected.

peter-toth avatar Aug 10 '22 06:08 peter-toth

The GA failure is unrelated, thanks, merging to master!

Since there are conflicts, can you create backport PRs for 3.3? thanks!

cloud-fan avatar Aug 10 '22 15:08 cloud-fan

The GA failure is unrelated, thanks, merging to master!

Since there are conflicts, can you create backport PRs for 3.3? thanks!

Thanks for the review @cloud-fan! I've opened 3.3 PR here: https://github.com/apache/spark/pull/37472

peter-toth avatar Aug 10 '22 17:08 peter-toth