[SPARK-39876][SQL] Add UNPIVOT to SQL syntax
What changes were proposed in this pull request?
This adds the UNPIVOT clause to SQL syntax. It follows the same syntax as BigQuery, T-SQL, and Oracle:
FROM ... [ unpivot_clause ]

unpivot_clause:
    UNPIVOT [ { INCLUDE | EXCLUDE } NULLS ] (
        { single_value_column_unpivot | multi_value_column_unpivot }
    ) [[AS] alias]

single_value_column_unpivot:
    values_column
    FOR name_column
    IN (unpivot_column [, ...])

multi_value_column_unpivot:
    (values_column [, ...])
    FOR name_column
    IN ((unpivot_column [, ...]) [[AS] alias] [, ...])

unpivot_column:
    namedExpression
For example:
CREATE TABLE sales_quarterly (year INT, q1 INT, q2 INT, q3 INT, q4 INT);
INSERT INTO sales_quarterly VALUES
(2020, null, 1000, 2000, 2500),
(2021, 2250, 3200, 4200, 5900),
(2022, 4200, 3100, null, null);
SELECT * FROM sales_quarterly
UNPIVOT (
    sales FOR quarter IN (q1, q2, q3, q4)
);
SELECT up.* FROM sales_quarterly
UNPIVOT EXCLUDE NULLS (
    sales FOR quarter IN (q1 AS `Q1`, q2 AS `Q2`, q3 AS `Q3`, q4 AS `Q4`)
) AS `up`;
SELECT * FROM sales_quarterly
UNPIVOT EXCLUDE NULLS (
    (first_quarter, second_quarter)
    FOR half_of_the_year IN (
        (q1, q2) AS `H1`,
        (q3, q4) AS `H2`
    )
);
Why are the changes needed?
To support Dataset.unpivot in SQL queries.
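For reference, the first SQL example above maps to the existing Dataset API roughly like this (a sketch in spark-shell style, where spark and $ are in scope, assuming the sales_quarterly table created above):

// Unpivot the quarter columns into (quarter, sales) pairs, keeping year as the id column.
spark.table("sales_quarterly")
  .unpivot(
    Array($"year"),                     // id columns that stay as-is
    Array($"q1", $"q2", $"q3", $"q4"),  // value columns to unpivot
    "quarter",                          // name of the variable column
    "sales")                            // name of the value column
  .show()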
Does this PR introduce any user-facing change?
Yes, adds UNPIVOT to SQL syntax.
How was this patch tested?
Added end-to-end tests to SQLQueryTestSuite.
Can one of the admins verify this patch?
cc @maryannxue FYI
@cloud-fan @MaxGekk @HyukjinKwon @gengliangwang @zhengruifeng what do you think?
Can we write down the SQL spec for this syntax in the PR description? To make it easier for people to review the syntax and understand the semantics.
I have added the syntax and examples from docs/sql-ref-syntax-qry-select-unpivot.md to the PR description.
@cloud-fan I have introduced the expression UnpivotExpr to replace (Seq[NamedExpression], Option[String]), which makes the code more readable.
But this introduces the following change in behaviour, a deviation from projection behaviour:
spark.range(5).select(struct($"id").as("an")).select($"an.id").show()
"an.id" gets alias "id":
+---+
| id|
+---+
| 0|
| 1|
| 2|
| 3|
| 4|
+---+
Project(UnresolvedAttribute("an.id"), plan)
--> ResolveReferences rule -->
Project(Alias(GetStructField(an#2.id), "id"), plan)
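One way to see this resolution from a spark-shell session (a small sketch using the public queryExecution API; the output shown in the comment is approximate):

import org.apache.spark.sql.functions.struct

// The analyzed plan shows the Alias that ResolveReferences adds on top of GetStructField.
spark.range(5)
  .select(struct($"id").as("an"))
  .select($"an.id")
  .queryExecution.analyzed
// roughly: Project [an#2.id AS id#5L], followed by the struct projection and the Range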
spark.range(5).select(struct($"id").as("an")).unpivot(Array($"an.id"), Array($"an.id"), "col", "val").show()
Before introducing UnpivotExpr, both ids and values get alias "id" (as in select / Project):
+---+---+---+
| id|col|val|
+---+---+---+
| 0| id| 0|
| 1| id| 1|
| 2| id| 2|
| 3| id| 3|
| 4| id| 4|
+---+---+---+
After introducing UnpivotExpr, the id "an.id" gets alias "id", while the value "an.id" does not get an alias and hence gets the name "an.id":
+---+-----+---+
| id| col|val|
+---+-----+---+
| 0|an.id| 0|
| 1|an.id| 1|
| 2|an.id| 2|
| 3|an.id| 3|
| 4|an.id| 4|
+---+-----+---+
Now that UnpivotExpr is the top-level expression, the inner UnresolvedAttribute / GetStructField does not get an alias:
Unpivot(Seq(UnresolvedAttribute("an.id")), Seq(UnpivotExpr(Seq(UnresolvedAttribute("an.id")), ...)), ..., plan)
--> ResolveReferences -->
Unpivot(Seq(Alias(GetStructField(an#2.id), "id")), Seq(UnpivotExpr(Seq(GetStructField(an#2.id)), ...)), ..., plan)
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala#L1770
The CleanupAliases rule is not the reason; the alias is being removed inside ResolveReferences.
The only way to get to the old behaviour is a special treatment of UnpivotExpr in QueryPlan.mapExpressions.recursiveTransform:
https://github.com/apache/spark/pull/37407/commits/9dd66b78ec817a53325d95900f18198dac9bc3b1#diff-ece55283a94dd23d3c04f8b9d8ae35937ccff67724be690ff30f76e9f8093c6eR211
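A possible user-level workaround (my sketch, not part of this PR) would be to alias the value column explicitly, which should make the variable column show "id" again:

// Assumption: an explicit alias on the value column is kept and used as the
// entry in the variable column, restoring the pre-change output.
spark.range(5)
  .select(struct($"id").as("an"))
  .unpivot(Array($"an.id"), Array($"an.id".as("id")), "col", "val")
  .show()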
All green: https://github.com/G-Research/spark/actions/runs/3196590413/jobs/5220447028
Thanks for the excellent code review and guidance!
thanks, merging to master!