[SPARK-45789][SQL] Support DESCRIBE TABLE for clustering columns
What changes were proposed in this pull request?
This PR proposes to add clustering column info to the output of DESCRIBE TABLE.
Why are the changes needed?
Currently, it's not easy to retrieve clustering column info; the only way to do it is via catalog APIs.
Does this PR introduce any user-facing change?
Yes. Now, when you run DESCRIBE TABLE on a clustered table, the output includes a "# Clustering Information" section, as follows:
CREATE TABLE tbl (col1 STRING, col2 INT) USING parquet CLUSTER BY (col1, col2);
DESC tbl;
+------------------------+---------+-------+
|col_name |data_type|comment|
+------------------------+---------+-------+
|col1 |string |NULL |
|col2 |int |NULL |
|# Clustering Information| | |
|# col_name |data_type|comment|
|col1 |string |NULL |
|col2 |int |NULL |
+------------------------+---------+-------+
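As an illustration (not part of this PR), the new section can be consumed programmatically by scanning the DESC rows for the "# Clustering Information" marker. The helper below is a hypothetical sketch; `clustering_columns` and the row tuples are assumptions based on the example output above, not an API from Spark:

```python
def clustering_columns(desc_rows):
    """Extract clustering columns from DESCRIBE TABLE output rows.

    desc_rows: list of (col_name, data_type, comment) tuples, as in the
    DESC output shown above. Returns (name, data_type) pairs listed under
    the "# Clustering Information" section.
    """
    cols = []
    in_section = False
    for name, data_type, _comment in desc_rows:
        if name == "# Clustering Information":
            in_section = True
            continue
        if in_section:
            if name == "# col_name":
                continue  # header row of the clustering section
            if not name or name.startswith("#"):
                break  # padding row or next section ends the list
            cols.append((name, data_type))
    return cols

# Rows mirroring the example DESC output above
rows = [
    ("col1", "string", None),
    ("col2", "int", None),
    ("# Clustering Information", "", ""),
    ("# col_name", "data_type", "comment"),
    ("col1", "string", None),
    ("col2", "int", None),
]
print(clustering_columns(rows))  # [('col1', 'string'), ('col2', 'int')]
```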
How was this patch tested?
Added new unit tests.
Was this patch authored or co-authored using generative AI tooling?
No
cc @cloud-fan
thanks, merging to master!