Wenchen Fan comments

Results: 245 comments by Wenchen Fan

[SPARK-40407][SQL] Fix the potential data skew caused by df.repartition

@wbo4958 Can you add comments as I asked in https://github.com/apache/spark/pull/37855/files#r975993118 ?

[SPARK-40407][SQL] Fix the potential data skew caused by df.repartition

thanks, merging to master/3.3/3.2!

[SPARK-39950][SQL] It's unnecessary to materialize BroadcastQueryStage firstly, because the BroadcastQueryStage does not timeout in AQE.

here you are: https://github.com/apache/spark/commit/0c94e47aecab0a8c346e1a004686d1496a9f2b07

[SPARK-39930][SQL] Introduce Cache Hints

To close the loop: `CACHE TABLE abc AS SELECT id from range(0,1)` should be sufficient. If it fails with view already exists, we can either rerun it with a different...

[SPARK-39876][SQL] Add UNPIVOT to SQL syntax

thanks, merging to master!

[SPARK-39854][SQL] replaceWithAliases should keep the original children for Generate

shall we change `unrequiredChildIndex: Seq[Int]` to `requiredChildren: Seq[Attribute]`? then column position is not an issue anymore.
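The suggestion above can be illustrated with a toy model in plain Python. This is not Spark's `Generate` implementation: the helper names `required_by_index` and `required_by_attribute` are hypothetical, and columns are modeled as plain strings. It is only a sketch of why position-based bookkeeping is sensitive to column order while attribute-based bookkeeping is not.

```python
def required_by_index(child_output, unrequired_child_index):
    """Keep the child columns whose positions are NOT listed as unrequired."""
    skip = set(unrequired_child_index)
    return [col for i, col in enumerate(child_output) if i not in skip]


def required_by_attribute(child_output, required_children):
    """Keep the child columns that are explicitly named as required."""
    wanted = set(required_children)
    return [col for col in child_output if col in wanted]


original = ["a", "b", "c"]
reordered = ["b", "a", "c"]  # same columns, different positions

# Position-based: index 1 means "b" before the reorder and "a" after it.
by_index_before = required_by_index(original, [1])
by_index_after = required_by_index(reordered, [1])

# Attribute-based: the same columns are selected regardless of position.
by_attr_before = required_by_attribute(original, ["a", "c"])
by_attr_after = required_by_attribute(reordered, ["a", "c"])
```

In this toy model, the index-based selection silently changes which column is dropped once the child output is reordered, whereas the attribute-based selection keeps returning the same set of columns.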

[SPARK-39854][SQL] replaceWithAliases should keep the original children for Generate

@Kimahriman feel free to pick this up if you have an idea about how to fix it.

[SPARK-46710][SQL] Clean up the broadcast data generated when sql execution ends

will we reuse the broadcast data after the query completes? e.g. call `df.collect()` multiple times.

[SPARK-46710][SQL] Clean up the broadcast data generated when sql execution ends

I think it's true for SQL queries, but I'm not sure about DataFrame queries, which keep the physical plan as a lazy val and users can repeatedly execute the same physical...
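The concern raised in the two comments above can be sketched with a toy model in plain Python. Everything here is an assumption made for illustration: `ToyQuery`, its `physical_plan` property, and the `cleanup` flag are hypothetical and do not correspond to Spark classes or settings. The sketch only shows the general hazard of eagerly cleaning up data that belongs to a cached, re-executable plan.

```python
from functools import cached_property


class ToyQuery:
    """Hypothetical stand-in for a query whose plan is built lazily and cached."""

    def __init__(self):
        self.plans_built = 0

    @cached_property
    def physical_plan(self):
        # Built on first access and then cached, similar in spirit to a lazy val.
        self.plans_built += 1
        return {"broadcast": [1, 2, 3]}

    def collect(self, cleanup=False):
        plan = self.physical_plan
        if plan["broadcast"] is None:
            raise RuntimeError("broadcast data was already cleaned up")
        rows = list(plan["broadcast"])
        if cleanup:
            # Simulate releasing the broadcast data when the execution ends.
            plan["broadcast"] = None
        return rows


# Without cleanup, the cached plan is reused across executions.
reusable = ToyQuery()
first = reusable.collect()
second = reusable.collect()

# With eager cleanup, a second execution of the same plan fails.
cleaned = ToyQuery()
cleaned.collect(cleanup=True)
try:
    cleaned.collect()
    second_run_failed = False
except RuntimeError:
    second_run_failed = True
```

Whether real DataFrame queries hit this situation is exactly the open question in the comment; the sketch does not answer it.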

[SPARK-47009][SQL] Enable create table support for collation

We should provide more high-level information: what is the corresponding Parquet type for a string with collation, and how do we fix the Parquet max/min column stats?