Wenchen Fan

Results: 245 comments by Wenchen Fan

@wbo4958 Can you add comments as I asked in https://github.com/apache/spark/pull/37855/files#r975993118 ?

here you are: https://github.com/apache/spark/commit/0c94e47aecab0a8c346e1a004686d1496a9f2b07

To close the loop: `CACHE TABLE abc AS SELECT id from range(0,1)` should be sufficient. If it fails with view already exists, we can either rerun it with a different...

shall we change `unrequiredChildIndex: Seq[Int]` to `requiredChildren: Seq[Attribute]`? then column position is not an issue anymore.
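To illustrate why identifying columns by attribute avoids the position problem, here is a minimal standalone sketch in Python. It is a toy model, not Spark code: the `Attribute` class, both `prune_by_*` helpers, and the reordering scenario are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribute:
    """Hypothetical stand-in for a column reference, identified by name."""
    name: str


def prune_by_index(child_output, unrequired_child_index):
    """Drop columns by position; the result depends on column order."""
    drop = set(unrequired_child_index)
    return [attr for i, attr in enumerate(child_output) if i not in drop]


def prune_by_attribute(child_output, required_children):
    """Keep columns by identity; the result does not depend on column order."""
    keep = set(required_children)
    return [attr for attr in child_output if attr in keep]


a, b, c = Attribute("a"), Attribute("b"), Attribute("c")
original = [a, b, c]
reordered = [c, a, b]  # assumed scenario: the child's column order changed

# Intent in both cases: drop column "b" and keep "a" and "c".
index_original = prune_by_index(original, [1])    # drops "b" as intended
index_reordered = prune_by_index(reordered, [1])  # drops "a" by mistake
attr_original = prune_by_attribute(original, [a, c])
attr_reordered = prune_by_attribute(reordered, [a, c])
```

With the positional variant, the stale index silently removes a different column once the order changes; the attribute-based variant keeps the same set of columns either way.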

@Kimahriman feel free to pick this up if you have an idea about how to fix it.

will we reuse the broadcast data after the query completes? e.g. call `df.collect()` multiple times.

I think it's true for SQL queries, but not sure about dataframe queries, which keep the physical plan as a lazy val and users can repeatedly execute the same physical...
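The "lazy val" point can be pictured with a small standalone Python sketch. It is a toy model under stated assumptions, not the Spark implementation: `ToyDataFrame`, `PhysicalPlan`, and the counters are hypothetical, and `functools.cached_property` stands in for a Scala `lazy val`.

```python
from functools import cached_property


class PhysicalPlan:
    """Hypothetical plan object that counts how often it is executed."""

    def __init__(self):
        self.executions = 0

    def execute(self):
        self.executions += 1
        return [0]  # placeholder result rows


class ToyDataFrame:
    """Hypothetical stand-in for a dataframe; not the Spark class."""

    plans_built = 0

    @cached_property
    def physical_plan(self):
        # Computed on first access, then cached on the instance,
        # similar in spirit to a Scala `lazy val`.
        ToyDataFrame.plans_built += 1
        return PhysicalPlan()

    def collect(self):
        # Every call runs the same cached plan object again.
        return self.physical_plan.execute()


df = ToyDataFrame()
first = df.collect()
second = df.collect()
```

In this model the plan is built once but executed on every `collect()` call, so any state attached to the plan object would outlive a single execution.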

We should add more high-level information: what's the corresponding Parquet type for a string with collation? And how do we fix the Parquet max/min column stats?