Wenchen Fan
@wbo4958 Can you add comments as I asked in https://github.com/apache/spark/pull/37855/files#r975993118 ?
thanks, merging to master/3.3/3.2!
here you are: https://github.com/apache/spark/commit/0c94e47aecab0a8c346e1a004686d1496a9f2b07
To close the loop: `CACHE TABLE abc AS SELECT id from range(0,1)` should be sufficient. If it fails with a "view already exists" error, we can either rerun it with a different...
thanks, merging to master!
shall we change `unrequiredChildIndex: Seq[Int]` to `requiredChildren: Seq[Attribute]`? Then column position would no longer be an issue.
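To illustrate the suggestion above, here is a minimal, self-contained sketch (with a hypothetical `Attribute` type, not the actual Catalyst class) of why attribute-based pruning is robust to column reordering while index-based pruning is not:

```scala
// Hypothetical stand-in for Catalyst's Attribute, for illustration only.
case class Attribute(name: String)

// Index-based pruning: which children survive depends on their position.
def requiredByIndex(children: Seq[Attribute], unrequired: Seq[Int]): Seq[Attribute] =
  children.zipWithIndex.collect { case (a, i) if !unrequired.contains(i) => a }

// Attribute-based pruning: the result is stable under any reordering of `children`.
def requiredByAttr(children: Seq[Attribute], required: Seq[Attribute]): Seq[Attribute] =
  children.filter(required.contains)

val a = Attribute("a"); val b = Attribute("b"); val c = Attribute("c")
val original  = Seq(a, b, c)
val reordered = Seq(c, a, b)

// Pruning "index 1" drops a different column once the child output is reordered:
requiredByIndex(original,  Seq(1))  // Seq(a, c)
requiredByIndex(reordered, Seq(1))  // Seq(c, b) -- the wrong column was dropped

// Naming the required attributes is unaffected by the reorder:
requiredByAttr(original,  Seq(a, c))  // Seq(a, c)
requiredByAttr(reordered, Seq(a, c))  // Seq(c, a)
```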
@Kimahriman feel free to pick up this if you have an idea about how to fix it.
will we reuse the broadcast data after the query completes? e.g. when calling `df.collect()` multiple times.
I think it's true for SQL queries, but I'm not sure about DataFrame queries, which keep the physical plan as a lazy val so users can repeatedly execute the same physical...
We should include more high-level information: what is the corresponding Parquet type for a string with collation, and how do we fix the Parquet max/min column stats?