Wenchen Fan

Results: 245 comments by Wenchen Fan

Oh, this is a hard one. The cost of predicates is hard to estimate, and so is the benefit, since we need to estimate the selectivity and the input data volume....

I've been thinking hard about it. Filter pushdown should always be beneficial if we don't duplicate expressions, and the new `With` expression can avoid expression duplication. So my proposal is:...
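To make the duplication concern concrete, here is a minimal plain-Python sketch (not Spark code, and not the `With` API). It assumes a projected column `y = expensive(x)` and a predicate `y > 10 AND y < 100` that gets pushed below the projection. The names `expensive`, `make_counted`, `filter_inlined`, and `filter_bound_once` are hypothetical, chosen only for illustration. Inlining `y` into the predicate evaluates the expression up to twice per row, while binding it once evaluates it exactly once per row.

```python
def make_counted(fn):
    """Wrap fn so that the number of calls can be inspected afterwards."""
    counter = {"calls": 0}

    def wrapped(v):
        counter["calls"] += 1
        return fn(v)

    return wrapped, counter


def filter_inlined(rows, expensive):
    # Pushed-down predicate with y inlined: expensive(x) appears twice.
    return [x for x in rows if expensive(x) > 10 and expensive(x) < 100]


def filter_bound_once(rows, expensive):
    # Same predicate, but the common expression is bound to c once per row.
    kept = []
    for x in rows:
        c = expensive(x)
        if c > 10 and c < 100:
            kept.append(x)
    return kept


rows = [1, 2, 4, 5, 20]

inlined_fn, inlined_counter = make_counted(lambda v: v * v)
inlined_result = filter_inlined(rows, inlined_fn)

bound_fn, bound_counter = make_counted(lambda v: v * v)
bound_result = filter_bound_once(rows, bound_fn)

print(inlined_result, inlined_counter["calls"])
print(bound_result, bound_counter["calls"])
```

Both variants keep the same rows; only the number of evaluations of the expensive expression differs. How closely this mirrors what `With` does inside the optimizer is an assumption of the sketch.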

Instead of showing an example query, can you define the general form of joined aggregates that can be merged?

For merging `func1(...) ... WHERE cond1` and `func2(...) ... WHERE cond2`, we got

```
func1(...) FILTER cond1, func2(...) FILTER cond2 ... WHERE cond1 OR cond2
```

Assuming there is no...
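The rewrite above can be checked on toy data. The following plain-Python sketch (not Spark code) assumes `func1` is a sum over a column `x` and `func2` is a row count; the function names `separate` and `merged` and the sample rows are hypothetical. It compares two independent aggregates, each with its own `WHERE`, against a single pass that applies `WHERE cond1 OR cond2` and then a per-aggregate `FILTER`.

```python
def separate(rows, cond1, cond2):
    # Two independent aggregates, each scanning the input with its own WHERE.
    total = sum(r["x"] for r in rows if cond1(r))  # func1(...) WHERE cond1
    count = sum(1 for r in rows if cond2(r))       # func2(...) WHERE cond2
    return total, count


def merged(rows, cond1, cond2):
    # One scan: WHERE cond1 OR cond2, then a FILTER per aggregate.
    total, count = 0, 0
    for r in rows:
        if not (cond1(r) or cond2(r)):
            continue  # row dropped by the combined WHERE
        if cond1(r):  # func1(...) FILTER cond1
            total += r["x"]
        if cond2(r):  # func2(...) FILTER cond2
            count += 1
    return total, count


rows = [{"x": 1, "k": "a"}, {"x": 5, "k": "b"}, {"x": 7, "k": "a"}, {"x": 9, "k": "c"}]
cond1 = lambda r: r["x"] > 4     # hypothetical cond1
cond2 = lambda r: r["k"] == "a"  # hypothetical cond2

print(separate(rows, cond1, cond2))
print(merged(rows, cond1, cond2))
```

The sketch ignores SQL NULL semantics (for example, a sum over zero rows is NULL in SQL but 0 here), so it only illustrates the row-level equivalence.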

@peter-toth I agree that the extra project can help if we decide to merge. However, the plan pattern becomes complicated. Without the extra project, the merged aggregate is still `Aggregate...

I'm not sure if we can reuse https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/MergeScalarSubqueries.scala#L235-L255 directly. If the `Project` is not generated by this optimization, it might contain other expensive expressions and we should stop merging. Maybe...

I think @peter-toth did something similar before. Can you share some ideas, @peter-toth?

@peter-toth can you retrigger the tests? The pyspark failures may be flaky.

thanks, merging to master/3.4 (as it fixes a bug in planned write)!