Results 118 comments of Peter Toth

I extracted the first commit of this PR, that just moves `MergeScalarSubqueries` from `spark-catalyst` to `spark-sql`, to https://github.com/apache/spark/pull/40932 to make the actual change of this PR clearer once that PR...

I've updated this PR, the latest version contains the discussed changes from theads of https://github.com/apache/spark/pull/42223: - I removed the physical scan equality check to make this PR simpler, - merge...

> Hey, is this part of generalized subquery fusion? https://www.usenix.org/conference/osdi20/presentation/sarthi No, this PR is not based on the above paper but our goals seems to be similar. This PR merges...

@cloud-fan, @beliefer do you think we can move forward with this PR?

Hi @beliefer, This is a nice optimization and awsome performance improvement to Q28, but I'm not sure the implementation is done the right way: - You probably know that `MergeScalarSubqueries`...

> > BTW, in my #37630 I used a different heuristics to disable merging of aggregates with > different filter conditions. If the conditions contain any partitioning or bucketing columns...

@beliefer, I've updated my PR with comments to ellaborate on how it handles `Filter`s: https://github.com/apache/spark/pull/37630/files#diff-3d3aa853c51c01216a7f7307219544a48e8c14fabe1817850e58739208bc406aR351-R524

> ``` > case (CHECKING, FileSourceScanPlan(_, newScan), FileSourceScanPlan(_, cachedScan)) => > val (newScanToCompare, cachedScanToCompare) = > if (conf.getConf(SQLConf.PLAN_MERGE_IGNORE_PUSHED_PUSHED_DATA_FILTERS)) { > (newScan.copy(dataFilters = Seq.empty), cachedScan.copy(dataFilters = Seq.empty)) > } else {...

> > But my point is that it doesn't matter how the filter looks like (is it an OR condition or not). I enabled merging if only > > FileSourceScanExec.dataFilters...

> I don't think putting the predicates in a Project helps as the problem is from scan prunning. Sorry, my first comment about the extra project (https://github.com/apache/spark/pull/42223#discussion_r1283307075) was confusing. I...