starrocks
starrocks copied to clipboard
[Optimizer] Add a new rule to prune no-used columns to speed up query performance for aggregate-union-aggregate pattern
Enhancement
For the query below, we can do som optimizations if:
- parent's aggs can be roll-up and rollup from union's aggs;
- parent's gb-keys comes from union's gb-keys.
before
select col1, sum(col3) from -- parent
(
select col1, col2, sum(col3) as col3 from t1 group by col1, col2 -- child
union all
select col1, col2, sum(col3) as col3 from t1 group by col1, col2
) t
group by col1;
after:
select col1, sum(col3) from
(
select col1, sum(col3) as col3 from t1 group by col1
union all
select col1, sum(col3) as col3 from t1 group by col1
) t
group by col1;
This can prune no-used columns to speed up query performance.
create table t1 (c1 int, c2 int, c3 int);
insert into t1 values(0, 0, 0);
explain logical select c1, sum(c3) from (
select c1, c2, sum(c3) as c3 from t1 group by c1, c2
union all
select c1, c2, sum(c3) as c3 from t1 group by c1, c2
) t
group by c1;
LogicalProjectOperator {projection=[9: c1, 12: sum]}
-> LogicalAggregation {type=GLOBAL ,aggregations={12: sum=sum(11: sum)} ,groupKeys=[9: c1] ,projection=null ,predicate=null}
-> LogicalProjectOperator {projection=[9: c1, 11: sum]}
-> LogicalUnionOperator{output=[9: c1, 10: c2, 11: sum], 1: c1, 2: c2, 4: sum, 5: c1, 6: c2, 8: sum}
-> LogicalProjectOperator {projection=[1: c1, 2: c2, 4: sum]}
-> LogicalProjectOperator {projection=[1: c1, 2: c2, 4: sum]}
-> LogicalAggregation {type=GLOBAL ,aggregations={4: sum=sum(3: c3)} ,groupKeys=[1: c1, 2: c2] ,projection=null ,predicate=null}
-> LogicalProjectOperator {projection=[1: c1, 2: c2, 3: c3]}
-> LogicalProjectOperator {projection=[1: c1, 2: c2, 3: c3]}
-> LogicalOlapScanOperator {table=85046, selectedPartitionId=null, selectedIndexId=85047, outputColumns=[1: c1, 2: c2, 3: c3], predicate=null, prunedPartitionPredicates=[], limit=-1}
-> LogicalProjectOperator {projection=[5: c1, 6: c2, 8: sum]}
-> LogicalProjectOperator {projection=[5: c1, 6: c2, 8: sum]}
-> LogicalAggregation {type=GLOBAL ,aggregations={8: sum=sum(7: c3)} ,groupKeys=[5: c1, 6: c2] ,projection=null ,predicate=null}
-> LogicalProjectOperator {projection=[5: c1, 6: c2, 7: c3]}
-> LogicalProjectOperator {projection=[5: c1, 6: c2, 7: c3]}
-> LogicalOlapScanOperator {table=85046, selectedPartitionId=null, selectedIndexId=85047, outputColumns=[5: c1, 6: c2, 7: c3], predicate=null, prunedPartitionPredicates=[], limit=-1}
There may be projection between Agg, Union and Agg, how should I skip it? or I can only overwrite check() to judge?
Projection can be merged into operator, you can check and use MergeProjectionRule for more details.