starrocks icon indicating copy to clipboard operation
starrocks copied to clipboard

[Optimizer] Add a new rule to prune no-used columns to speed up query performance for aggregate-union-aggregate pattern

Open LiShuMing opened this issue 11 months ago • 2 comments

Enhancement

For the query below, we can do som optimizations if:

  • parent's aggs can be roll-up and rollup from union's aggs;
  • parent's gb-keys comes from union's gb-keys.

before

select col1, sum(col3) from --  parent
(

	select col1, col2, sum(col3) as col3 from t1 group by col1, col2 -- child
	union all
	select col1, col2, sum(col3) as col3 from t1 group by col1, col2
) t
group by col1;

after:

select col1, sum(col3) from
(

	select col1, sum(col3) as col3 from t1 group by col1
	union all
	select col1, sum(col3) as col3 from t1 group by col1
) t
group by col1;

This can prune no-used columns to speed up query performance.

LiShuMing avatar Jan 21 '25 11:01 LiShuMing

create table t1 (c1 int, c2 int, c3 int);
insert into t1 values(0, 0, 0);

explain logical select c1, sum(c3) from  (  
   select c1, c2, sum(c3) as c3 from t1 group by c1, c2 
   union all 
   select c1, c2, sum(c3) as c3 from t1 group by c1, c2
 ) t 
group by c1;
LogicalProjectOperator {projection=[9: c1, 12: sum]}
->  LogicalAggregation {type=GLOBAL ,aggregations={12: sum=sum(11: sum)} ,groupKeys=[9: c1] ,projection=null ,predicate=null}
    ->  LogicalProjectOperator {projection=[9: c1, 11: sum]}
        ->  LogicalUnionOperator{output=[9: c1, 10: c2, 11: sum], 1: c1, 2: c2, 4: sum, 5: c1, 6: c2, 8: sum}
            ->  LogicalProjectOperator {projection=[1: c1, 2: c2, 4: sum]}
                ->  LogicalProjectOperator {projection=[1: c1, 2: c2, 4: sum]}
                    ->  LogicalAggregation {type=GLOBAL ,aggregations={4: sum=sum(3: c3)} ,groupKeys=[1: c1, 2: c2] ,projection=null ,predicate=null}
                        ->  LogicalProjectOperator {projection=[1: c1, 2: c2, 3: c3]}
                            ->  LogicalProjectOperator {projection=[1: c1, 2: c2, 3: c3]}
                                ->  LogicalOlapScanOperator {table=85046, selectedPartitionId=null, selectedIndexId=85047, outputColumns=[1: c1, 2: c2, 3: c3], predicate=null, prunedPartitionPredicates=[], limit=-1}
            ->  LogicalProjectOperator {projection=[5: c1, 6: c2, 8: sum]}
                ->  LogicalProjectOperator {projection=[5: c1, 6: c2, 8: sum]}
                    ->  LogicalAggregation {type=GLOBAL ,aggregations={8: sum=sum(7: c3)} ,groupKeys=[5: c1, 6: c2] ,projection=null ,predicate=null}
                        ->  LogicalProjectOperator {projection=[5: c1, 6: c2, 7: c3]}
                            ->  LogicalProjectOperator {projection=[5: c1, 6: c2, 7: c3]}
                                ->  LogicalOlapScanOperator {table=85046, selectedPartitionId=null, selectedIndexId=85047, outputColumns=[5: c1, 6: c2, 7: c3], predicate=null, prunedPartitionPredicates=[], limit=-1}

There may be projection between Agg, Union and Agg, how should I skip it? or I can only overwrite check() to judge?

KKould avatar Feb 11 '25 09:02 KKould

Projection can be merged into operator, you can check and use MergeProjectionRule for more details.

LiShuMing avatar Jun 20 '25 06:06 LiShuMing