doris icon indicating copy to clipboard operation
doris copied to clipboard

[Enhancement] Grouping Sets predicate push down

Open cambyzju opened this issue 5 months ago • 0 comments

Search before asking

  • [X] I had searched in the issues and found no similar issues.

Description

First of all, create a test table and insert some data;

CREATE TABLE `a` (
  `k1` int NOT NULL,
  `k2` int NOT NULL,
  `v` int SUM NULL
) ENGINE=OLAP
AGGREGATE KEY(`k1`, `k2`)
COMMENT 'OLAP'
DISTRIBUTED BY HASH(`k1`) BUCKETS 10;

insert into a values(1,2,3),(4,5,6);

Problem: Then try to explain SQL with doris-2.1.5-rc02, we found the condition do not push down:

> explain select * from (select k1,coalesce(k2,999) as kk2,sum(v) from a group by grouping sets((k1,k2),(k1))) x where x.kk2=999;
+----------------------------------------------------------------------+
| Explain String(Nereids Planner)                                      |
+----------------------------------------------------------------------+
| PLAN FRAGMENT 0                                                      |
|   OUTPUT EXPRS:                                                      |
|     k1[#16]                                                          |
|     kk2[#17]                                                         |
|     sum(v)[#18]                                                      |
|   PARTITION: UNPARTITIONED                                           |
|                                                                      |
|   HAS_COLO_PLAN_NODE: false                                          |
|                                                                      |
|   VRESULT SINK                                                       |
|      MYSQL_PROTOCAL                                                  |
|                                                                      |
|   5:VEXCHANGE                                                        |
|      offset: 0                                                       |
|      distribute expr lists: k1[#16]                                  |
|                                                                      |
| PLAN FRAGMENT 1                                                      |
|                                                                      |
|   PARTITION: HASH_PARTITIONED: k1[#8], k2[#9], GROUPING_ID[#10]      |
|                                                                      |
|   HAS_COLO_PLAN_NODE: true                                           |
|                                                                      |
|   STREAM DATA SINK                                                   |
|     EXCHANGE ID: 05                                                  |
|     UNPARTITIONED                                                    |
|                                                                      |
|   4:VAGGREGATE (merge finalize)(279)                                 |
|   |  output: sum(partial_sum(v)[#11])[#15]                           |
|   |  group by: k1[#8], k2[#9], GROUPING_ID[#10]                      |
|   |  having: (coalesce(k2[#13], 999) = 999)                          |
|   |  cardinality=2                                                   |
|   |  final projections: k1[#12], coalesce(k2[#13], 999), sum(v)[#15] |
|   |  final project output tuple id: 5                                |
|   |  distribute expr lists: k1[#8], k2[#9], GROUPING_ID[#10]         |
|   |                                                                  |
|   3:VEXCHANGE                                                        |
|      offset: 0                                                       |
|      distribute expr lists:                                          |
|                                                                      |
| PLAN FRAGMENT 2                                                      |
|                                                                      |
|   PARTITION: HASH_PARTITIONED: k1[#0]                                |
|                                                                      |
|   HAS_COLO_PLAN_NODE: false                                          |
|                                                                      |
|   STREAM DATA SINK                                                   |
|     EXCHANGE ID: 03                                                  |
|     HASH_PARTITIONED: k1[#8], k2[#9], GROUPING_ID[#10]               |
|                                                                      |
|   2:VAGGREGATE (update serialize)(273)                               |
|   |  STREAMING                                                       |
|   |  output: partial_sum(v[#6])[#11]                                 |
|   |  group by: k1[#4], k2[#5], GROUPING_ID[#7]                       |
|   |  cardinality=2                                                   |
|   |  distribute expr lists:                                          |
|   |                                                                  |
|   1:VREPEAT_NODE(270)                                                |
|   |  repeat: repeat 1 lines [[4, 5], [4]]                            |
|   |  exprs: k1[#0], k2[#1], v[#2]                                    |
|   |  output slots: `null`, `null`, `null`, `GROUPING_ID`             |
|   |  distribute expr lists: k1[#0]                                   |
|   |                                                                  |
|   0:VOlapScanNode(267)                                               |
|      TABLE: example_db.a(a), PREAGGREGATION: ON                      |
|      partitions=1/1 (a)                                              |
|      tablets=10/10, tabletList=38035,38037,38039 ...                 |
|      cardinality=2, avgRowSize=0.0, numNodes=1                       |
|      pushAggOp=NONE                                                  |
|                                                                      |
|                                                                      |
| Statistics                                                           |
|  planed with unknown column statistics                               |
+----------------------------------------------------------------------+

Starrocks: But while we try to use Starrocks(Version: 3.2.6-2585333), we found the predicate pushed down to REPEAT_NODE:

> explain select * from (select k1,coalesce(k2,999) as kk2,sum(v) from a group by grouping sets((k1,k2),(k1))) x where x.kk2=999;
+-------------------------------------------------------------+
| Explain String                                              |
+-------------------------------------------------------------+
| PLAN FRAGMENT 0                                             |
|  OUTPUT EXPRS:1: k1 | 6: coalesce | 4: sum                  |
|   PARTITION: HASH_PARTITIONED: 1: k1, 2: k2, 5: GROUPING_ID |
|                                                             |
|   RESULT SINK                                               |
|                                                             |
|   5:Project                                                 |
|   |  <slot 1> : 1: k1                                       |
|   |  <slot 4> : 4: sum                                      |
|   |  <slot 6> : coalesce(2: k2, 999)                        |
|   |                                                         |
|   4:AGGREGATE (merge finalize)                              |
|   |  output: sum(4: sum)                                    |
|   |  group by: 1: k1, 2: k2, 5: GROUPING_ID                 |
|   |                                                         |
|   3:EXCHANGE                                                |
|                                                             |
| PLAN FRAGMENT 1                                             |
|  OUTPUT EXPRS:                                              |
|   PARTITION: RANDOM                                         |
|                                                             |
|   STREAM DATA SINK                                          |
|     EXCHANGE ID: 03                                         |
|     HASH_PARTITIONED: 1: k1, 2: k2, 5: GROUPING_ID          |
|                                                             |
|   2:AGGREGATE (update serialize)                            |
|   |  STREAMING                                              |
|   |  output: sum(3: v)                                      |
|   |  group by: 1: k1, 2: k2, 5: GROUPING_ID                 |
|   |                                                         |
|   1:REPEAT_NODE                                             |
|   |  repeat: repeat 1 lines [[1, 2], [1]]                   |
|   |  PREDICATES: coalesce(2: k2, 999) = 999                 |
|   |                                                         |
|   0:OlapScanNode                                            |
|      TABLE: a                                               |
|      PREAGGREGATION: ON                                     |
|      partitions=1/1                                         |
|      rollup: a                                              |
|      tabletRatio=1/1                                        |
|      tabletList=14112                                       |
|      cardinality=2                                          |
|      avgRowSize=12.0                                        |
+-------------------------------------------------------------+

Starrocks: We use another condition, still use Starrocks(Version: 3.2.6-2585333), we found the predicate pushed down to SCAN_NODE:

> explain select * from (select k1,coalesce(k2,999) as kk2,sum(v) from a group by grouping sets((k1,k2),(k1))) x where x.kk2=111;
+-------------------------------------------------------------+
| Explain String                                              |
+-------------------------------------------------------------+
| PLAN FRAGMENT 0                                             |
|  OUTPUT EXPRS:1: k1 | 6: coalesce | 4: sum                  |
|   PARTITION: HASH_PARTITIONED: 1: k1, 2: k2, 5: GROUPING_ID |
|                                                             |
|   RESULT SINK                                               |
|                                                             |
|   5:Project                                                 |
|   |  <slot 1> : 1: k1                                       |
|   |  <slot 4> : 4: sum                                      |
|   |  <slot 6> : coalesce(2: k2, 999)                        |
|   |                                                         |
|   4:AGGREGATE (merge finalize)                              |
|   |  output: sum(4: sum)                                    |
|   |  group by: 1: k1, 2: k2, 5: GROUPING_ID                 |
|   |                                                         |
|   3:EXCHANGE                                                |
|                                                             |
| PLAN FRAGMENT 1                                             |
|  OUTPUT EXPRS:                                              |
|   PARTITION: RANDOM                                         |
|                                                             |
|   STREAM DATA SINK                                          |
|     EXCHANGE ID: 03                                         |
|     HASH_PARTITIONED: 1: k1, 2: k2, 5: GROUPING_ID          |
|                                                             |
|   2:AGGREGATE (update serialize)                            |
|   |  STREAMING                                              |
|   |  output: sum(3: v)                                      |
|   |  group by: 1: k1, 2: k2, 5: GROUPING_ID                 |
|   |                                                         |
|   1:REPEAT_NODE                                             |
|   |  repeat: repeat 1 lines [[1, 2], [1]]                   |
|   |  PREDICATES: coalesce(2: k2, 999) = 111                 |
|   |                                                         |
|   0:OlapScanNode                                            |
|      TABLE: a                                               |
|      PREAGGREGATION: ON                                     |
|      PREDICATES: coalesce(2: k2, 999) = 111                 |
|      partitions=1/1                                         |
|      rollup: a                                              |
|      tabletRatio=1/1                                        |
|      tabletList=14112                                       |
|      cardinality=1                                          |
|      avgRowSize=12.0                                        |
+-------------------------------------------------------------+

Solution

No response

Are you willing to submit PR?

  • [ ] Yes I am willing to submit a PR!

Code of Conduct

cambyzju avatar Aug 28 '24 09:08 cambyzju