[fix](mtmv)Fix high level materialized view not hit because group by eliminate fail
Proposed changes
this depends on https://github.com/apache/doris/pull/36839 https://github.com/apache/doris/pull/36886
Such as low level materialized view contains 5 group by dimension, and query also has 5 group by dimension, they are equals.In this scene, would not add aggregate on mv when try to rewrite query by materialized view. But if query only use 4 group by dimension and the remain demension is can be eliminated, then the query will change to 4 group by dimension. this will cause add aggregate on mv and will cause high level materialize rewrite fail later.
Solution: in aggregate rewrite by materialized view, we try to eliminate mv group by dimension by query used dimension. if eliminate successfully. then high level will rewrite continue.
such as low level mv def sql is as following:
def join_mv_1 = """ select l_orderkey, l_partkey, l_suppkey, o_orderkey, o_custkey, cast(sum(IFNULL(o_orderkey, 0) * IFNULL(o_custkey, 0)) as decimal(28, 8)) as agg1, sum(o_totalprice) as sum_total, max(o_totalprice) as max_total, min(o_totalprice) as min_total, count(*) as count_all, bitmap_union(to_bitmap(case when o_shippriority > 1 and o_orderkey IN (1, 3) then o_custkey else null end)) cnt_1, bitmap_union(to_bitmap(case when o_shippriority > 2 and o_orderkey IN (2) then o_custkey else null end)) as cnt_2 from lineitem_1 inner join orders_1 on lineitem_1.l_orderkey = orders_1.o_orderkey where lineitem_1.l_shipdate >= "2023-10-17" group by l_orderkey, l_partkey, l_suppkey, o_orderkey, o_custkey """ def join_mv_2 = """ select l_orderkey, l_partkey, l_suppkey, o_orderkey, o_custkey, ps_partkey, ps_suppkey, t.agg1 as agg1, t.sum_total as agg3, t.max_total as agg4, t.min_total as agg5, t.count_all as agg6, cast(sum(IFNULL(ps_suppkey, 0) * IFNULL(ps_partkey, 0)) as decimal(28, 8)) as agg2 from ${mv_1} as t inner join partsupp_1 on t.l_partkey = partsupp_1.ps_partkey and t.l_suppkey = partsupp_1.ps_suppkey where partsupp_1.ps_suppkey > 1 group by l_orderkey, l_partkey, l_suppkey, o_orderkey, o_custkey, ps_partkey, ps_suppkey, agg1, agg3, agg4, agg5, agg6 """
high level mv def sql is as following:
def join_mv_3 = """ select t1.l_orderkey, t2.l_partkey, t1.l_suppkey, t2.o_orderkey, t1.o_custkey, t2.ps_partkey, t1.ps_suppkey, t2.agg1, >t1.agg2, t2.agg3, t1.agg4, t2.agg5, t1.agg6 from ${mv_2} as t1 left join ${mv_2} as t2 on t1.l_orderkey = t2.l_orderkey where t1.l_orderkey > 1 group by t1.l_orderkey, t2.l_partkey, t1.l_suppkey, t2.o_orderkey, t1.o_custkey, t2.ps_partkey, t1.ps_suppkey, t2.agg1, >t1.agg2, t2.agg3, t1.agg4, t2.agg5, t1.agg6 """
if we run the query as following, it can hit the mv3
select t1.l_orderkey, t2.l_partkey, t1.l_suppkey, t2.o_orderkey, t1.o_custkey, t2.ps_partkey, t1.ps_suppkey, t2.agg1, t1.agg2, >t2.agg3, t1.agg4, t2.agg5, t1.agg6 from ( select l_orderkey, l_partkey, l_suppkey, o_orderkey, o_custkey, ps_partkey, ps_suppkey, t.agg1 as agg1, t.sum_total as agg3, t.max_total as agg4, t.min_total as agg5, t.count_all as agg6, cast(sum(IFNULL(ps_suppkey, 0) * IFNULL(ps_partkey, 0)) as decimal(28, 8)) as agg2 from ( select l_orderkey, l_partkey, l_suppkey, o_orderkey, o_custkey, cast(sum(IFNULL(o_orderkey, 0) * >IFNULL(o_custkey, 0)) as decimal(28, 8)) as agg1, sum(o_totalprice) as sum_total, max(o_totalprice) as max_total, min(o_totalprice) as min_total, count() as count_all, bitmap_union(to_bitmap(case when o_shippriority > 1 and o_orderkey IN (1, 3) then o_custkey else null end)) >cnt_1, bitmap_union(to_bitmap(case when o_shippriority > 2 and o_orderkey IN (2) then o_custkey else null end)) as >cnt_2 from lineitem_1 inner join orders_1 on lineitem_1.l_orderkey = orders_1.o_orderkey where lineitem_1.l_shipdate >= "2023-10-17" group by l_orderkey, l_partkey, l_suppkey, o_orderkey, o_custkey ) as t inner join partsupp_1 on t.l_partkey = partsupp_1.ps_partkey and t.l_suppkey = partsupp_1.ps_suppkey where partsupp_1.ps_suppkey > 1 group by l_orderkey, l_partkey, l_suppkey, o_orderkey, o_custkey, ps_partkey, ps_suppkey, agg1, agg3, agg4, agg5, >agg6 ) as t1 left join ( select l_orderkey, l_partkey, l_suppkey, o_orderkey, o_custkey, ps_partkey, ps_suppkey, t.agg1 as agg1, t.sum_total as agg3, t.max_total as agg4, t.min_total as agg5, t.count_all as agg6, cast(sum(IFNULL(ps_suppkey, 0) * IFNULL(ps_partkey, 0)) as decimal(28, 8)) as agg2 from ( select l_orderkey, l_partkey, l_suppkey, o_orderkey, o_custkey, cast(sum(IFNULL(o_orderkey, 0) * >IFNULL(o_custkey, 0)) as decimal(28, 8)) as agg1, sum(o_totalprice) as sum_total, max(o_totalprice) as max_total, min(o_totalprice) as min_total, count() as count_all, bitmap_union(to_bitmap(case when o_shippriority > 1 and o_orderkey IN (1, 3) then o_custkey else null end)) >cnt_1, bitmap_union(to_bitmap(case when o_shippriority > 2 and o_orderkey IN (2) then o_custkey else null end)) as >cnt_2 from lineitem_1 inner join orders_1 on lineitem_1.l_orderkey = orders_1.o_orderkey where lineitem_1.l_shipdate >= "2023-10-17" group by l_orderkey, l_partkey, l_suppkey, o_orderkey, o_custkey ) as t inner join partsupp_1 on t.l_partkey = partsupp_1.ps_partkey and t.l_suppkey = partsupp_1.ps_suppkey where partsupp_1.ps_suppkey > 1 group by l_orderkey, l_partkey, l_suppkey, o_orderkey, o_custkey, ps_partkey, ps_suppkey, agg1, agg3, agg4, agg5, >agg6 ) as t2 on t1.l_orderkey = t2.l_orderkey where t1.l_orderkey > 1 group by t1.l_orderkey, t2.l_partkey, t1.l_suppkey, t2.o_orderkey, t1.o_custkey, t2.ps_partkey, t1.ps_suppkey, t2.agg1, >t1.agg2, t2.agg3, t1.agg4, t2.agg5, t1.agg6
Thank you for your contribution to Apache Doris. Don't know what should be done next? See How to process your PR
Since 2024-03-18, the Document has been moved to doris-website. See Doris Document.
run buildall
TPC-H: Total hot run time: 39594 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit cf6061b6610d62211fc445c50fdd5792b4c6453a, data reload: false
------ Round 1 ----------------------------------
q1 17627 4375 4294 4294
q2 2029 191 187 187
q3 10489 1205 1016 1016
q4 10195 784 737 737
q5 7484 2621 2592 2592
q6 221 140 136 136
q7 952 599 605 599
q8 9242 2102 2058 2058
q9 9110 6520 6476 6476
q10 9008 3730 3751 3730
q11 456 230 234 230
q12 445 230 230 230
q13 18796 2968 3007 2968
q14 271 238 224 224
q15 511 479 479 479
q16 492 375 371 371
q17 975 700 685 685
q18 7981 7645 7274 7274
q19 7675 1492 1537 1492
q20 663 325 347 325
q21 4858 3152 3958 3152
q22 393 339 346 339
Total cold run time: 119873 ms
Total hot run time: 39594 ms
----- Round 2, with runtime_filter_mode=off -----
q1 4416 4268 4234 4234
q2 367 263 297 263
q3 3113 2869 2891 2869
q4 1928 1752 1680 1680
q5 5516 5490 5409 5409
q6 247 150 133 133
q7 2271 1781 1812 1781
q8 3245 3477 3395 3395
q9 8716 8757 8752 8752
q10 4173 3750 3847 3750
q11 579 523 497 497
q12 823 665 639 639
q13 15989 3165 3163 3163
q14 320 281 299 281
q15 529 486 492 486
q16 483 436 437 436
q17 1816 1554 1504 1504
q18 8060 8043 7723 7723
q19 1814 1697 1688 1688
q20 2141 1875 1841 1841
q21 9662 4923 4643 4643
q22 660 528 588 528
Total cold run time: 76868 ms
Total hot run time: 55695 ms
run buildall
TPC-H: Total hot run time: 40079 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 240aaa73f362eec3b6da2321a322030641653f6e, data reload: false
------ Round 1 ----------------------------------
q1 17699 4499 4335 4335
q2 2031 190 196 190
q3 10529 1194 1094 1094
q4 10198 834 791 791
q5 7462 2678 2642 2642
q6 216 135 135 135
q7 947 596 602 596
q8 9221 2089 2084 2084
q9 8906 6495 6500 6495
q10 8948 3701 3755 3701
q11 458 245 237 237
q12 480 224 228 224
q13 17795 2998 2973 2973
q14 261 226 223 223
q15 542 472 484 472
q16 506 387 377 377
q17 972 683 673 673
q18 8121 7605 7526 7526
q19 7544 1434 1516 1434
q20 656 328 331 328
q21 4951 3211 3864 3211
q22 403 338 339 338
Total cold run time: 118846 ms
Total hot run time: 40079 ms
----- Round 2, with runtime_filter_mode=off -----
q1 4442 4295 4318 4295
q2 366 270 269 269
q3 3047 2909 2869 2869
q4 1967 1771 1749 1749
q5 5525 5507 5464 5464
q6 217 129 131 129
q7 2330 1862 1830 1830
q8 3284 3450 3455 3450
q9 8691 8770 8690 8690
q10 4173 3747 3834 3747
q11 607 497 507 497
q12 832 649 640 640
q13 16113 3172 3165 3165
q14 312 287 288 287
q15 532 492 489 489
q16 487 430 428 428
q17 1829 1543 1521 1521
q18 8060 7906 7732 7732
q19 1854 1598 1622 1598
q20 2170 1893 1859 1859
q21 8192 5020 4811 4811
q22 669 560 551 551
Total cold run time: 75699 ms
Total hot run time: 56070 ms
run buildall
run buildall
run buildall
TPC-H: Total hot run time: 39676 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 597d3533913e2586fa32c6158d378e5a50d294c6, data reload: false
------ Round 1 ----------------------------------
q1 17627 4348 4282 4282
q2 2019 196 188 188
q3 10454 1248 1058 1058
q4 10190 789 816 789
q5 7542 2639 2617 2617
q6 221 139 140 139
q7 955 593 607 593
q8 9247 2067 2069 2067
q9 9119 6472 6478 6472
q10 8963 3716 3689 3689
q11 442 241 239 239
q12 420 227 228 227
q13 17960 3006 2987 2987
q14 268 217 213 213
q15 525 482 488 482
q16 529 375 370 370
q17 962 632 686 632
q18 7914 7401 7400 7400
q19 2061 1440 1376 1376
q20 648 308 306 306
q21 4884 3214 3961 3214
q22 398 346 336 336
Total cold run time: 113348 ms
Total hot run time: 39676 ms
----- Round 2, with runtime_filter_mode=off -----
q1 4334 4197 4208 4197
q2 371 263 272 263
q3 2982 2753 2736 2736
q4 2152 1664 1663 1663
q5 5527 5645 5482 5482
q6 225 129 134 129
q7 2224 1829 1838 1829
q8 3278 3420 3402 3402
q9 8662 8675 8716 8675
q10 4129 3840 3894 3840
q11 577 504 501 501
q12 768 607 620 607
q13 15753 3251 3133 3133
q14 304 277 274 274
q15 541 486 492 486
q16 488 450 442 442
q17 1826 1530 1542 1530
q18 8112 7775 7818 7775
q19 1828 1603 1639 1603
q20 2097 1877 1906 1877
q21 5219 5012 5015 5012
q22 653 539 566 539
Total cold run time: 72050 ms
Total hot run time: 55995 ms
TPC-DS: Total hot run time: 172891 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 597d3533913e2586fa32c6158d378e5a50d294c6, data reload: false
query1 913 383 368 368
query2 6436 2460 2320 2320
query3 6634 200 216 200
query4 18766 17521 17264 17264
query5 3616 522 489 489
query6 265 162 182 162
query7 4587 296 296 296
query8 308 291 306 291
query9 8709 2412 2376 2376
query10 559 292 282 282
query11 10605 10230 10063 10063
query12 118 89 82 82
query13 1659 359 358 358
query14 9568 6977 6986 6977
query15 242 195 181 181
query16 7449 311 297 297
query17 1392 556 536 536
query18 1936 272 276 272
query19 191 152 152 152
query20 97 82 81 81
query21 206 139 131 131
query22 4375 4089 3955 3955
query23 33773 33725 33736 33725
query24 11289 2864 2869 2864
query25 594 396 385 385
query26 722 162 163 162
query27 2331 313 330 313
query28 6040 2115 2131 2115
query29 910 659 640 640
query30 251 160 153 153
query31 969 772 748 748
query32 100 56 54 54
query33 730 297 287 287
query34 969 478 490 478
query35 736 639 636 636
query36 1151 952 972 952
query37 142 80 83 80
query38 2988 2775 2924 2775
query39 944 851 829 829
query40 211 141 129 129
query41 58 59 57 57
query42 115 104 104 104
query43 608 557 565 557
query44 1173 737 742 737
query45 199 169 172 169
query46 1076 744 733 733
query47 1843 1782 1787 1782
query48 371 317 307 307
query49 867 430 425 425
query50 775 395 395 395
query51 6931 6770 6746 6746
query52 110 92 95 92
query53 368 296 296 296
query54 907 457 449 449
query55 77 76 73 73
query56 318 302 288 288
query57 1152 1043 1051 1043
query58 247 271 277 271
query59 3545 3550 3012 3012
query60 320 338 300 300
query61 116 118 114 114
query62 617 463 445 445
query63 321 301 298 298
query64 8627 2341 1876 1876
query65 3149 3114 3146 3114
query66 785 330 337 330
query67 15775 14880 15105 14880
query68 6144 544 549 544
query69 707 442 361 361
query70 1230 1110 1082 1082
query71 509 288 284 284
query72 9028 5033 5292 5033
query73 768 324 324 324
query74 5898 5528 5589 5528
query75 4244 2689 2662 2662
query76 3794 1004 914 914
query77 669 300 296 296
query78 9640 8868 8868 8868
query79 6868 520 516 516
query80 2669 527 484 484
query81 558 223 225 223
query82 1444 110 110 110
query83 287 172 173 172
query84 270 89 87 87
query85 1319 286 272 272
query86 454 329 294 294
query87 3288 3077 3099 3077
query88 5039 2379 2391 2379
query89 494 378 387 378
query90 1939 190 191 190
query91 129 99 101 99
query92 61 49 50 49
query93 5640 504 504 504
query94 1228 217 209 209
query95 451 315 327 315
query96 614 271 264 264
query97 3203 3005 2997 2997
query98 221 204 194 194
query99 1217 831 838 831
Total cold run time: 283955 ms
Total hot run time: 172891 ms
ClickBench: Total hot run time: 31.3 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 597d3533913e2586fa32c6158d378e5a50d294c6, data reload: false
query1 0.05 0.03 0.03
query2 0.07 0.03 0.04
query3 0.23 0.05 0.05
query4 1.67 0.08 0.07
query5 0.48 0.48 0.48
query6 1.19 0.73 0.72
query7 0.02 0.01 0.02
query8 0.05 0.04 0.05
query9 0.56 0.48 0.48
query10 0.53 0.54 0.54
query11 0.15 0.12 0.11
query12 0.15 0.13 0.12
query13 0.59 0.60 0.59
query14 0.78 0.79 0.77
query15 0.86 0.81 0.82
query16 0.39 0.37 0.35
query17 0.98 0.97 0.98
query18 0.21 0.24 0.27
query19 1.78 1.67 1.75
query20 0.01 0.01 0.00
query21 15.39 0.79 0.66
query22 4.83 5.56 2.66
query23 18.28 1.40 1.23
query24 2.12 0.23 0.22
query25 0.16 0.08 0.08
query26 0.29 0.22 0.22
query27 0.46 0.23 0.24
query28 13.25 1.02 1.00
query29 12.61 3.37 3.35
query30 0.25 0.06 0.06
query31 2.87 0.38 0.40
query32 3.29 0.48 0.47
query33 2.90 2.93 2.91
query34 17.08 4.30 4.37
query35 4.43 4.41 4.40
query36 0.66 0.47 0.48
query37 0.19 0.15 0.16
query38 0.15 0.15 0.14
query39 0.04 0.03 0.04
query40 0.15 0.12 0.13
query41 0.09 0.05 0.05
query42 0.05 0.05 0.05
query43 0.05 0.04 0.04
Total cold run time: 110.34 s
Total hot run time: 31.3 s
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.