[feat](nereids)Compress materialize for group by
Proposed changes
Aggregating on an integer column is more efficient than aggregating on a string column. This optimization therefore rewrites 'select A from T group by A' into 'select any_value(A) from T group by encode_as_int(A)', where A is a string column.
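As a rough illustration of the idea, the sketch below groups short strings by an integer key and keeps one original value per group, mirroring any_value(A). The Python function encode_as_int here is a hypothetical stand-in written for this example; it packs the bytes and the length into one integer and is not Doris's actual encoding, which may differ.

```python
def encode_as_int(s: str, width: int = 7) -> int:
    """Hypothetical stand-in: pack a short string into a single integer."""
    raw = s.encode("utf-8")
    if len(raw) > width:
        # Longer strings cannot be packed losslessly into this fixed width.
        raise ValueError("string too long to encode")
    # Pad to a fixed width and append the length byte, so that distinct
    # strings (including ones ending in NUL bytes) get distinct integers.
    return int.from_bytes(raw.ljust(width, b"\x00") + bytes([len(raw)]), "big")


def group_by_encoded(values):
    """Emulate: select any_value(A) from T group by encode_as_int(A)."""
    groups = {}
    for a in values:
        # The hash table is keyed by an int; the original string is kept
        # only as the representative value of its group.
        groups.setdefault(encode_as_int(a), a)
    return list(groups.values())


if __name__ == "__main__":
    print(group_by_encoded(["ab", "cd", "ab", "a"]))
```

Because the encoding in this sketch is injective for strings of at most 7 bytes, grouping by the integer key yields the same groups as grouping by the string itself.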
There are some limitations:
- Only group by is supported. Join will be supported later.
- Because it is hard to replace expressions in a parent node, some patterns are not supported yet.
Example:
select *
from
(select substring(A, 1, 3) as X from T group by substring(A, 1, 3)) T1 join T on T1.X=T.A
When the aggregate node is rewritten, it is hard to update the expression "T1.X=T.A" in the join node.
Issue Number: close #xxx
run buildall
TPC-H: Total hot run time: 41596 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 5415f0aa5bd6727293bb3a450aa3314314076850, data reload: false
------ Round 1 ----------------------------------
q1 17625 8647 7810 7810
q2 2017 273 309 273
q3 10312 1145 1146 1145
q4 10232 784 854 784
q5 7770 3064 3030 3030
q6 240 157 153 153
q7 1024 615 600 600
q8 9356 1895 1983 1895
q9 6560 6547 6466 6466
q10 7073 2440 2471 2440
q11 451 247 243 243
q12 419 223 227 223
q13 17806 2999 2991 2991
q14 245 207 223 207
q15 581 542 510 510
q16 687 597 599 597
q17 985 539 515 515
q18 7188 6722 6677 6677
q19 1356 1020 1006 1006
q20 462 180 185 180
q21 3955 2968 2876 2876
q22 1124 1028 975 975
Total cold run time: 107468 ms
Total hot run time: 41596 ms
----- Round 2, with runtime_filter_mode=off -----
q1 7821 7824 7851 7824
q2 321 239 251 239
q3 3060 2949 3060 2949
q4 2032 1805 1777 1777
q5 5702 5801 5781 5781
q6 239 146 142 142
q7 2294 1797 1780 1780
q8 3415 3446 3503 3446
q9 8890 8895 8894 8894
q10 3591 3580 3582 3580
q11 593 491 510 491
q12 886 629 637 629
q13 9224 3227 3182 3182
q14 311 273 293 273
q15 584 525 526 525
q16 720 669 657 657
q17 1832 1630 1622 1622
q18 8364 7762 7653 7653
q19 1732 1511 1467 1467
q20 2125 1888 1836 1836
q21 5749 5494 5515 5494
q22 1172 1073 1022 1022
Total cold run time: 70657 ms
Total hot run time: 61263 ms
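The report does not label its columns. The numbers are consistent with reading each row as: query, cold run, first hot run, second hot run, and the faster of the two hot runs, all in ms. That reading is an assumption inferred from the arithmetic, not something the report states. A minimal sketch that checks it on the first three Round 1 rows:

```python
def summarize(lines):
    """Return (total_cold_ms, total_hot_ms) under the assumed column layout."""
    total_cold = 0
    total_hot = 0
    for line in lines:
        name, cold, hot1, hot2, best = line.split()
        cold, hot1, hot2, best = int(cold), int(hot1), int(hot2), int(best)
        # Assumed layout: the last column is the faster of the two hot runs.
        if best != min(hot1, hot2):
            raise ValueError(f"{name}: last column is not min(hot1, hot2)")
        total_cold += cold
        total_hot += best
    return total_cold, total_hot


# First three rows of Round 1, copied from the report above.
sample = [
    "q1 17625 8647 7810 7810",
    "q2 2017 273 309 273",
    "q3 10312 1145 1146 1145",
]

if __name__ == "__main__":
    print(summarize(sample))  # (29954, 9228)
```

Applied to all 22 rows of Round 1, the same sums give 107468 ms and 41596 ms, matching the reported cold and hot totals.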
run buildall
TPC-H: Total hot run time: 42218 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 1b0c0cb4e3f0b7c862d9bac62eea10bc0ead94a7, data reload: false
------ Round 1 ----------------------------------
q1 17934 8414 7850 7850
q2 2057 285 173 173
q3 10590 1164 1173 1164
q4 10440 819 821 819
q5 7731 3230 3082 3082
q6 239 149 153 149
q7 1024 624 609 609
q8 9374 1995 1993 1993
q9 6722 6479 6466 6466
q10 7077 2443 2463 2443
q11 454 248 256 248
q12 424 232 233 232
q13 17790 3021 3016 3016
q14 254 217 211 211
q15 575 523 523 523
q16 684 602 606 602
q17 995 604 499 499
q18 7375 6853 6842 6842
q19 1358 1036 923 923
q20 509 185 187 185
q21 4072 3176 3218 3176
q22 1122 1022 1013 1013
Total cold run time: 108800 ms
Total hot run time: 42218 ms
----- Round 2, with runtime_filter_mode=off -----
q1 7799 7769 7845 7769
q2 351 248 240 240
q3 3135 3021 3058 3021
q4 2044 1852 1807 1807
q5 5755 5764 5814 5764
q6 237 148 152 148
q7 2296 1822 1824 1822
q8 3460 3461 3532 3461
q9 8949 8970 9020 8970
q10 3599 3554 3572 3554
q11 587 490 499 490
q12 841 617 606 606
q13 9008 3194 3236 3194
q14 310 276 279 276
q15 587 527 526 526
q16 732 661 645 645
q17 1891 1659 1630 1630
q18 8285 7859 7681 7681
q19 1733 1630 1548 1548
q20 2131 1869 1863 1863
q21 5727 5538 5491 5491
q22 1161 1072 1041 1041
Total cold run time: 70618 ms
Total hot run time: 61547 ms