
[feat](nereids)Compress materialize for group by

Open · englefly opened this issue 1 year ago · 5 comments

Proposed changes

Aggregation on an int column is more efficient than aggregation on a string column. This optimization therefore rewrites 'select A from T group by A' as 'select any_value(A) from T group by encode_as_int(A)', where A is a string column.
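The rewrite keeps the original groups only if encode_as_int maps distinct strings to distinct integers. If two values collided, their groups would merge and any_value(A) would return just one of them. The Python sketch below illustrates this on a toy table. The packing function encode_as_int_sketch is hypothetical and is not the byte layout Doris uses for encode_as_int. It stores up to three ASCII bytes in the high 24 bits and the length in the low 8 bits, so strings of at most 3 bytes never collide.

```python
def encode_as_int_sketch(s: str) -> int:
    """Hypothetical injective packing of a short ASCII string into 32 bits.

    Not Doris's actual encode_as_int layout: the (zero-padded) first three
    bytes fill the high 24 bits and the length fills the low 8 bits, so two
    different strings of at most 3 bytes never map to the same integer.
    """
    data = s.encode("ascii")
    if len(data) > 3:
        raise ValueError("string too long for this sketch's 32-bit key")
    value = 0
    for i in range(3):
        byte = data[i] if i < len(data) else 0
        value = (value << 8) | byte
    return (value << 8) | len(data)


def group_by_string(rows):
    """select A from T group by A"""
    return sorted(set(rows))


def group_by_encoded(rows):
    """select any_value(A) from T group by encode_as_int(A)"""
    groups = {}
    for a in rows:
        key = encode_as_int_sketch(a)  # integer group-by key
        groups.setdefault(key, a)      # any_value: keep the first value seen per group
    return sorted(groups.values())


rows = ["us", "cn", "us", "de", "cn", "", "us"]
print(group_by_string(rows))
print(group_by_encoded(rows))
```

Both functions return the same groups. In a real engine, keying the hash table on a fixed-width integer instead of a variable-length string is what makes the integer version cheaper to hash and compare.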

There are some limitations:

  1. Only group by is supported. Join will be supported later.
  2. Because it is hard to replace expressions in the parent node, some patterns are not supported yet. For example:
    select * from (select substring(A, 1, 3) as X from T group by substring(A, 1, 3)) T1 join T on T1.X = T.A
    When we rewrite the aggregate node, it is hard to update the expression "T1.X=T.A" in the join node.
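To make the second limitation concrete: a rule that rewrites only the aggregate node leaves the join above it untouched, so a parent expression that refers to the aggregate's output (here T1.X in T1.X=T.A) would need to be updated separately. The toy Python sketch below shows that substitution step on a hand-built expression tree. Slot, EqualTo, replace_slots and the new slot name "T1.X#new" are hypothetical and are not Nereids classes. The sketch only illustrates the general technique of replacing slot references inside a parent's expressions.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Slot:
    """A column reference, e.g. T1.X (hypothetical, not a Nereids class)."""
    name: str


@dataclass(frozen=True)
class EqualTo:
    """A binary equality predicate such as a join condition."""
    left: "Expr"
    right: "Expr"


Expr = Union[Slot, EqualTo]


def replace_slots(expr: Expr, mapping: Dict[Slot, Expr]) -> Expr:
    """Return a copy of expr with every slot found in mapping substituted."""
    if isinstance(expr, Slot):
        return mapping.get(expr, expr)
    if isinstance(expr, EqualTo):
        return EqualTo(replace_slots(expr.left, mapping),
                       replace_slots(expr.right, mapping))
    raise TypeError(f"unsupported expression: {expr!r}")


# Join condition of the parent node: T1.X = T.A
join_condition = EqualTo(Slot("T1.X"), Slot("T.A"))
# Hypothetical: the rewritten aggregate exposes its output under a new slot.
mapping = {Slot("T1.X"): Slot("T1.X#new")}
print(replace_slots(join_condition, mapping))
```

In a real plan the same mapping would have to be applied to every ancestor that references the slot, not only the join.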

Issue Number: close #xxx

englefly · Oct 16 '24 12:10

Thank you for your contribution to Apache Doris. Don't know what should be done next? See How to process your PR

Since 2024-03-18, the documentation has moved to doris-website. See Doris Document.

doris-robot · Oct 16 '24 12:10

run buildall

englefly · Oct 17 '24 10:10

TPC-H: Total hot run time: 41596 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 5415f0aa5bd6727293bb3a450aa3314314076850, data reload: false

------ Round 1 ----------------------------------
Query	Cold run	Hot run 1	Hot run 2	Best hot run (all times in ms)
q1	17625	8647	7810	7810
q2	2017	273	309	273
q3	10312	1145	1146	1145
q4	10232	784	854	784
q5	7770	3064	3030	3030
q6	240	157	153	153
q7	1024	615	600	600
q8	9356	1895	1983	1895
q9	6560	6547	6466	6466
q10	7073	2440	2471	2440
q11	451	247	243	243
q12	419	223	227	223
q13	17806	2999	2991	2991
q14	245	207	223	207
q15	581	542	510	510
q16	687	597	599	597
q17	985	539	515	515
q18	7188	6722	6677	6677
q19	1356	1020	1006	1006
q20	462	180	185	180
q21	3955	2968	2876	2876
q22	1124	1028	975	975
Total cold run time: 107468 ms
Total hot run time: 41596 ms

----- Round 2, with runtime_filter_mode=off -----
Query	Cold run	Hot run 1	Hot run 2	Best hot run (all times in ms)
q1	7821	7824	7851	7824
q2	321	239	251	239
q3	3060	2949	3060	2949
q4	2032	1805	1777	1777
q5	5702	5801	5781	5781
q6	239	146	142	142
q7	2294	1797	1780	1780
q8	3415	3446	3503	3446
q9	8890	8895	8894	8894
q10	3591	3580	3582	3580
q11	593	491	510	491
q12	886	629	637	629
q13	9224	3227	3182	3182
q14	311	273	293	273
q15	584	525	526	525
q16	720	669	657	657
q17	1832	1630	1622	1622
q18	8364	7762	7653	7653
q19	1732	1511	1467	1467
q20	2125	1888	1836	1836
q21	5749	5494	5515	5494
q22	1172	1073	1022	1022
Total cold run time: 70657 ms
Total hot run time: 61263 ms

doris-robot · Oct 17 '24 11:10

run buildall

englefly · Oct 19 '24 02:10

TPC-H: Total hot run time: 42218 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 1b0c0cb4e3f0b7c862d9bac62eea10bc0ead94a7, data reload: false

------ Round 1 ----------------------------------
Query	Cold run	Hot run 1	Hot run 2	Best hot run (all times in ms)
q1	17934	8414	7850	7850
q2	2057	285	173	173
q3	10590	1164	1173	1164
q4	10440	819	821	819
q5	7731	3230	3082	3082
q6	239	149	153	149
q7	1024	624	609	609
q8	9374	1995	1993	1993
q9	6722	6479	6466	6466
q10	7077	2443	2463	2443
q11	454	248	256	248
q12	424	232	233	232
q13	17790	3021	3016	3016
q14	254	217	211	211
q15	575	523	523	523
q16	684	602	606	602
q17	995	604	499	499
q18	7375	6853	6842	6842
q19	1358	1036	923	923
q20	509	185	187	185
q21	4072	3176	3218	3176
q22	1122	1022	1013	1013
Total cold run time: 108800 ms
Total hot run time: 42218 ms

----- Round 2, with runtime_filter_mode=off -----
Query	Cold run	Hot run 1	Hot run 2	Best hot run (all times in ms)
q1	7799	7769	7845	7769
q2	351	248	240	240
q3	3135	3021	3058	3021
q4	2044	1852	1807	1807
q5	5755	5764	5814	5764
q6	237	148	152	148
q7	2296	1822	1824	1822
q8	3460	3461	3532	3461
q9	8949	8970	9020	8970
q10	3599	3554	3572	3554
q11	587	490	499	490
q12	841	617	606	606
q13	9008	3194	3236	3194
q14	310	276	279	276
q15	587	527	526	526
q16	732	661	645	645
q17	1891	1659	1630	1630
q18	8285	7859	7681	7681
q19	1733	1630	1548	1548
q20	2131	1869	1863	1863
q21	5727	5538	5491	5491
q22	1161	1072	1041	1041
Total cold run time: 70618 ms
Total hot run time: 61547 ms

doris-robot · Oct 19 '24 03:10