
[Opt](compaction) Prune rows with delete sign=1 in full compaction

Open bobhan1 opened this issue 5 months ago • 18 comments

What problem does this PR solve?

  1. Prune rows with delete sign=1 in full compaction.
  2. Also check for duplicate keys in base compaction when enable_prune_delete_sign_when_base_compaction=false.
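
To illustrate the first item, here is a minimal sketch of what pruning delete-sign rows during a compaction could look like. It is not the Doris implementation: the `Row` type, the `compact` function, and the `prune_delete_sign` parameter are hypothetical names used only for illustration, and the sketch assumes a unique-key model in which the newest version of a key wins. In general, a delete marker can only be dropped safely when no older version of the key can survive outside the compaction's input, which is why a compaction over all rowsets is a natural place to do it.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Row:
    key: str
    version: int
    value: str
    delete_sign: int = 0  # 1 marks the key as deleted (a tombstone)


def compact(rows: Iterable[Row], prune_delete_sign: bool) -> List[Row]:
    """Merge rows by key, keeping only the newest version of each key.

    Hypothetical sketch: when prune_delete_sign is True, keys whose newest
    row has delete_sign == 1 are dropped from the output entirely.
    """
    latest: Dict[str, Row] = {}
    for row in rows:
        current = latest.get(row.key)
        if current is None or row.version > current.version:
            latest[row.key] = row
    merged = sorted(latest.values(), key=lambda r: r.key)
    if prune_delete_sign:
        merged = [r for r in merged if r.delete_sign != 1]
    return merged


rows = [
    Row("a", 1, "x"),
    Row("a", 2, "", delete_sign=1),  # key "a" is deleted at version 2
    Row("b", 1, "y"),
    Row("b", 3, "z"),
]
full = compact(rows, prune_delete_sign=True)   # tombstone for "a" is pruned
kept = compact(rows, prune_delete_sign=False)  # tombstone for "a" is retained
```

With pruning enabled, only key "b" remains in the output; with pruning disabled, the tombstone row for "a" is carried forward so it can still mask older data.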

Release note

None

Check List (For Author)

  • Test

    • [ ] Regression test
    • [ ] Unit Test
    • [ ] Manual test (add detailed scripts or steps below)
    • [ ] No need to test or manual test. Explain why:
      • [ ] This is a refactor/code format and no logic has been changed.
      • [ ] Previous test can cover this change.
      • [ ] No code files have been changed.
      • [ ] Other reason
  • Behavior changed:

    • [ ] No.
    • [ ] Yes.
  • Does this need documentation?

    • [ ] No.
    • [ ] Yes.

Check List (For Reviewer who merge this PR)

  • [ ] Confirm the release note
  • [ ] Confirm test cases
  • [ ] Confirm document
  • [ ] Add branch pick label

bobhan1 avatar Jun 18 '25 08:06 bobhan1

Thank you for your contribution to Apache Doris. Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

hello-stephen avatar Jun 18 '25 08:06 hello-stephen

run buildall

bobhan1 avatar Jun 18 '25 08:06 bobhan1

TPC-H: Total hot run time: 33752 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit a1df4ba0b231e2262780dda3c33a8d134b03ad4d, data reload: false

------ Round 1 ----------------------------------
q1	17578	5165	4985	4985
q2	1936	290	189	189
q3	10314	1362	742	742
q4	10318	1099	558	558
q5	8623	2397	2361	2361
q6	220	161	129	129
q7	899	807	628	628
q8	9328	1310	1034	1034
q9	6999	5052	5145	5052
q10	6924	2397	1950	1950
q11	499	281	260	260
q12	352	345	217	217
q13	17812	3665	3087	3087
q14	233	235	205	205
q15	566	490	477	477
q16	435	432	374	374
q17	584	866	359	359
q18	7617	7257	7045	7045
q19	1280	945	554	554
q20	346	349	228	228
q21	3836	3186	2353	2353
q22	1047	1010	965	965
Total cold run time: 107746 ms
Total hot run time: 33752 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5001	5052	5022	5022
q2	240	320	230	230
q3	2162	2661	2292	2292
q4	1342	1836	1378	1378
q5	4243	4127	4443	4127
q6	219	173	131	131
q7	2075	1954	1762	1762
q8	2637	2651	2519	2519
q9	7073	7128	7114	7114
q10	3131	3277	2839	2839
q11	565	501	487	487
q12	695	785	605	605
q13	3482	3873	3294	3294
q14	273	311	266	266
q15	522	485	472	472
q16	439	496	454	454
q17	1154	1511	1407	1407
q18	7712	7512	7354	7354
q19	794	785	888	785
q20	1971	2044	1941	1941
q21	5001	4553	4350	4350
q22	1037	1005	999	999
Total cold run time: 51768 ms
Total hot run time: 49828 ms
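
The report does not label its columns. The sketch below checks one reading of them against the Round 1 figures above, and that reading is an inference from the numbers, not something the report states: the first column is a cold run, the next two are hot runs, the last column is the faster hot run, and each total is the sum of its column. The `summarize` function is a hypothetical helper written for this check.

```python
ROUND_1 = """\
q1 17578 5165 4985 4985
q2 1936 290 189 189
q3 10314 1362 742 742
q4 10318 1099 558 558
q5 8623 2397 2361 2361
q6 220 161 129 129
q7 899 807 628 628
q8 9328 1310 1034 1034
q9 6999 5052 5145 5052
q10 6924 2397 1950 1950
q11 499 281 260 260
q12 352 345 217 217
q13 17812 3665 3087 3087
q14 233 235 205 205
q15 566 490 477 477
q16 435 432 374 374
q17 584 866 359 359
q18 7617 7257 7045 7045
q19 1280 945 554 554
q20 346 349 228 228
q21 3836 3186 2353 2353
q22 1047 1010 965 965
"""


def summarize(report: str) -> tuple:
    """Return (cold_total_ms, hot_total_ms) under the assumed column layout."""
    cold_total = 0
    hot_total = 0
    for line in report.splitlines():
        _name, cold, hot1, hot2, best = line.split()
        # Assumed layout: the last column is the faster of the two hot runs.
        if int(best) != min(int(hot1), int(hot2)):
            raise ValueError(f"unexpected best time in row: {line}")
        cold_total += int(cold)
        hot_total += int(best)
    return cold_total, hot_total


cold_ms, hot_ms = summarize(ROUND_1)
print(cold_ms, hot_ms)  # 107746 33752, matching the Round 1 totals
```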

doris-robot avatar Jun 18 '25 09:06 doris-robot

TPC-DS: Total hot run time: 186119 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit a1df4ba0b231e2262780dda3c33a8d134b03ad4d, data reload: false

query1	986	390	391	390
query2	6546	1855	1812	1812
query3	6759	231	215	215
query4	26346	24027	23049	23049
query5	4396	638	478	478
query6	308	214	198	198
query7	4623	529	292	292
query8	269	231	232	231
query9	8598	2694	2703	2694
query10	464	343	271	271
query11	15941	15021	14780	14780
query12	161	117	112	112
query13	1665	535	421	421
query14	9523	6244	6187	6187
query15	207	213	177	177
query16	7291	621	492	492
query17	1202	734	581	581
query18	1994	426	309	309
query19	200	187	164	164
query20	127	120	121	120
query21	216	130	114	114
query22	4118	4293	4163	4163
query23	34022	32982	33064	32982
query24	8398	2381	2426	2381
query25	524	454	381	381
query26	1243	268	159	159
query27	2750	506	349	349
query28	4301	2172	2135	2135
query29	739	581	426	426
query30	284	232	193	193
query31	917	863	751	751
query32	72	68	63	63
query33	547	382	317	317
query34	812	871	527	527
query35	786	793	715	715
query36	947	970	856	856
query37	115	104	81	81
query38	4065	4163	4148	4148
query39	1491	1418	1385	1385
query40	217	120	108	108
query41	62	60	60	60
query42	124	107	107	107
query43	504	520	466	466
query44	1361	825	825	825
query45	180	176	163	163
query46	851	1016	631	631
query47	1751	1792	1757	1757
query48	395	427	309	309
query49	766	477	432	432
query50	639	681	412	412
query51	4086	4172	4186	4172
query52	110	106	108	106
query53	225	251	184	184
query54	576	578	501	501
query55	88	84	84	84
query56	332	295	292	292
query57	1170	1206	1132	1132
query58	262	265	259	259
query59	2664	2799	2580	2580
query60	339	318	302	302
query61	130	126	123	123
query62	801	715	640	640
query63	223	194	187	187
query64	4301	999	664	664
query65	4314	4161	4193	4161
query66	1143	401	307	307
query67	15750	15830	15450	15450
query68	8484	904	568	568
query69	458	310	273	273
query70	1234	1122	1046	1046
query71	482	338	305	305
query72	5345	4665	4613	4613
query73	694	568	356	356
query74	9195	9176	9102	9102
query75	3870	3191	2677	2677
query76	3614	1206	751	751
query77	767	386	306	306
query78	10111	10243	9376	9376
query79	1943	849	594	594
query80	588	514	469	469
query81	484	259	232	232
query82	449	125	97	97
query83	270	259	234	234
query84	247	109	92	92
query85	807	356	387	356
query86	337	313	313	313
query87	4419	4465	4290	4290
query88	3508	2273	2302	2273
query89	391	304	277	277
query90	1915	220	275	220
query91	141	138	110	110
query92	72	63	59	59
query93	1326	950	592	592
query94	688	410	316	316
query95	385	294	286	286
query96	501	572	278	278
query97	2680	2733	2653	2653
query98	237	211	212	211
query99	1656	1396	1281	1281
Total cold run time: 274615 ms
Total hot run time: 186119 ms

doris-robot avatar Jun 18 '25 09:06 doris-robot

ClickBench: Total hot run time: 29.9 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit a1df4ba0b231e2262780dda3c33a8d134b03ad4d, data reload: false

query1	0.03	0.03	0.03
query2	0.07	0.03	0.03
query3	0.24	0.06	0.07
query4	1.60	0.11	0.10
query5	0.42	0.42	0.41
query6	1.19	0.67	0.68
query7	0.02	0.01	0.01
query8	0.04	0.04	0.03
query9	0.58	0.50	0.51
query10	0.58	0.57	0.56
query11	0.15	0.10	0.11
query12	0.15	0.12	0.12
query13	0.63	0.61	0.61
query14	0.80	0.82	0.82
query15	0.87	0.86	0.88
query16	0.39	0.39	0.41
query17	1.10	1.04	1.04
query18	0.23	0.21	0.21
query19	2.02	1.91	1.86
query20	0.02	0.01	0.01
query21	15.40	0.88	0.57
query22	0.75	1.27	0.93
query23	14.69	1.37	0.67
query24	7.13	0.81	1.17
query25	0.46	0.22	0.13
query26	0.52	0.16	0.13
query27	0.07	0.06	0.06
query28	9.77	0.92	0.43
query29	12.56	3.98	3.27
query30	0.26	0.08	0.06
query31	2.82	0.60	0.39
query32	3.23	0.54	0.47
query33	3.09	3.10	3.09
query34	16.13	5.46	4.82
query35	4.83	4.85	4.83
query36	0.68	0.52	0.50
query37	0.08	0.06	0.06
query38	0.04	0.04	0.03
query39	0.03	0.03	0.03
query40	0.17	0.14	0.13
query41	0.07	0.03	0.02
query42	0.03	0.02	0.02
query43	0.04	0.03	0.03
Total cold run time: 103.98 s
Total hot run time: 29.9 s

doris-robot avatar Jun 18 '25 09:06 doris-robot

BE UT Coverage Report

Increment line coverage 33.33% (2/6) :tada:

Increment coverage report Complete coverage report

Category Coverage
Function Coverage 56.34% (15050/26713)
Line Coverage 45.11% (134599/298393)
Region Coverage 44.24% (67693/153013)
Branch Coverage 38.82% (34730/89464)

hello-stephen avatar Jun 18 '25 10:06 hello-stephen

BE Regression && UT Coverage Report

Increment line coverage 33.33% (2/6) :tada:

Increment coverage report Complete coverage report

Category Coverage
Function Coverage 60.97% (16029/26290)
Line Coverage 50.46% (150495/298234)
Region Coverage 47.77% (85975/179982)
Branch Coverage 41.33% (42268/102268)

hello-stephen avatar Jun 18 '25 11:06 hello-stephen

run buildall

bobhan1 avatar Jun 18 '25 11:06 bobhan1

TPC-H: Total hot run time: 33995 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 050dd909d610ccd396c82791cb10001d170f62a9, data reload: false

------ Round 1 ----------------------------------
q1	17663	5139	4988	4988
q2	1943	286	195	195
q3	10279	1220	755	755
q4	10207	1007	524	524
q5	7537	2277	2347	2277
q6	176	159	130	130
q7	886	737	641	641
q8	9312	1243	1090	1090
q9	6768	5051	5068	5051
q10	6912	2367	1950	1950
q11	486	291	275	275
q12	341	341	217	217
q13	17765	3693	3149	3149
q14	224	229	209	209
q15	554	468	479	468
q16	430	428	386	386
q17	606	858	378	378
q18	7406	7151	7235	7151
q19	1379	963	561	561
q20	335	328	226	226
q21	3784	3213	2403	2403
q22	1037	1032	971	971
Total cold run time: 106030 ms
Total hot run time: 33995 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5097	4998	5011	4998
q2	247	321	226	226
q3	2124	2617	2282	2282
q4	1325	1771	1366	1366
q5	4193	4123	4290	4123
q6	207	169	137	137
q7	1950	1907	1729	1729
q8	2727	2611	2553	2553
q9	7134	7134	7129	7129
q10	3066	3278	2793	2793
q11	559	508	506	506
q12	644	804	597	597
q13	3655	3926	3329	3329
q14	274	296	276	276
q15	557	498	515	498
q16	436	486	448	448
q17	1176	1601	1373	1373
q18	7829	7509	7294	7294
q19	841	876	1069	876
q20	2043	2078	1907	1907
q21	4957	4515	4283	4283
q22	1098	1035	1016	1016
Total cold run time: 52139 ms
Total hot run time: 49739 ms

doris-robot avatar Jun 18 '25 12:06 doris-robot

TPC-DS: Total hot run time: 192734 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 050dd909d610ccd396c82791cb10001d170f62a9, data reload: false

query1	1426	1006	1007	1006
query2	6287	1802	1788	1788
query3	10995	4613	4418	4418
query4	55155	24658	23340	23340
query5	5102	625	460	460
query6	360	211	217	211
query7	4872	513	297	297
query8	288	242	218	218
query9	5547	2650	2638	2638
query10	450	332	273	273
query11	15003	14980	14799	14799
query12	167	115	110	110
query13	1037	541	441	441
query14	10160	6424	6422	6422
query15	195	205	181	181
query16	6769	635	512	512
query17	1062	718	567	567
query18	1454	400	310	310
query19	198	198	167	167
query20	123	113	120	113
query21	205	130	106	106
query22	4315	4343	4241	4241
query23	34039	33750	33543	33543
query24	6696	2444	2450	2444
query25	462	463	395	395
query26	700	277	152	152
query27	2298	523	351	351
query28	3000	2166	2174	2166
query29	569	578	442	442
query30	288	226	193	193
query31	912	865	786	786
query32	74	63	62	62
query33	444	369	317	317
query34	777	876	539	539
query35	797	845	760	760
query36	946	990	910	910
query37	112	100	76	76
query38	4234	4398	4280	4280
query39	1747	1459	1447	1447
query40	210	134	117	117
query41	69	72	70	70
query42	130	121	115	115
query43	511	515	515	515
query44	1404	863	864	863
query45	193	180	205	180
query46	911	1034	664	664
query47	1858	1880	1789	1789
query48	392	425	340	340
query49	650	501	396	396
query50	673	701	413	413
query51	4224	4227	4162	4162
query52	122	115	103	103
query53	234	266	191	191
query54	587	580	522	522
query55	81	82	85	82
query56	314	307	309	307
query57	1181	1260	1159	1159
query58	259	258	256	256
query59	2712	2804	2723	2723
query60	339	321	312	312
query61	123	136	140	136
query62	732	745	666	666
query63	238	191	201	191
query64	1460	1024	666	666
query65	4262	4211	4152	4152
query66	733	399	307	307
query67	15662	15597	15526	15526
query68	7042	884	523	523
query69	544	313	276	276
query70	1134	1122	1045	1045
query71	482	314	309	309
query72	5745	4780	4880	4780
query73	1305	660	356	356
query74	9402	9172	8867	8867
query75	3831	3202	2685	2685
query76	4147	1187	781	781
query77	716	391	302	302
query78	9924	10019	9318	9318
query79	2322	816	616	616
query80	614	517	441	441
query81	489	255	223	223
query82	440	129	97	97
query83	314	253	233	233
query84	295	104	84	84
query85	780	358	320	320
query86	369	299	302	299
query87	4404	4430	4326	4326
query88	3557	2371	2281	2281
query89	399	313	279	279
query90	1814	211	206	206
query91	142	206	110	110
query92	78	71	53	53
query93	1934	940	578	578
query94	684	411	302	302
query95	365	291	292	291
query96	499	584	278	278
query97	2749	2793	2617	2617
query98	227	206	202	202
query99	1431	1375	1252	1252
Total cold run time: 297707 ms
Total hot run time: 192734 ms

doris-robot avatar Jun 18 '25 12:06 doris-robot

BE UT Coverage Report

Increment line coverage 33.33% (2/6) :tada:

Increment coverage report Complete coverage report

Category Coverage
Function Coverage 56.35% (15051/26709)
Line Coverage 45.12% (134629/298406)
Region Coverage 44.25% (67716/153022)
Branch Coverage 38.83% (34741/89470)

doris-robot avatar Jun 18 '25 13:06 doris-robot

BE Regression && UT Coverage Report

Increment line coverage 33.33% (2/6) :tada:

Increment coverage report Complete coverage report

Category Coverage
Function Coverage 61.13% (16068/26284)
Line Coverage 50.67% (151108/298245)
Region Coverage 47.99% (86379/180004)
Branch Coverage 41.51% (42457/102280)

hello-stephen avatar Jun 18 '25 14:06 hello-stephen

run performance

bobhan1 avatar Jun 19 '25 02:06 bobhan1

run performance

bobhan1 avatar Jun 19 '25 02:06 bobhan1

run vault_p0

bobhan1 avatar Jun 19 '25 02:06 bobhan1

TPC-H: Total hot run time: 34026 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 050dd909d610ccd396c82791cb10001d170f62a9, data reload: false

------ Round 1 ----------------------------------
q1	17595	5114	4943	4943
q2	1941	296	189	189
q3	10326	1335	703	703
q4	10271	1040	531	531
q5	8534	2395	2401	2395
q6	227	162	131	131
q7	913	728	615	615
q8	9334	1302	1047	1047
q9	6900	5118	5221	5118
q10	6950	2389	1954	1954
q11	480	306	279	279
q12	351	350	215	215
q13	17784	3644	3133	3133
q14	231	233	216	216
q15	558	474	479	474
q16	433	430	371	371
q17	629	867	376	376
q18	7754	7227	7147	7147
q19	1344	952	568	568
q20	347	351	232	232
q21	3897	3265	2382	2382
q22	1085	1020	1007	1007
Total cold run time: 107884 ms
Total hot run time: 34026 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5388	5071	5081	5071
q2	240	328	223	223
q3	2141	2638	2280	2280
q4	1353	1822	1386	1386
q5	4246	4208	4459	4208
q6	215	166	127	127
q7	2020	1946	1780	1780
q8	2662	2609	2512	2512
q9	7196	7192	7212	7192
q10	3068	3320	2854	2854
q11	585	507	494	494
q12	673	753	645	645
q13	3519	3867	3311	3311
q14	276	293	263	263
q15	521	481	476	476
q16	461	482	455	455
q17	1141	1577	1373	1373
q18	7750	7481	7369	7369
q19	823	844	966	844
q20	2031	2044	1861	1861
q21	4961	4435	4266	4266
q22	1096	1047	999	999
Total cold run time: 52366 ms
Total hot run time: 49989 ms

doris-robot avatar Jun 19 '25 02:06 doris-robot

TPC-DS: Total hot run time: 186443 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 050dd909d610ccd396c82791cb10001d170f62a9, data reload: false

query1	1005	403	380	380
query2	6539	1822	1817	1817
query3	6743	243	221	221
query4	27230	23480	23372	23372
query5	4351	633	464	464
query6	334	212	196	196
query7	4621	495	302	302
query8	276	227	212	212
query9	8640	2618	2618	2618
query10	490	339	289	289
query11	15335	15109	14901	14901
query12	156	107	105	105
query13	1670	540	434	434
query14	8691	6198	6115	6115
query15	204	191	174	174
query16	7326	637	505	505
query17	1213	709	586	586
query18	1981	414	317	317
query19	200	191	165	165
query20	128	127	113	113
query21	224	129	111	111
query22	4115	4173	4018	4018
query23	34240	33049	33185	33049
query24	8461	2375	2422	2375
query25	536	464	400	400
query26	1226	275	154	154
query27	2762	504	358	358
query28	4341	2118	2100	2100
query29	775	563	431	431
query30	282	225	188	188
query31	936	882	750	750
query32	76	63	62	62
query33	559	362	315	315
query34	807	850	531	531
query35	773	799	724	724
query36	984	990	885	885
query37	111	99	78	78
query38	4081	4191	4085	4085
query39	1492	1404	1427	1404
query40	209	115	107	107
query41	66	60	56	56
query42	127	112	108	108
query43	507	502	479	479
query44	1332	833	844	833
query45	180	171	170	170
query46	837	1020	630	630
query47	1798	1803	1750	1750
query48	383	439	322	322
query49	748	495	431	431
query50	640	688	401	401
query51	4120	4246	4155	4155
query52	105	107	98	98
query53	221	255	179	179
query54	575	568	496	496
query55	88	81	92	81
query56	318	301	294	294
query57	1229	1231	1135	1135
query58	270	253	265	253
query59	2622	2663	2646	2646
query60	332	347	312	312
query61	133	126	129	126
query62	810	737	657	657
query63	226	189	181	181
query64	4412	1010	698	698
query65	4268	4221	4235	4221
query66	1154	423	326	326
query67	16102	15572	15558	15558
query68	8150	903	541	541
query69	499	312	283	283
query70	1215	1134	1121	1121
query71	503	354	312	312
query72	5792	4708	4772	4708
query73	692	620	361	361
query74	8866	8805	8727	8727
query75	3931	3201	2717	2717
query76	3743	1201	747	747
query77	790	374	299	299
query78	10260	10049	9450	9450
query79	2924	825	586	586
query80	674	498	459	459
query81	500	265	237	237
query82	460	132	96	96
query83	290	246	232	232
query84	299	110	98	98
query85	792	355	319	319
query86	389	307	283	283
query87	4435	4475	4324	4324
query88	3804	2298	2274	2274
query89	403	312	282	282
query90	1838	213	207	207
query91	141	144	111	111
query92	81	62	60	60
query93	2357	946	597	597
query94	673	415	322	322
query95	391	294	284	284
query96	506	569	283	283
query97	2796	2746	2652	2652
query98	242	212	202	202
query99	1443	1398	1262	1262
Total cold run time: 277605 ms
Total hot run time: 186443 ms

doris-robot avatar Jun 19 '25 03:06 doris-robot

ClickBench: Total hot run time: 29.88 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 050dd909d610ccd396c82791cb10001d170f62a9, data reload: false

query1	0.04	0.04	0.03
query2	0.07	0.04	0.04
query3	0.24	0.07	0.06
query4	1.64	0.10	0.11
query5	0.43	0.43	0.44
query6	1.18	0.66	0.67
query7	0.03	0.01	0.02
query8	0.04	0.03	0.04
query9	0.59	0.51	0.51
query10	0.59	0.58	0.56
query11	0.15	0.11	0.11
query12	0.14	0.12	0.12
query13	0.63	0.61	0.61
query14	0.82	0.81	0.82
query15	0.91	0.86	0.88
query16	0.38	0.38	0.38
query17	1.11	1.07	1.11
query18	0.24	0.22	0.21
query19	1.98	1.82	1.85
query20	0.02	0.02	0.01
query21	15.40	0.86	0.56
query22	0.76	1.10	0.86
query23	14.81	1.40	0.60
query24	7.99	0.69	0.96
query25	0.51	0.19	0.19
query26	0.62	0.16	0.14
query27	0.06	0.06	0.05
query28	9.06	0.92	0.46
query29	12.58	4.03	3.38
query30	0.25	0.09	0.06
query31	2.83	0.63	0.41
query32	3.24	0.55	0.47
query33	3.12	3.13	3.12
query34	16.15	5.42	4.77
query35	4.86	4.88	4.87
query36	0.69	0.54	0.49
query37	0.09	0.07	0.06
query38	0.05	0.04	0.04
query39	0.03	0.02	0.03
query40	0.17	0.14	0.14
query41	0.08	0.02	0.02
query42	0.04	0.03	0.03
query43	0.04	0.02	0.02
Total cold run time: 104.66 s
Total hot run time: 29.88 s

doris-robot avatar Jun 19 '25 03:06 doris-robot

PR approved by at least one committer and no changes requested.

github-actions[bot] avatar Jun 19 '25 09:06 github-actions[bot]

PR approved by anyone and no changes requested.

github-actions[bot] avatar Jun 19 '25 09:06 github-actions[bot]