[fix](filecache) fix warm up cancel failure when BE is down

Open freemandealer opened this issue 1 month ago • 14 comments

Fixes an issue where the warm-up cancel flow exited as soon as it hit an offline BE, which prevented the remaining BEs from receiving the clear_job RPC. The cancel flow now skips BEs whose RPC fails and continues sending clear_job to the others.
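The behavior described above can be illustrated with a minimal sketch. It is not the code from this PR: the real change lives in the Doris code base, and every name here (`FakeBackend`, `RpcError`, `cancel_warm_up`) is hypothetical and chosen only for illustration. The sketch shows the pattern the description names: catch the failure per BE, record it, and keep iterating instead of leaving the loop.

```python
from dataclasses import dataclass, field


class RpcError(Exception):
    """Raised by the hypothetical client when a backend cannot be reached."""


@dataclass
class FakeBackend:
    """Stand-in for a BE; not a real Doris class."""
    host: str
    alive: bool = True
    cleared_jobs: list = field(default_factory=list)

    def clear_job(self, job_id):
        # Simulate the clear_job RPC: an offline backend fails the call.
        if not self.alive:
            raise RpcError(f"{self.host} is unreachable")
        self.cleared_jobs.append(job_id)


def cancel_warm_up(backends, job_id):
    """Send clear_job to every backend and return the hosts that failed."""
    failed = []
    for be in backends:
        try:
            be.clear_job(job_id)
        except RpcError:
            # Per the PR description, the old flow exited at this point.
            # Here the failure is recorded and the loop moves on.
            failed.append(be.host)
    return failed


if __name__ == "__main__":
    bes = [FakeBackend("be1"), FakeBackend("be2", alive=False), FakeBackend("be3")]
    print(cancel_warm_up(bes, job_id=42))
```

Returning the list of failed hosts is a design choice of this sketch, so a caller could log or retry them; the PR description does not say what the actual fix does with failed BEs beyond skipping them.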

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • [ ] Regression test
    • [ ] Unit Test
    • [ ] Manual test (add detailed scripts or steps below)
    • [ ] No need to test or manual test. Explain why:
      • [ ] This is a refactor/code format and no logic has been changed.
      • [ ] Previous test can cover this change.
      • [ ] No code files have been changed.
      • [ ] Other reason
  • Behavior changed:

    • [ ] No.
    • [ ] Yes.
  • Does this need documentation?

    • [ ] No.
    • [ ] Yes.

Check List (For Reviewer who merge this PR)

  • [ ] Confirm the release note
  • [ ] Confirm test cases
  • [ ] Confirm document
  • [ ] Add branch pick label

freemandealer avatar Nov 14 '25 09:11 freemandealer

Thank you for your contribution to Apache Doris. Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

Thearas avatar Nov 14 '25 09:11 Thearas

run buildall

freemandealer avatar Nov 14 '25 09:11 freemandealer

TPC-H: Total hot run time: 34254 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 7b08d13de2e571d4eb726a29d0d0910b99f5c855, data reload: false

------ Round 1 ----------------------------------
q1	17613	5211	5061	5061
q2	2036	346	193	193
q3	10210	1353	718	718
q4	10230	893	362	362
q5	7488	2354	2386	2354
q6	182	172	137	137
q7	922	776	613	613
q8	9344	1263	1142	1142
q9	6954	5136	5089	5089
q10	6825	2232	1806	1806
q11	494	314	302	302
q12	336	362	221	221
q13	17776	3631	3038	3038
q14	234	234	218	218
q15	573	527	495	495
q16	1000	1001	935	935
q17	577	867	360	360
q18	7607	7190	7150	7150
q19	1216	948	561	561
q20	344	353	227	227
q21	3807	3202	2289	2289
q22	1054	1037	983	983
Total cold run time: 106822 ms
Total hot run time: 34254 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5145	5319	5090	5090
q2	246	346	231	231
q3	2167	2672	2354	2354
q4	1375	1785	1319	1319
q5	4213	4423	4602	4423
q6	208	171	132	132
q7	2061	1981	1842	1842
q8	2599	2689	2625	2625
q9	7234	7384	7276	7276
q10	3030	3360	2888	2888
q11	607	542	507	507
q12	706	766	638	638
q13	3618	3968	3328	3328
q14	297	304	287	287
q15	567	537	521	521
q16	1096	1170	1095	1095
q17	1234	1573	1370	1370
q18	7935	7670	7528	7528
q19	790	803	856	803
q20	1997	2057	1923	1923
q21	5039	4336	4276	4276
q22	1105	1038	992	992
Total cold run time: 53269 ms
Total hot run time: 51448 ms

doris-robot avatar Nov 14 '25 10:11 doris-robot

TPC-DS: Total hot run time: 187417 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 7b08d13de2e571d4eb726a29d0d0910b99f5c855, data reload: false

query1	1038	409	390	390
query2	6584	1718	1666	1666
query3	6754	222	228	222
query4	26269	23697	22953	22953
query5	4344	622	477	477
query6	324	245	232	232
query7	4643	507	296	296
query8	290	246	237	237
query9	8692	2608	2596	2596
query10	485	335	303	303
query11	15755	15077	14901	14901
query12	173	119	112	112
query13	1665	564	428	428
query14	10640	9165	9075	9075
query15	199	192	173	173
query16	7296	661	531	531
query17	1227	756	625	625
query18	2015	420	324	324
query19	208	200	171	171
query20	130	127	122	122
query21	216	130	113	113
query22	4068	4111	4134	4111
query23	34276	33309	32988	32988
query24	8475	2405	2403	2403
query25	609	525	432	432
query26	1230	308	157	157
query27	2741	495	353	353
query28	4390	2221	2196	2196
query29	850	608	504	504
query30	299	227	196	196
query31	918	805	736	736
query32	82	76	69	69
query33	611	381	342	342
query34	793	857	520	520
query35	799	837	750	750
query36	958	986	886	886
query37	122	115	90	90
query38	3502	3520	3445	3445
query39	1487	1433	1400	1400
query40	225	133	117	117
query41	63	62	62	62
query42	129	114	116	114
query43	498	493	462	462
query44	1221	744	741	741
query45	181	180	172	172
query46	879	1015	640	640
query47	1794	1820	1753	1753
query48	385	429	324	324
query49	795	502	438	438
query50	659	689	415	415
query51	3930	3941	3950	3941
query52	109	116	103	103
query53	244	267	207	207
query54	324	310	295	295
query55	92	89	88	88
query56	345	358	330	330
query57	1190	1183	1144	1144
query58	302	279	287	279
query59	2543	2693	2552	2552
query60	361	363	328	328
query61	188	192	182	182
query62	796	729	695	695
query63	231	205	199	199
query64	4669	1290	957	957
query65	4039	3950	3948	3948
query66	1187	464	359	359
query67	15406	15250	14847	14847
query68	8279	927	596	596
query69	504	336	302	302
query70	1320	1309	1228	1228
query71	436	345	326	326
query72	6212	4846	4931	4846
query73	666	580	361	361
query74	9023	9062	8733	8733
query75	3454	3327	2822	2822
query76	3394	1165	720	720
query77	687	400	320	320
query78	9479	9729	8896	8896
query79	2011	814	587	587
query80	633	581	509	509
query81	494	268	232	232
query82	421	159	130	130
query83	264	264	252	252
query84	255	111	95	95
query85	908	491	439	439
query86	346	328	287	287
query87	3659	3740	3568	3568
query88	3572	2288	2264	2264
query89	394	335	295	295
query90	1957	226	221	221
query91	182	167	140	140
query92	88	67	67	67
query93	1706	991	636	636
query94	779	463	338	338
query95	411	326	307	307
query96	480	576	285	285
query97	2932	3005	2882	2882
query98	246	215	211	211
query99	1378	1410	1305	1305
Total cold run time: 274512 ms
Total hot run time: 187417 ms

doris-robot avatar Nov 14 '25 10:11 doris-robot

ClickBench: Total hot run time: 27.66 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 7b08d13de2e571d4eb726a29d0d0910b99f5c855, data reload: false

query1	0.06	0.05	0.05
query2	0.09	0.04	0.04
query3	0.25	0.08	0.08
query4	1.60	0.11	0.10
query5	0.27	0.25	0.24
query6	1.17	0.64	0.63
query7	0.04	0.02	0.03
query8	0.06	0.04	0.05
query9	0.57	0.52	0.52
query10	0.57	0.57	0.58
query11	0.15	0.12	0.11
query12	0.16	0.12	0.12
query13	0.63	0.60	0.61
query14	1.02	1.01	1.00
query15	0.85	0.82	0.83
query16	0.38	0.40	0.41
query17	1.04	1.10	1.02
query18	0.21	0.20	0.19
query19	1.92	1.82	1.83
query20	0.02	0.02	0.01
query21	15.45	0.19	0.13
query22	5.07	0.06	0.04
query23	15.67	0.25	0.11
query24	3.41	0.58	0.95
query25	0.07	0.06	0.05
query26	0.13	0.14	0.13
query27	0.08	0.05	0.05
query28	4.61	1.14	0.94
query29	12.58	3.93	3.19
query30	0.29	0.14	0.12
query31	2.84	0.59	0.39
query32	3.23	0.55	0.47
query33	3.01	3.11	3.09
query34	15.87	5.13	4.57
query35	4.58	4.59	4.58
query36	0.68	0.51	0.49
query37	0.10	0.06	0.06
query38	0.06	0.04	0.04
query39	0.04	0.03	0.03
query40	0.18	0.14	0.14
query41	0.08	0.03	0.03
query42	0.04	0.02	0.02
query43	0.04	0.04	0.03
Total cold run time: 99.17 s
Total hot run time: 27.66 s

doris-robot avatar Nov 14 '25 10:11 doris-robot

FE Regression Coverage Report

Increment line coverage: 0.00% (0/19)

hello-stephen avatar Nov 14 '25 12:11 hello-stephen

run buildall

freemandealer avatar Dec 01 '25 09:12 freemandealer

run buildall

freemandealer avatar Dec 02 '25 08:12 freemandealer

TPC-H: Total hot run time: 34679 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit cab851da3ce63a3b1ae5cfb9c8c965906d7c5ef8, data reload: false

------ Round 1 ----------------------------------
q1	17634	5125	4922	4922
q2	2060	321	219	219
q3	10214	1342	757	757
q4	10229	880	333	333
q5	7533	2521	2239	2239
q6	189	168	135	135
q7	994	796	657	657
q8	9390	1510	1126	1126
q9	7536	5311	5445	5311
q10	6874	2219	1785	1785
q11	534	316	282	282
q12	378	383	242	242
q13	17830	3682	3043	3043
q14	237	228	219	219
q15	589	524	503	503
q16	895	877	812	812
q17	685	785	594	594
q18	7897	7124	7165	7124
q19	1085	991	613	613
q20	378	356	238	238
q21	4141	3825	2581	2581
q22	1067	993	944	944
Total cold run time: 108369 ms
Total hot run time: 34679 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4983	4991	4930	4930
q2	332	394	339	339
q3	2185	2710	2291	2291
q4	1312	1755	1307	1307
q5	4277	4155	4159	4155
q6	214	169	126	126
q7	1886	1825	1689	1689
q8	2552	2485	2394	2394
q9	7012	7016	7006	7006
q10	2906	3108	2683	2683
q11	589	501	489	489
q12	666	708	553	553
q13	3330	3652	3044	3044
q14	288	299	297	297
q15	546	502	512	502
q16	877	894	858	858
q17	1136	1348	1335	1335
q18	7376	7035	6952	6952
q19	841	860	842	842
q20	1951	1957	1848	1848
q21	4712	4308	4126	4126
q22	1072	1021	951	951
Total cold run time: 51043 ms
Total hot run time: 48717 ms

doris-robot avatar Dec 02 '25 09:12 doris-robot

TPC-DS: Total hot run time: 181379 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit cab851da3ce63a3b1ae5cfb9c8c965906d7c5ef8, data reload: false

query1	1027	404	427	404
query2	5455	1192	1165	1165
query3	4402	224	225	224
query4	25129	23384	22761	22761
query5	5126	665	497	497
query6	430	244	231	231
query7	5432	533	301	301
query8	332	247	242	242
query9	7120	2626	2664	2626
query10	559	365	310	310
query11	15744	15143	14548	14548
query12	182	121	115	115
query13	1564	589	435	435
query14	8019	5958	5927	5927
query15	213	199	186	186
query16	5911	722	518	518
query17	995	785	637	637
query18	2051	441	345	345
query19	216	212	199	199
query20	129	134	125	125
query21	216	141	118	118
query22	3876	3966	3882	3882
query23	33111	32134	32149	32134
query24	7285	2425	2441	2425
query25	622	557	492	492
query26	744	285	173	173
query27	2634	505	353	353
query28	4190	2163	2175	2163
query29	799	653	527	527
query30	317	248	212	212
query31	854	720	628	628
query32	88	77	73	73
query33	608	403	354	354
query34	852	875	542	542
query35	802	870	751	751
query36	886	922	834	834
query37	125	115	89	89
query38	3970	3944	3768	3768
query39	1502	1395	1386	1386
query40	233	133	119	119
query41	65	62	60	60
query42	127	114	108	108
query43	425	437	415	415
query44	1317	753	747	747
query45	199	194	184	184
query46	903	1020	644	644
query47	1683	1746	1666	1666
query48	405	457	338	338
query49	766	502	414	414
query50	681	709	405	405
query51	3944	3893	3851	3851
query52	112	110	102	102
query53	237	257	188	188
query54	317	319	277	277
query55	99	99	96	96
query56	334	337	325	325
query57	1120	1146	1092	1092
query58	287	280	271	271
query59	2304	2412	2282	2282
query60	355	365	330	330
query61	160	160	157	157
query62	777	729	656	656
query63	230	191	201	191
query64	3494	1222	885	885
query65	4048	3987	3979	3979
query66	1135	445	331	331
query67	15221	14927	14677	14677
query68	9639	977	620	620
query69	609	353	313	313
query70	1110	1039	1023	1023
query71	470	350	316	316
query72	6233	4955	5043	4955
query73	746	633	347	347
query74	8841	8808	8675	8675
query75	3212	3036	2500	2500
query76	3770	1139	766	766
query77	536	408	311	311
query78	9560	9654	8759	8759
query79	2679	883	582	582
query80	919	604	517	517
query81	556	265	248	248
query82	410	142	113	113
query83	277	270	254	254
query84	312	118	126	118
query85	905	511	444	444
query86	367	304	289	289
query87	4113	4137	3913	3913
query88	3852	2328	2318	2318
query89	396	339	309	309
query90	1961	243	232	232
query91	177	170	141	141
query92	91	72	70	70
query93	2979	1048	656	656
query94	785	477	346	346
query95	509	420	414	414
query96	524	576	291	291
query97	2611	2652	2587	2587
query98	255	224	214	214
query99	1311	1365	1248	1248
Total cold run time: 265210 ms
Total hot run time: 181379 ms

doris-robot avatar Dec 02 '25 09:12 doris-robot

ClickBench: Total hot run time: 27.36 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit cab851da3ce63a3b1ae5cfb9c8c965906d7c5ef8, data reload: false

query1	0.05	0.05	0.05
query2	0.11	0.05	0.05
query3	0.26	0.09	0.09
query4	1.61	0.11	0.11
query5	0.26	0.24	0.26
query6	1.17	0.64	0.63
query7	0.03	0.02	0.02
query8	0.05	0.04	0.04
query9	0.56	0.52	0.49
query10	0.55	0.55	0.55
query11	0.17	0.11	0.11
query12	0.16	0.11	0.12
query13	0.62	0.62	0.60
query14	0.99	1.00	1.00
query15	0.83	0.80	0.80
query16	0.38	0.38	0.39
query17	0.98	1.02	1.02
query18	0.24	0.21	0.22
query19	1.96	1.82	1.88
query20	0.02	0.01	0.01
query21	15.44	0.28	0.15
query22	4.83	0.06	0.05
query23	16.13	0.29	0.10
query24	1.40	0.65	0.35
query25	0.10	0.06	0.06
query26	0.15	0.13	0.13
query27	0.10	0.04	0.04
query28	4.45	1.25	1.02
query29	12.57	3.98	3.15
query30	0.27	0.14	0.13
query31	2.81	0.63	0.40
query32	3.23	0.58	0.47
query33	3.04	3.05	3.07
query34	16.98	5.13	4.53
query35	4.50	4.55	4.57
query36	0.69	0.50	0.50
query37	0.11	0.07	0.07
query38	0.08	0.04	0.04
query39	0.05	0.03	0.03
query40	0.17	0.14	0.13
query41	0.08	0.04	0.03
query42	0.04	0.02	0.02
query43	0.04	0.04	0.03
Total cold run time: 98.26 s
Total hot run time: 27.36 s

doris-robot avatar Dec 02 '25 09:12 doris-robot

FE Regression Coverage Report

Increment line coverage: 0.00% (0/44)

hello-stephen avatar Dec 02 '25 11:12 hello-stephen

PR approved by at least one committer and no changes requested.

github-actions[bot] avatar Dec 11 '25 03:12 github-actions[bot]

PR approved by anyone and no changes requested.

github-actions[bot] avatar Dec 11 '25 03:12 github-actions[bot]