[feature](Cloud) Try to do memory limit control for hdfs write

Open ByteYue opened this issue 2 months ago • 9 comments

Proposed changes

Issue Number: close #xxx

In practice, we've found that writing to HDFS too frequently can cause an OutOfMemoryError (OOM) in the JVM that is started through JNI. To guard against this, we need a way to track how much JVM memory is currently in use.

The HdfsWriteRateLimit class increments a recorded value in hdfsWrite each time data is written to HDFS. When hdfsCloseFile is called, all JVM memory related to that file is released, so the recorded value can be decreased at that point. If the current usage exceeds the user-configured maximum, the current write sleeps and then checks again. If the number of sleeps exceeds the user-configured limit, the current write is considered to have failed.
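
The accounting described above could look roughly like the following. This is a minimal sketch, not the code in this PR: the class name `HdfsWriteMemLimiter`, its methods `try_reserve` / `release` / `used`, and the constructor parameters are all hypothetical and chosen only for illustration. It assumes the caller reserves bytes before calling hdfsWrite and releases the file's total after hdfsCloseFile.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>

// Hypothetical illustration of the counter described in this PR.
// None of these names come from the Doris code base.
class HdfsWriteMemLimiter {
public:
    HdfsWriteMemLimiter(int64_t max_bytes, int max_sleeps,
                        std::chrono::milliseconds sleep_interval)
            : _max_bytes(max_bytes),
              _max_sleeps(max_sleeps),
              _sleep_interval(sleep_interval) {}

    // Call before hdfsWrite. Returns true once `bytes` fit under the limit.
    // Sleeps between attempts and returns false after `max_sleeps` sleeps,
    // in which case the caller should treat the write as failed.
    bool try_reserve(int64_t bytes) {
        assert(bytes >= 0);
        for (int sleeps = 0; sleeps <= _max_sleeps; ++sleeps) {
            int64_t cur = _used.load();
            // compare_exchange_weak reloads `cur` on failure, so the limit
            // is re-checked against the latest value on every iteration.
            while (cur + bytes <= _max_bytes) {
                if (_used.compare_exchange_weak(cur, cur + bytes)) {
                    return true;
                }
            }
            if (sleeps < _max_sleeps) {
                std::this_thread::sleep_for(_sleep_interval);
            }
        }
        return false;
    }

    // Call after hdfsCloseFile with the total bytes reserved for that file.
    void release(int64_t bytes) { _used.fetch_sub(bytes); }

    int64_t used() const { return _used.load(); }

private:
    const int64_t _max_bytes;
    const int _max_sleeps;
    const std::chrono::milliseconds _sleep_interval;
    std::atomic<int64_t> _used{0};
};
```

A writer would call `try_reserve(len)` before each write and fail the write when it returns false, then call `release(total_written)` once the file is closed. The counter only approximates JVM usage, since it tracks bytes handed to the JVM rather than the JVM's actual heap state.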

Further comments

If this is a relatively large or complex change, kick off the discussion at [email protected] by explaining why you chose the solution you did and what alternatives you considered, etc...

ByteYue avatar Apr 30 '24 08:04 ByteYue

Thank you for your contribution to Apache Doris. Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website. See Doris Document.

doris-robot avatar Apr 30 '24 08:04 doris-robot

clang-tidy review says "All clean, LGTM! :+1:"

github-actions[bot] avatar Apr 30 '24 08:04 github-actions[bot]

clang-tidy review says "All clean, LGTM! :+1:"

github-actions[bot] avatar Apr 30 '24 08:04 github-actions[bot]

clang-tidy review says "All clean, LGTM! :+1:"

github-actions[bot] avatar Apr 30 '24 08:04 github-actions[bot]

clang-tidy review says "All clean, LGTM! :+1:"

github-actions[bot] avatar Apr 30 '24 08:04 github-actions[bot]

clang-tidy review says "All clean, LGTM! :+1:"

github-actions[bot] avatar Apr 30 '24 09:04 github-actions[bot]

run buildall

ByteYue avatar May 02 '24 07:05 ByteYue

run buildall

ByteYue avatar May 02 '24 09:05 ByteYue

run buildall

ByteYue avatar May 02 '24 13:05 ByteYue

PR approved by anyone and no changes requested.

github-actions[bot] avatar May 08 '24 12:05 github-actions[bot]

run buildall

ByteYue avatar May 08 '24 12:05 ByteYue

run buildall

ByteYue avatar May 08 '24 16:05 ByteYue

run buildall

ByteYue avatar May 09 '24 03:05 ByteYue

TeamCity be ut coverage result: Function Coverage: 35.69% (8989/25185) Line Coverage: 27.34% (74235/271567) Region Coverage: 26.57% (38374/144431) Branch Coverage: 23.38% (19566/83694) Coverage Report: http://coverage.selectdb-in.cc/coverage/18cf00f64305a63be44df67ad22f39e2468b461c_18cf00f64305a63be44df67ad22f39e2468b461c/report/index.html

doris-robot avatar May 09 '24 05:05 doris-robot

clang-tidy review says "All clean, LGTM! :+1:"

github-actions[bot] avatar May 09 '24 08:05 github-actions[bot]

run buildall

ByteYue avatar May 09 '24 08:05 ByteYue

clang-tidy review says "All clean, LGTM! :+1:"

github-actions[bot] avatar May 09 '24 08:05 github-actions[bot]

TeamCity be ut coverage result: Function Coverage: 35.69% (8989/25187) Line Coverage: 27.34% (74251/271592) Region Coverage: 26.58% (38389/144445) Branch Coverage: 23.38% (19574/83704) Coverage Report: http://coverage.selectdb-in.cc/coverage/a3eae87aaf4dcc8aae68304e8b9f9b1187cae6a4_a3eae87aaf4dcc8aae68304e8b9f9b1187cae6a4/report/index.html

doris-robot avatar May 09 '24 08:05 doris-robot

TPC-H: Total hot run time: 41713 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit a3eae87aaf4dcc8aae68304e8b9f9b1187cae6a4, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17600	4293	4255	4255
q2	2031	191	187	187
q3	10468	1182	1247	1182
q4	10186	785	824	785
q5	7476	2711	2751	2711
q6	217	136	138	136
q7	1021	610	586	586
q8	9229	2150	2082	2082
q9	9209	6790	6752	6752
q10	9279	3934	3813	3813
q11	460	238	244	238
q12	475	217	219	217
q13	18244	3149	3203	3149
q14	272	204	214	204
q15	515	465	468	465
q16	482	397	416	397
q17	972	666	664	664
q18	8354	7808	7633	7633
q19	3644	1551	1553	1551
q20	631	324	325	324
q21	5286	4145	4096	4096
q22	371	286	289	286
Total cold run time: 116422 ms
Total hot run time: 41713 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	4520	4381	4394	4381
q2	369	270	269	269
q3	3174	2958	2855	2855
q4	1892	1560	1633	1560
q5	5493	5481	5526	5481
q6	214	123	131	123
q7	2316	1943	1990	1943
q8	3238	3379	3405	3379
q9	8703	8747	8580	8580
q10	3949	3861	3830	3830
q11	572	508	496	496
q12	770	616	674	616
q13	17003	3141	3124	3124
q14	290	256	252	252
q15	517	469	477	469
q16	458	408	423	408
q17	1733	1509	1453	1453
q18	7656	7577	7601	7577
q19	1662	1478	1574	1478
q20	1970	1769	1738	1738
q21	8415	4785	4916	4785
q22	574	492	479	479
Total cold run time: 75488 ms
Total hot run time: 55276 ms

doris-robot avatar May 09 '24 09:05 doris-robot

TPC-DS: Total hot run time: 186422 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit a3eae87aaf4dcc8aae68304e8b9f9b1187cae6a4, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query1	903	366	356	356
query2	6448	2337	2229	2229
query3	6644	210	215	210
query4	23429	21218	21342	21218
query5	4115	414	417	414
query6	273	192	172	172
query7	4579	297	287	287
query8	240	193	191	191
query9	8742	2417	2393	2393
query10	427	251	254	251
query11	14766	14165	14134	14134
query12	141	91	91	91
query13	1647	386	369	369
query14	9787	6771	8484	6771
query15	216	178	163	163
query16	7932	271	268	268
query17	1697	577	556	556
query18	2037	284	274	274
query19	211	151	153	151
query20	94	90	86	86
query21	205	130	137	130
query22	5064	4841	4868	4841
query23	34163	33747	33710	33710
query24	6750	2868	2916	2868
query25	501	378	380	378
query26	701	163	151	151
query27	1882	321	327	321
query28	3748	2048	2045	2045
query29	844	631	617	617
query30	225	165	157	157
query31	955	748	739	739
query32	74	54	56	54
query33	497	263	260	260
query34	860	484	483	483
query35	767	690	682	682
query36	1054	917	911	911
query37	102	66	72	66
query38	2895	2795	2764	2764
query39	1629	1600	1570	1570
query40	204	129	124	124
query41	46	43	46	43
query42	112	96	96	96
query43	554	545	549	545
query44	1048	719	722	719
query45	279	271	268	268
query46	1064	720	735	720
query47	1974	1869	1869	1869
query48	379	294	374	294
query49	758	385	400	385
query50	767	388	395	388
query51	6834	6850	6840	6840
query52	107	89	90	89
query53	346	281	275	275
query54	528	419	434	419
query55	75	73	75	73
query56	236	220	214	214
query57	1219	1152	1174	1152
query58	212	210	193	193
query59	3324	3327	3104	3104
query60	251	231	229	229
query61	93	89	89	89
query62	577	455	457	455
query63	311	285	279	279
query64	7786	7415	7339	7339
query65	3123	3093	3085	3085
query66	776	331	331	331
query67	15707	14958	14847	14847
query68	8959	546	552	546
query69	595	350	312	312
query70	1368	1075	1057	1057
query71	496	270	274	270
query72	8451	2558	2336	2336
query73	1586	333	336	333
query74	6618	6099	6114	6099
query75	4683	2593	2607	2593
query76	4870	986	960	960
query77	741	261	263	261
query78	11036	10308	10189	10189
query79	7012	526	515	515
query80	993	437	464	437
query81	466	221	221	221
query82	240	98	97	97
query83	196	165	170	165
query84	269	89	92	89
query85	765	268	255	255
query86	341	283	317	283
query87	3271	3130	3153	3130
query88	4782	2430	2418	2418
query89	514	380	390	380
query90	2103	189	191	189
query91	128	96	95	95
query92	57	48	50	48
query93	5641	514	502	502
query94	989	181	181	181
query95	389	308	302	302
query96	605	273	273	273
query97	3165	2977	2956	2956
query98	229	218	218	218
query99	1118	917	924	917
Total cold run time: 288610 ms
Total hot run time: 186422 ms

doris-robot avatar May 09 '24 09:05 doris-robot

LGTM

gavinchou avatar May 09 '24 17:05 gavinchou

run buildall

ByteYue avatar May 10 '24 07:05 ByteYue

clang-tidy review says "All clean, LGTM! :+1:"

github-actions[bot] avatar May 10 '24 07:05 github-actions[bot]

run buildall

ByteYue avatar May 10 '24 07:05 ByteYue

clang-tidy review says "All clean, LGTM! :+1:"

github-actions[bot] avatar May 10 '24 07:05 github-actions[bot]

TeamCity be ut coverage result: Function Coverage: 35.68% (8984/25182) Line Coverage: 27.32% (74196/271597) Region Coverage: 26.55% (38345/144422) Branch Coverage: 23.37% (19551/83670) Coverage Report: http://coverage.selectdb-in.cc/coverage/8fa77dcbc8c6bff205e6670bd20322345297cbd7_8fa77dcbc8c6bff205e6670bd20322345297cbd7/report/index.html

doris-robot avatar May 10 '24 08:05 doris-robot

run buildall

ByteYue avatar May 10 '24 11:05 ByteYue

clang-tidy review says "All clean, LGTM! :+1:"

github-actions[bot] avatar May 10 '24 11:05 github-actions[bot]

TeamCity be ut coverage result: Function Coverage: 35.69% (8987/25182) Line Coverage: 27.34% (74276/271649) Region Coverage: 26.57% (38383/144472) Branch Coverage: 23.39% (19574/83688) Coverage Report: http://coverage.selectdb-in.cc/coverage/aa101c5377be45cbf7b16168257dab5021beb3a3_aa101c5377be45cbf7b16168257dab5021beb3a3/report/index.html

doris-robot avatar May 10 '24 12:05 doris-robot

clang-tidy review says "All clean, LGTM! :+1:"

github-actions[bot] avatar May 11 '24 04:05 github-actions[bot]