
[improvement](memory) Storage page cache use LRU-K cache, K=2

Open xinyiZzz opened this issue 1 year ago • 10 comments

Proposed changes

The storage page cache uses a plain LRU cache, so occasional batch operations can cause "cache pollution": hot data is evicted to make room for cold data, which lowers the cache hit rate.

In the extreme case, if each batch inserts more pages than the cache can hold, the cache hit rate drops to 0.
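The extreme case can be reproduced with a small simulation. This is a minimal Python sketch of a plain LRU policy, not Doris code; the function name `lru_hit_rate` and the integer page ids are made up for illustration. It replays a batch that is one page larger than the cache and measures the hit rate.

```python
from collections import OrderedDict


def lru_hit_rate(capacity, accesses):
    """Replay `accesses` against a plain LRU cache and return the hit rate."""
    cache = OrderedDict()  # keys ordered from least to most recently used
    hits = 0
    for page in accesses:
        if page in cache:
            hits += 1
            cache.move_to_end(page)  # mark as most recently used
        else:
            cache[page] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used page
    return hits / len(accesses)


capacity = 4
# A batch of capacity + 1 pages, read ten times in the same order:
# each page is evicted just before it is needed again.
print(lru_hit_rate(capacity, list(range(capacity + 1)) * 10))  # 0.0
# A batch that fits in the cache only misses on the first pass.
print(lru_hit_rate(capacity, list(range(capacity)) * 10))      # 0.9
```

Growing the batch by a single page beyond the capacity is enough to take the hit rate from 0.9 to 0 in this toy workload.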

Introducing an LRU-K cache (K=2) avoids "cache pollution" in most cases.
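For readers unfamiliar with the idea, here is a minimal Python sketch of one simplified LRU-2 variant. It is an assumption for illustration, not the code in this PR: a key is only admitted to the cache on its second insert, and a bounded history of keys seen once acts as the admission filter. This differs from the classic LRU-K formulation, which evicts by the age of the K-th most recent access. The names `LRU2Cache`, `read`, and `hot_hits_after_scan` are hypothetical.

```python
from collections import OrderedDict


class LRU2Cache:
    """Simplified LRU-2: a page is cached only on its second insert."""

    def __init__(self, capacity, history_capacity=None):
        self.capacity = capacity
        self.history_capacity = capacity if history_capacity is None else history_capacity
        self.cache = OrderedDict()    # resident pages, least recently used first
        self.history = OrderedDict()  # keys seen once; no page data is kept

    def get(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)
            return self.cache[key]
        return None

    def put(self, key, value):
        if key in self.cache:
            self.cache[key] = value
            self.cache.move_to_end(key)
        elif key in self.history:
            # Second sighting: promote from history into the real cache.
            del self.history[key]
            self.cache[key] = value
            if len(self.cache) > self.capacity:
                self.cache.popitem(last=False)
        else:
            # First sighting: remember the key only, do not cache the page.
            self.history[key] = True
            if len(self.history) > self.history_capacity:
                self.history.popitem(last=False)


def read(cache, key):
    """Look a page up, loading it on a miss. Returns True on a cache hit."""
    if cache.get(key) is not None:
        return True
    cache.put(key, f"page-{key}")
    return False


def hot_hits_after_scan():
    cache = LRU2Cache(capacity=4)
    hot = ["h0", "h1", "h2", "h3"]
    for _ in range(2):            # two passes so the hot pages get admitted
        for key in hot:
            read(cache, key)
    for key in range(100):        # one-off batch scan of cold pages
        read(cache, key)
    return sum(read(cache, key) for key in hot)


print(hot_hits_after_scan())  # 4
```

In this sketch the cold scan only churns the history list, so the four hot pages are still resident afterwards. A batch that touches each page twice in quick succession would still be admitted, which is consistent with the "in most cases" qualifier above.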

xinyiZzz avatar Dec 21 '23 08:12 xinyiZzz

run buildall

xinyiZzz avatar Dec 21 '23 08:12 xinyiZzz

TeamCity be ut coverage result:
Function Coverage: 36.47% (8537/23407)
Line Coverage: 28.60% (69450/242820)
Region Coverage: 27.63% (35945/130105)
Branch Coverage: 24.38% (18380/75400)
Coverage Report: http://coverage.selectdb-in.cc/coverage/bfd3a073833fc389f528039ed53c428e901d355c_bfd3a073833fc389f528039ed53c428e901d355c/report/index.html

doris-robot avatar Dec 21 '23 08:12 doris-robot

(From new machine) TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 42.51 seconds
stream load tsv: 568 seconds loaded 74807831229 Bytes, about 125 MB/s
stream load json: 19 seconds loaded 2358488459 Bytes, about 118 MB/s
stream load orc: 66 seconds loaded 1101869774 Bytes, about 15 MB/s
stream load parquet: 33 seconds loaded 861443392 Bytes, about 24 MB/s
insert into select: 28.6 seconds inserted 10000000 Rows, about 349K ops/s
storage size: 17183496267 Bytes

doris-robot avatar Dec 21 '23 09:12 doris-robot

run buildall

xinyiZzz avatar Dec 21 '23 09:12 xinyiZzz

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Tpch sf100 test result on commit bfd3a073833fc389f528039ed53c428e901d355c, data reload: false

run tpch-sf100 query with default conf and session variables
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	4582	3856	3820	3820
q2	316	280	159	159
q3	1300	1385	1004	1004
q4	826	944	534	534
q5	2677	2634	2802	2634
q6	202	163	130	130
q7	892	636	484	484
q8	1592	1765	1725	1725
q9	5912	5804	5806	5804
q10	2924	3079	2597	2597
q11	360	209	194	194
q12	331	326	208	208
q13	4523	4505	3824	3824
q14	243	238	215	215
q15	599	520	524	520
q16	438	435	393	393
q17	644	959	303	303
q18	7118	6723	6731	6723
q19	992	1196	1092	1092
q20	464	447	318	318
q21	2979	1981	2082	1981
q22	377	331	288	288
Total cold run time: 40291 ms
Total hot run time: 34950 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	4166	4165	4141	4141
q2	249	270	177	177
q3	3261	3300	2971	2971
q4	2133	2213	1789	1789
q5	5321	5330	5293	5293
q6	232	152	124	124
q7	2223	2004	1845	1845
q8	3084	3076	2963	2963
q9	8223	8160	8138	8138
q10	3750	3838	3371	3371
q11	563	404	367	367
q12	760	756	608	608
q13	4253	4269	3558	3558
q14	287	290	261	261
q15	597	523	526	523
q16	513	507	484	484
q17	1700	1753	1569	1569
q18	8296	7945	7866	7866
q19	1210	1203	1207	1203
q20	2221	2184	1942	1942
q21	6069	5178	5299	5178
q22	526	464	412	412
Total cold run time: 59637 ms
Total hot run time: 54783 ms

doris-robot avatar Dec 21 '23 10:12 doris-robot

(From new machine) TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 42.25 seconds
stream load tsv: 572 seconds loaded 74807831229 Bytes, about 124 MB/s
stream load json: 19 seconds loaded 2358488459 Bytes, about 118 MB/s
stream load orc: 66 seconds loaded 1101869774 Bytes, about 15 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 28.7 seconds inserted 10000000 Rows, about 348K ops/s
storage size: 17184186377 Bytes

doris-robot avatar Dec 21 '23 11:12 doris-robot

TeamCity be ut coverage result:
Function Coverage: 36.47% (8536/23406)
Line Coverage: 28.60% (69443/242848)
Region Coverage: 27.63% (35954/130124)
Branch Coverage: 24.37% (18382/75418)
Coverage Report: http://coverage.selectdb-in.cc/coverage/32ebcb7ab0e1acf07d88a9eee3698de7e738da0d_32ebcb7ab0e1acf07d88a9eee3698de7e738da0d/report/index.html

doris-robot avatar Dec 21 '23 11:12 doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Tpch sf100 test result on commit 32ebcb7ab0e1acf07d88a9eee3698de7e738da0d, data reload: false

run tpch-sf100 query with default conf and session variables
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	4568	3805	3864	3805
q2	326	270	158	158
q3	1314	1400	1028	1028
q4	817	926	521	521
q5	2650	2686	2749	2686
q6	201	157	129	129
q7	887	646	495	495
q8	1597	1725	1696	1696
q9	5904	5769	5754	5754
q10	2904	3097	2598	2598
q11	354	207	204	204
q12	337	338	207	207
q13	4557	4512	3834	3834
q14	242	237	214	214
q15	602	533	527	527
q16	442	431	387	387
q17	653	923	305	305
q18	7211	6801	6766	6766
q19	1003	1171	1120	1120
q20	498	456	298	298
q21	2946	1990	2081	1990
q22	375	329	288	288
Total cold run time: 40388 ms
Total hot run time: 35010 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	4159	4153	4202	4153
q2	249	272	173	173
q3	3293	3307	2970	2970
q4	2135	2215	1792	1792
q5	5346	5302	5317	5302
q6	234	153	122	122
q7	2267	1997	1867	1867
q8	3083	3052	3000	3000
q9	8208	8159	8178	8159
q10	3735	3849	3374	3374
q11	554	390	379	379
q12	765	766	598	598
q13	4273	4299	3580	3580
q14	287	291	275	275
q15	617	521	524	521
q16	527	509	487	487
q17	1724	1765	1639	1639
q18	8322	7778	7915	7778
q19	1214	1202	1192	1192
q20	2223	2186	1944	1944
q21	6110	5213	5334	5213
q22	530	469	426	426
Total cold run time: 59855 ms
Total hot run time: 54944 ms

doris-robot avatar Dec 21 '23 11:12 doris-robot

run buildall

xinyiZzz avatar Feb 23 '24 09:02 xinyiZzz

TeamCity be ut coverage result:
Function Coverage: 35.72% (8549/23936)
Line Coverage: 27.55% (69419/251951)
Region Coverage: 26.71% (36016/134827)
Branch Coverage: 23.52% (18419/78298)
Coverage Report: http://coverage.selectdb-in.cc/coverage/f73e9c404bcb1213848650d993e0b76626675ca1_f73e9c404bcb1213848650d993e0b76626675ca1/report/index.html

doris-robot avatar Feb 23 '24 09:02 doris-robot

run buildall

xinyiZzz avatar Feb 26 '24 05:02 xinyiZzz

clang-tidy review says "All clean, LGTM! :+1:"

github-actions[bot] avatar Feb 26 '24 05:02 github-actions[bot]

run buildall

xinyiZzz avatar Feb 26 '24 06:02 xinyiZzz

clang-tidy review says "All clean, LGTM! :+1:"

github-actions[bot] avatar Feb 26 '24 06:02 github-actions[bot]

TeamCity be ut coverage result:
Function Coverage: 35.73% (8548/23924)
Line Coverage: 27.54% (69399/251976)
Region Coverage: 26.70% (36008/134848)
Branch Coverage: 23.51% (18409/78314)
Coverage Report: http://coverage.selectdb-in.cc/coverage/c0afcf21c73fb5bb7d3f240e1e1b3e2916ef6d10_c0afcf21c73fb5bb7d3f240e1e1b3e2916ef6d10/report/index.html

doris-robot avatar Feb 26 '24 06:02 doris-robot

TPC-H: Total hot run time: 35500 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit c0afcf21c73fb5bb7d3f240e1e1b3e2916ef6d10, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17687	4513	4362	4362
q2	2040	244	133	133
q3	10495	1442	1003	1003
q4	4669	1216	1009	1009
q5	7610	2782	2603	2603
q6	185	157	135	135
q7	1037	967	765	765
q8	9198	1469	1101	1101
q9	6440	5352	5302	5302
q10	8170	3074	2644	2644
q11	419	243	228	228
q12	688	471	354	354
q13	17935	4435	3658	3658
q14	291	287	268	268
q15	685	519	508	508
q16	456	458	416	416
q17	453	618	479	479
q18	7033	6315	6429	6315
q19	1289	671	600	600
q20	414	433	289	289
q21	6080	2992	3156	2992
q22	385	377	336	336
Total cold run time: 103659 ms
Total hot run time: 35500 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	5055	5015	4974	4974
q2	269	310	193	193
q3	3261	3397	2901	2901
q4	2252	2369	1726	1726
q5	5317	5115	5108	5108
q6	215	165	128	128
q7	2136	1848	1721	1721
q8	2669	2367	2380	2367
q9	7837	7779	7751	7751
q10	3892	4050	3573	3573
q11	642	474	413	413
q12	766	786	565	565
q13	3820	4233	3443	3443
q14	260	268	241	241
q15	662	515	500	500
q16	459	480	465	465
q17	1599	1601	1557	1557
q18	8175	7410	7355	7355
q19	764	731	733	731
q20	2048	2052	1867	1867
q21	6074	5328	5360	5328
q22	599	574	493	493
Total cold run time: 58771 ms
Total hot run time: 53400 ms

doris-robot avatar Feb 26 '24 06:02 doris-robot

TPC-DS: Total hot run time: 175616 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit c0afcf21c73fb5bb7d3f240e1e1b3e2916ef6d10, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query1	924	349	337	337
query2	6539	1905	1905	1905
query3	6708	205	202	202
query4	23316	21504	21222	21222
query5	4232	502	375	375
query6	257	184	173	173
query7	4608	486	311	311
query8	253	205	204	204
query9	8447	2815	2797	2797
query10	404	278	219	219
query11	15007	14906	14469	14469
query12	146	85	87	85
query13	1713	544	421	421
query14	9650	7724	7851	7724
query15	212	221	189	189
query16	7495	413	256	256
query17	1349	673	527	527
query18	1954	345	269	269
query19	181	176	144	144
query20	85	84	88	84
query21	188	150	118	118
query22	4824	4809	4653	4653
query23	33007	31485	31496	31485
query24	8108	2746	2802	2746
query25	437	418	349	349
query26	1206	271	165	165
query27	2719	428	314	314
query28	4355	1822	1804	1804
query29	876	775	643	643
query30	207	166	140	140
query31	919	858	765	765
query32	90	63	61	61
query33	473	302	231	231
query34	775	817	493	493
query35	902	890	835	835
query36	960	912	913	912
query37	89	91	62	62
query38	3230	3272	3225	3225
query39	1405	1337	1339	1337
query40	205	123	109	109
query41	38	35	34	34
query42	103	104	98	98
query43	506	481	464	464
query44	1191	697	712	697
query45	193	190	182	182
query46	801	985	622	622
query47	1606	1626	1550	1550
query48	406	427	348	348
query49	650	398	314	314
query50	603	630	377	377
query51	4444	4339	4299	4299
query52	108	98	95	95
query53	378	401	308	308
query54	246	247	226	226
query55	84	88	79	79
query56	218	239	202	202
query57	1012	1007	925	925
query58	219	210	206	206
query59	2365	2477	2346	2346
query60	228	234	219	219
query61	85	87	84	84
query62	521	434	378	378
query63	363	295	298	295
query64	5114	2971	2491	2491
query65	3330	3248	3251	3248
query66	991	426	329	329
query67	14592	14401	14178	14178
query68	2266	925	500	500
query69	492	383	353	353
query70	1249	1207	1178	1178
query71	305	276	253	253
query72	5082	2887	2764	2764
query73	497	603	312	312
query74	6847	6590	6363	6363
query75	3034	2975	2564	2564
query76	2193	1037	658	658
query77	281	319	232	232
query78	9252	9345	8777	8777
query79	895	864	502	502
query80	530	475	357	357
query81	404	247	207	207
query82	917	119	87	87
query83	242	144	120	120
query84	227	97	79	79
query85	800	419	349	349
query86	325	322	317	317
query87	3374	3426	3241	3241
query88	2890	2296	2296	2296
query89	469	417	367	367
query90	1687	169	167	167
query91	143	158	126	126
query92	54	56	49	49
query93	864	943	497	497
query94	496	308	183	183
query95	421	357	342	342
query96	457	578	262	262
query97	4394	4464	4250	4250
query98	224	208	197	197
query99	842	813	713	713
Total cold run time: 251003 ms
Total hot run time: 175616 ms

doris-robot avatar Feb 26 '24 06:02 doris-robot

ClickBench: Total hot run time: 28.83 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit c0afcf21c73fb5bb7d3f240e1e1b3e2916ef6d10, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.02	0.03	0.02
query2	0.06	0.02	0.03
query3	0.23	0.07	0.07
query4	1.65	0.08	0.08
query5	0.48	0.48	0.48
query6	1.33	0.61	0.62
query7	0.01	0.01	0.01
query8	0.04	0.02	0.02
query9	0.51	0.47	0.46
query10	0.49	0.49	0.49
query11	0.13	0.10	0.10
query12	0.12	0.10	0.10
query13	0.58	0.59	0.59
query14	0.76	0.78	0.77
query15	0.83	0.80	0.79
query16	0.33	0.33	0.34
query17	0.96	0.90	0.89
query18	0.17	0.18	0.18
query19	1.80	1.62	1.65
query20	0.01	0.01	0.02
query21	15.43	1.08	0.58
query22	0.70	0.88	0.81
query23	14.92	1.53	0.58
query24	2.19	0.74	0.29
query25	0.67	0.08	0.06
query26	0.15	0.14	0.14
query27	0.05	0.06	0.05
query28	13.03	1.48	0.80
query29	12.57	3.95	3.28
query30	0.52	0.49	0.45
query31	2.77	0.57	0.36
query32	3.21	0.55	0.48
query33	3.12	3.13	3.16
query34	14.84	5.08	4.46
query35	4.52	4.51	4.49
query36	1.05	0.99	0.95
query37	0.07	0.05	0.05
query38	0.04	0.03	0.02
query39	0.02	0.02	0.02
query40	0.18	0.15	0.15
query41	0.07	0.01	0.02
query42	0.02	0.02	0.01
query43	0.02	0.03	0.02
Total cold run time: 100.67 s
Total hot run time: 28.83 s

doris-robot avatar Feb 26 '24 06:02 doris-robot

Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Load test result on commit c0afcf21c73fb5bb7d3f240e1e1b3e2916ef6d10 with default session variables
Stream load json:         19 seconds loaded 2358488459 Bytes, about 118 MB/s
Stream load orc:          61 seconds loaded 1101869774 Bytes, about 17 MB/s
Stream load parquet:      32 seconds loaded 861443392 Bytes, about 25 MB/s
Insert into select:       15.1 seconds inserted 10000000 Rows, about 662K ops/s

doris-robot avatar Feb 26 '24 06:02 doris-robot