redis an incomplete POC for dict's evolution to test the performance impact

In order to evaluate this pr A perfect implementation of a hashtable with a very large number of keys program, a test was done and the following is the core evaluation data.

Maximum table capacity 65536, Linear access of existing elements

benchmark number	1000000	3000000	5000000	8000000	15000000	30000000
dictFind	454/452/457	1353/1363/1359	2282/2262/2256	4089/4083/4039	8078/8076/8086	17212/17005/17049
dictXFind	450/456/451	1392/1378/1377	2267/2307/2280	4130/4146/4121	8457/8120/8110	17865/17358/17449
Performance loss	+0.0044	-0.0176	-0.0079	-0.0152	-0.0184	-0.0274

Maximum table capacity 65536, Random access of existing elements

benchmark number	1000000	3000000	5000000	8000000	15000000	30000000
dictFind	662/661/681	2168/2123/2119	3566/3564/3549	6107/6213/6150	13461/13309/12726	29135/28809/28939
dictXFind	675/673/673	2150/2135/2140	3598/3584/3587	6212/6213/6290	14591/12684/12613	29466/29057/29255
Performance loss	-0.0084	-0.0023	-0.0084	-0.0132	-0.0099	-0.0103

Maximum table capacity 65536, Accessing missing

benchmark number	1000000	3000000	5000000	8000000	15000000	30000000
dictFind	529/527/529	1684/1674/1683	2639/2610/2641	4928/4970/4960	10303/10174/10131	21668/21588/21826
dictXFind	536/536/530	1686/1681/1666	2691/2697/2732	5098/5044/5015	11410/10066/10051	22067/22000/22346
Performance loss	-0.0107	+0.0015	-0.0291	-0.0201	-0.0209	-0.0204

Sep 30 '21 06:09 GuoZhaoran

Benchmark results look good, I think a 2% diff isn't that significant. At least this looks good enough to move forward. Can you by any chance add some rehashing benchmarks?

Oct 02 '21 08:10 yoav-steinberg

Benchmark results look good, I think a 2% diff isn't that significant. At least this looks good enough to move forward. Can you by any chance add some rehashing benchmarks?

Okay, thank you very much for your careful review and suggestions, I'll summarize what I'm going to do next.

add the logic of rehash and get some benchmark data about rehash
correct and optimize the problems you mentioned above
dict and dictX are one structure, let them parallel here for test coding convenience, maybe I should make them merge and make all the methods simple (this is what I try to do next)

Do you have any other suggestions?

Oct 03 '21 10:10 GuoZhaoran

in this test I made them parallel to give you guys a lot of confusion, because I don't want to pollute the rest of dict.c for now, so I made them parallel.

@GuoZhaoran The parallel structure is confusing, indeed. :grin: It's better to just change dict so we can have a better idea of how simple the code will be. With the parallel structure, it's too complex. Don't worry about "polluting the rest of dict". We have the original dict in the other branches and you can use git diff to compare.

For a more realistic test, you can try redis-benchmark -t set,get (maybe with threads 2 and pipeline 10, otherwise redis-benchmark may take more CPU than redis itself, if you run then on the same machine).

Oct 04 '21 09:10 zuiderkwast

It's better to just change dict so we can have a better idea of how simple the code will be.

Thanks for the suggestion, I'll fuse the two structures next time and try to make minimal changes.

Oct 04 '21 10:10 GuoZhaoran

For a more realistic test, you can try redis-benchmark -t set,get (maybe with threads 2 and pipeline 10, otherwise redis-benchmark may take more CPU than redis itself, if you run then on the same machine).

This test protocol is very useful and more scientific, thanks to your help. It looks like there is some testing to be done along with the coding.

Oct 04 '21 10:10 GuoZhaoran

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
0 out of 2 committers have signed the CLA.

:x: 15738859212
:x: GuoZhaoran
_{You have signed the CLA already but the status is still pending? Let us recheck it.}

Mar 24 '24 23:03 CLAassistant

redis redis copied to clipboard

an incomplete POC for dict's evolution to test the performance impact

redis
redis copied to clipboard