feat: add open addressing hash set
https://github.com/dragonflydb/dragonfly/issues/4915
Adds the main functionality of the OAHSet class along with tests.
This hash table is a classic open addressing hash table with linear probing, with the following modifications:
- Every entry contains additional hash bits to cheaply rule out non-matching keys;
- Resize happens when the number of elements equals the number of buckets;
- If there is no place for a new entry, it is placed in an extension-point vector (stored only in every Nth element);
- Searching for an element is a linear scan over the next N entries and then over the extension-point vector.
Currently N = 4. It would be better to increase it to 16 or even 64, but that would require quite a few additional optimizations to avoid performance degradation. A bigger N improves memory locality and reduces memory consumption, significantly improving the performance of Grow and serialization, but it reduces the performance of Add-type operations.
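To make the scheme concrete, here is a minimal, self-contained sketch of the probing logic described above. All names (`ProbingSet`, `Entry`, `Bucket`) are hypothetical, not the actual OAHSet API; resize and the every-Nth-element placement of extension points are omitted for brevity (each bucket here carries its own overflow vector).

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// "N" in the description: the length of the linear probe window.
constexpr size_t kProbeLen = 4;

struct Entry {
  bool used = false;
  uint8_t fingerprint = 0;  // extra hash bits kept per entry
  std::string key;
};

struct Bucket {
  Entry entry;
  // Extension point: keys that did not fit in the next N slots.
  std::vector<std::string> overflow;
};

class ProbingSet {
 public:
  explicit ProbingSet(size_t capacity) : buckets_(capacity) {}

  bool Add(const std::string& key) {
    if (Contains(key)) return false;
    size_t h = std::hash<std::string>{}(key);
    size_t start = h % buckets_.size();
    for (size_t i = 0; i < kProbeLen; ++i) {
      Entry& e = buckets_[(start + i) % buckets_.size()].entry;
      if (!e.used) {
        e.used = true;
        e.fingerprint = static_cast<uint8_t>(h >> 56);
        e.key = key;
        return true;
      }
    }
    // No free slot within the probe window: fall back to the extension point.
    buckets_[start].overflow.push_back(key);
    return true;
  }

  bool Contains(const std::string& key) const {
    size_t h = std::hash<std::string>{}(key);
    size_t start = h % buckets_.size();
    uint8_t fp = static_cast<uint8_t>(h >> 56);
    for (size_t i = 0; i < kProbeLen; ++i) {
      const Entry& e = buckets_[(start + i) % buckets_.size()].entry;
      // Cheap fingerprint compare rejects most slots without touching the key.
      if (e.used && e.fingerprint == fp && e.key == key) return true;
    }
    for (const std::string& k : buckets_[start].overflow)
      if (k == key) return true;
    return false;
  }

 private:
  std::vector<Bucket> buckets_;
};
```

The fingerprint byte is what the first bullet refers to: most probe slots are rejected by a one-byte compare instead of a full string comparison.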
Benchmark results follow; note that the code has not been optimized for performance yet.
| Benchmark | OAHSet CPU time | StringSet CPU time |
|---|---|---|
| BM_Clone/elements:32000 | 820490 ns | 1431076 ns |
| BM_Fill/elements:32000 | 898017 ns | 1012854 ns |
| BM_Clear/elements:32000 | 378877 ns | 498369 ns |
| BM_Add/elements:1000/Key Size:10 | 16816 ns | 26929 ns |
| BM_Add/elements:10000/Key Size:10 | 202441 ns | 274483 ns |
| BM_Add/elements:100000/Key Size:10 | 2727592 ns | 3769934 ns |
| BM_Add/elements:1000/Key Size:100 | 20066 ns | 31039 ns |
| BM_Add/elements:10000/Key Size:100 | 262668 ns | 356733 ns |
| BM_Add/elements:100000/Key Size:100 | 3516319 ns | 5368266 ns |
| BM_Add/elements:1000/Key Size:1000 | 66824 ns | 78710 ns |
| BM_Add/elements:10000/Key Size:1000 | 814923 ns | 962557 ns |
| BM_Add/elements:100000/Key Size:1000 | 13560669 ns | 17504345 ns |
| BM_AddMany/elements:1000/Key Size:10 | 18485 ns | 26901 ns |
| BM_AddMany/elements:10000/Key Size:10 | 225058 ns | 272835 ns |
| BM_AddMany/elements:100000/Key Size:10 | 2891681 ns | 3350356 ns |
| BM_AddMany/elements:1000/Key Size:100 | 22909 ns | 31794 ns |
| BM_AddMany/elements:10000/Key Size:100 | 291694 ns | 351925 ns |
| BM_AddMany/elements:100000/Key Size:100 | 3698692 ns | 5018262 ns |
| BM_AddMany/elements:1000/Key Size:1000 | 67786 ns | 82353 ns |
| BM_AddMany/elements:10000/Key Size:1000 | 919993 ns | 1175522 ns |
| BM_AddMany/elements:100000/Key Size:1000 | 16067585 ns | 18360927 ns |
| BM_Erase/elements:1000/Key Size:10 | 17319 ns | 23159 ns |
| BM_Erase/elements:10000/Key Size:10 | 250941 ns | 246216 ns |
| BM_Erase/elements:100000/Key Size:10 | 3947218 ns | 3158983 ns |
| BM_Erase/elements:1000/Key Size:100 | 22722 ns | 26417 ns |
| BM_Erase/elements:10000/Key Size:100 | 350034 ns | 309551 ns |
| BM_Erase/elements:100000/Key Size:100 | 9708831 ns | 4709865 ns |
| BM_Erase/elements:1000/Key Size:1000 | 80075 ns | 72539 ns |
| BM_Erase/elements:10000/Key Size:1000 | 1406052 ns | 874350 ns |
| BM_Erase/elements:100000/Key Size:1000 | 18447727 ns | 16848360 ns |
| BM_Get/elements:1000/Key Size:10 | 6589 ns | 9790 ns |
| BM_Get/elements:10000/Key Size:10 | 62842 ns | 90164 ns |
| BM_Get/elements:100000/Key Size:10 | 1669508 ns | 2203686 ns |
| BM_Get/elements:1000/Key Size:100 | 12577 ns | 14200 ns |
| BM_Get/elements:10000/Key Size:100 | 130239 ns | 196186 ns |
| BM_Get/elements:100000/Key Size:100 | 2892933 ns | 3901384 ns |
| BM_Get/elements:1000/Key Size:1000 | 59639 ns | 59436 ns |
| BM_Get/elements:10000/Key Size:1000 | 741433 ns | 779920 ns |
| BM_Get/elements:100000/Key Size:1000 | 13289613 ns | 20661595 ns |
| BM_Grow | 7841012 ns | 8637650 ns |
How does memory usage compare? Say you add K 8-byte members to both implementations for K=1000000,1500000,2000000,2500000,3000000?
On average I see a 10-15% improvement in speed, right?
> On average I see a 10-15% improvement in speed, right?
For now, yes, but there are still a lot of optimizations we can do. We can also tune it to significantly improve some operations at the cost of others, or of a slight increase in memory consumption; for example, Grow can be made twice as fast if we increase N or allow higher memory consumption.
> How does memory usage compare? Say you add K 8-byte members to both implementations for K=1000000,1500000,2000000,2500000,3000000?
I'm not sure I understood the question. I didn't measure memory consumption. Theoretically, the current implementation is quite similar to StringSet in memory consumption (I expect about 1 byte less per entry, but I haven't verified that).
I've run the benchmarks on an AWS c7g.16xlarge (arm64), and the results are even better than on my local CPU:
| Benchmark | OAHSet CPU time | StringSet CPU time |
|---|---|---|
| BM_Clone/elements:32000 | 1330884 ns | 2685591 ns |
| BM_Fill/elements:32000 | 1638128 ns | 2026346 ns |
| BM_Clear/elements:32000 | 611710 ns | 1152499 ns |
| BM_Add/elements:1000/Key Size:10 | 35535 ns | 61020 ns |
| BM_Add/elements:10000/Key Size:10 | 370244 ns | 543129 ns |
| BM_Add/elements:100000/Key Size:10 | 4995692 ns | 7673177 ns |
| BM_Add/elements:1000/Key Size:100 | 54810 ns | 78149 ns |
| BM_Add/elements:10000/Key Size:100 | 537805 ns | 745326 ns |
| BM_Add/elements:100000/Key Size:100 | 6709688 ns | 10060040 ns |
| BM_Add/elements:1000/Key Size:1000 | 130500 ns | 168247 ns |
| BM_Add/elements:10000/Key Size:1000 | 1313826 ns | 1683678 ns |
| BM_Add/elements:100000/Key Size:1000 | 18742027 ns | 26859559 ns |
| BM_AddMany/elements:1000/Key Size:10 | 46076 ns | 59239 ns |
| BM_AddMany/elements:10000/Key Size:10 | 451744 ns | 531307 ns |
| BM_AddMany/elements:100000/Key Size:10 | 5792388 ns | 7514034 ns |
| BM_AddMany/elements:1000/Key Size:100 | 62141 ns | 74747 ns |
| BM_AddMany/elements:10000/Key Size:100 | 613032 ns | 721393 ns |
| BM_AddMany/elements:100000/Key Size:100 | 7491017 ns | 9748602 ns |
| BM_AddMany/elements:1000/Key Size:1000 | 135705 ns | 167385 ns |
| BM_AddMany/elements:10000/Key Size:1000 | 1392936 ns | 1615755 ns |
| BM_AddMany/elements:100000/Key Size:1000 | 19511305 ns | 27544565 ns |
| BM_Erase/elements:1000/Key Size:10 | 38065 ns | 47403 ns |
| BM_Erase/elements:10000/Key Size:10 | 594035 ns | 427876 ns |
| BM_Erase/elements:100000/Key Size:10 | 6732755 ns | 6151252 ns |
| BM_Erase/elements:1000/Key Size:100 | 52533 ns | 66609 ns |
| BM_Erase/elements:10000/Key Size:100 | 810278 ns | 643084 ns |
| BM_Erase/elements:100000/Key Size:100 | 9370130 ns | 8724927 ns |
| BM_Erase/elements:1000/Key Size:1000 | 147916 ns | 149872 ns |
| BM_Erase/elements:10000/Key Size:1000 | 1532298 ns | 1496420 ns |
| BM_Erase/elements:100000/Key Size:1000 | 19927403 ns | 19269993 ns |
| BM_Get/elements:1000/Key Size:10 | 14795 ns | 26821 ns |
| BM_Get/elements:10000/Key Size:10 | 183910 ns | 256741 ns |
| BM_Get/elements:100000/Key Size:10 | 3065545 ns | 4401379 ns |
| BM_Get/elements:1000/Key Size:100 | 35506 ns | 47468 ns |
| BM_Get/elements:10000/Key Size:100 | 381048 ns | 498173 ns |
| BM_Get/elements:100000/Key Size:100 | 5584236 ns | 7877216 ns |
| BM_Get/elements:1000/Key Size:1000 | 103973 ns | 119229 ns |
| BM_Get/elements:10000/Key Size:1000 | 1045459 ns | 1276052 ns |
| BM_Get/elements:100000/Key Size:1000 | 12887455 ns | 19505085 ns |
| BM_Grow | 10429151 ns | 14235076 ns |
What does memory usage look like? Btw, can you add a third column to the benchmarks showing the improvement in percent? It's hard to tell by eye which side improved and by how much.
> What does memory usage look like? Btw, can you add a third column to the benchmarks showing the improvement in percent? It's hard to tell by eye which side improved and by how much.
I didn't compare memory usage, but it should be less than or equal to before (I expect about 3 bits less per entry). Regarding percentages: there is little point in computing them until we decide what the main priority for the hash table is. For now, I have implemented it to be faster across all operations, but it could instead be tuned to consume 1-2 bytes less memory per entry than StringSet, or to make the Grow operation 2x or more faster. So the measurement isn't meaningful until the direction is chosen.