robin-hood-hashing
robin-hood-hashing copied to clipboard
Idea for robin-hood-hashing V2
-
Performance is important but not everything
-
Use an
std::vector<std::pair<Key, Value>>
(maybe customizable) for the content that is always 100% compact. iteration is perfectly fast. -
Use an indexing structure into that vector, that cannot overflow, e.g. like so:
- 3 byte offset from original bucket => 2^24-1 = 16777215 collisions possible, should be plenty
- 1 byte hash
- 4 byte index into std::vector
-
=> limitations are: no more than 16M collisions (so no overflow check necessary?)
-
4 byte index, so no more than 4 billion elements are allowed. Should be enough for most use cases.
- 32bit hash would be enough for indexing. Convert
mumx
to 32bit would work - We could use non-power-of-two-hashmap sizes. Given a well distributed 32bit hash, use
idx = ((uint64_t)hash * (uint64_t)mapsize) >> 32
- 32bit hash would be enough for indexing. Convert
-
1 byte hash: only every 256th key needs to be checked, that's good enough
-
ordering like robin-hood-hash, we we only need a single
<
comparison -
how to deal with capacity() changes / rehash? (resize to capacity + 1, then resize to the new capacity(). This sucks.)
I have a good implemention of spare hash map https://github.com/ktprime/emhash/blob/master/hash_table8.hpp it's the fastest iteration compared with other's hash map, but it's not efficient for erasion/insertion if Key/Value is large. it need 8 bytes index for each key/value pair.
bench_IterateIntegers map = emhash5 total iterate/removing time = 5.44, 11.98|1127848008
bench_IterateIntegers map = emhash8 total iterate/removing time = 0.39, 1.17
bench_IterateIntegers map = emhash7 total iterate/removing time = 1.00, 2.78
bench_IterateIntegers map = emhash6 total iterate/removing time = 1.08, 2.92
bench_IterateIntegers map = rigtorp total iterate/removing time = 5.38, 13.14
bench_IterateIntegers map = martinus flat total iterate/removing time = 3.02, 9.06
bench_IterateIntegers map = phmap flat total iterate/removing time = 3.38, 8.23
bench_IterateIntegers map = emilib3 total iterate/removing time = 1.69, 4.48
bench_IterateIntegers map = emilib2 total iterate/removing time = 1.62, 4.34
bench_IterateIntegers map = absl flat total iterate/removing time = 3.27, 8.28