robin-hood-hashing icon indicating copy to clipboard operation
robin-hood-hashing copied to clipboard

Idea for robin-hood-hashing V2

Open martinus opened this issue 2 years ago • 1 comments

  • Performance is important but not everything

  • Use an std::vector<std::pair<Key, Value>> (maybe customizable) for the content that is always 100% compact. iteration is perfectly fast.

  • Use an indexing structure into that vector, that cannot overflow, e.g. like so:

    • 3 byte offset from original bucket => 2^24-1 = 16777215 collisions possible, should be plenty
    • 1 byte hash
    • 4 byte index into std::vector
  • => limitations are: no more than 16M collisions (so no overflow check necessary?)

  • 4 byte index, so no more than 4 billion elements are allowed. Should be enough for most use cases.

    • 32bit hash would be enough for indexing. Convert mumx to 32bit would work
    • We could use non-power-of-two-hashmap sizes. Given a well distributed 32bit hash, use idx = ((uint64_t)hash * (uint64_t)mapsize) >> 32
  • 1 byte hash: only every 256th key needs to be checked, that's good enough

  • ordering like robin-hood-hash, we we only need a single < comparison

  • how to deal with capacity() changes / rehash? (resize to capacity + 1, then resize to the new capacity(). This sucks.)

martinus avatar Apr 26 '22 13:04 martinus

I have a good implemention of spare hash map https://github.com/ktprime/emhash/blob/master/hash_table8.hpp it's the fastest iteration compared with other's hash map, but it's not efficient for erasion/insertion if Key/Value is large. it need 8 bytes index for each key/value pair.

bench_IterateIntegers map = emhash5 total iterate/removing time = 5.44, 11.98|1127848008

bench_IterateIntegers map = emhash8 total iterate/removing time = 0.39, 1.17

bench_IterateIntegers map = emhash7 total iterate/removing time = 1.00, 2.78

bench_IterateIntegers map = emhash6 total iterate/removing time = 1.08, 2.92

bench_IterateIntegers map = rigtorp total iterate/removing time = 5.38, 13.14

bench_IterateIntegers map = martinus flat total iterate/removing time = 3.02, 9.06

bench_IterateIntegers map = phmap flat total iterate/removing time = 3.38, 8.23

bench_IterateIntegers map = emilib3 total iterate/removing time = 1.69, 4.48

bench_IterateIntegers map = emilib2 total iterate/removing time = 1.62, 4.34

bench_IterateIntegers map = absl flat total iterate/removing time = 3.27, 8.28

ktprime avatar May 23 '22 05:05 ktprime