hashtable-benchmarks
support non-deterministic rehash
Currently the benchmark assumes that the rehash point of a given implementation is determined only by size(), so all of the sets constructed by GenerateSets
will have the same size. This is not true for the hash table implementations at https://github.com/skarupke/flat_hash_map, which should be included in the benchmark. Functions that use Transpose
are the ones that will take a bit of work to fix.
It will give you sets that are one element from growing, so it is mostly correct... I am not entirely sure how to account for tables that decide when to grow based on data layout rather than size. Since we are feeding it random bytes, we should achieve near-optimal spreading of data across the table; I think that would put it close to the table's theoretical max load factor.
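For reference, here is a minimal sketch of the size-driven growth threshold being discussed, assuming a standard unordered_set-style API (bucket_count() / max_load_factor()); it is an illustration, not code from the benchmark:

```cpp
#include <cstddef>

// For a table that rehashes purely as a function of size(), the largest size
// it can hold without growing follows directly from bucket_count() and
// max_load_factor(), so sets built the same way all stop at the same size.
// Tables that also grow when a probe sequence gets too long (e.g. the
// skarupke ones) can rehash earlier, making this only an upper bound.
template <class Set>
std::size_t MaxSizeBeforeGrowth(const Set& s) {
  return static_cast<std::size_t>(s.bucket_count() * s.max_load_factor());
}
```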
That's also what I would expect, but I see it with both ska::bytell_hash_set and ska::flat_hash_set when trying to run the benchmark with them.
What do you mean by "see it"? Do you mean "observe significant instability in results" or just "observe that it can change sizes slightly"?
I observe that in dbg builds the assert(res.front().size() == res.back().size()) in GenerateSets fails. In opt builds the benchmark crashes.
Oh, yeah I understand now. We might just need to make it slightly more tolerant of this sort of thing...
I think it will be reasonable to just use the minimum size during Transpose, which isn't much work (I've already done it).
We still need to fix while (env->KeepRunningBatch(s.size() * s.front().size() - kOpsPerKey)) to count properly.
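A rough sketch of both of those fixes, assuming Transpose produces per-position "columns" with one key from each set; the helper below is illustrative and not the repo's code:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sketch only: "Set" is any unordered-set-like container.  Using the minimum
// size across sets tolerates tables that rehashed at different points; the
// extra keys in the larger sets are simply never exercised.
template <class Set>
std::vector<std::vector<std::uint64_t>> Transpose(const std::vector<Set>& sets) {
  std::size_t min_size = sets.front().size();
  for (const Set& s : sets) min_size = std::min(min_size, s.size());

  std::vector<std::vector<std::uint64_t>> columns(min_size);
  for (const Set& s : sets) {
    std::size_t i = 0;
    for (auto it = s.begin(); i < min_size; ++it, ++i) columns[i].push_back(*it);
  }
  return columns;
}

// The batch count then has to be based on the keys actually exercised, i.e.
// s.size() * min_size (keeping whatever kOpsPerKey adjustment the real loop
// applies) rather than s.size() * s.front().size().
```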
Another way of fixing this is to use Shuffle instead of Transpose. Instead of returning keys we can return a vector<pair<uint32, uint32>> where v.first is the set index and v.second is a random key in that set. We can create this set by accumulating all pairs and shuffling them. It won't be as deterministic as the current approach but because we use a lot of sets and a lot of keys it should work well enough.
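A sketch of that alternative (the names and the 32-bit key type are assumptions for illustration, not taken from the repo):

```cpp
#include <algorithm>
#include <cstdint>
#include <random>
#include <utility>
#include <vector>

// Sketch of the proposed Shuffle: collect a (set index, key) pair for every
// key of every set, whatever its size, then shuffle the whole vector.
template <class Set>
std::vector<std::pair<std::uint32_t, std::uint32_t>> Shuffle(
    const std::vector<Set>& sets, std::mt19937& rng) {
  std::vector<std::pair<std::uint32_t, std::uint32_t>> pairs;
  for (std::uint32_t set_index = 0; set_index < sets.size(); ++set_index) {
    for (std::uint32_t key : sets[set_index]) {
      pairs.emplace_back(set_index, key);
    }
  }
  std::shuffle(pairs.begin(), pairs.end(), rng);
  return pairs;
}
```

The benchmark loop would then pick the set via pair.first and probe it with pair.second, so nothing depends on all sets ending up with the same size.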
Another issue is that the early-rehash behavior depends on key insertion order, so changes to GenerateSet will also be required for Density::kMax. Some ways to fix it would be to reset the RNG and replay the same sequence of keys when trying to produce the almost-at-rehash map, or to store the keys in a vector so they can be replayed.
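A sketch of the store-and-replay option, assuming only an unordered_set-style insert()/size()/bucket_count() interface (the helper name and min_size parameter are made up for illustration):

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Sketch only: builds a table that is one insertion away from its next
// rehash by recording every inserted key and replaying all but the one that
// triggered the growth.  Because the replay uses the exact same keys in the
// exact same order, it works whether the growth decision is driven by
// size() or by key layout.  Assumes min_size > 0.
template <class Set>
Set BuildAlmostAtRehash(std::mt19937_64& rng, std::size_t min_size) {
  std::vector<std::uint64_t> keys;
  Set probe;
  std::size_t buckets_at_min = 0;
  while (true) {
    keys.push_back(rng());
    probe.insert(keys.back());
    if (probe.size() == min_size) buckets_at_min = probe.bucket_count();
    if (probe.size() > min_size && probe.bucket_count() != buckets_at_min) {
      break;  // this last insertion forced a rehash
    }
  }
  // Replay everything except the key that caused the rehash.
  Set result;
  for (std::size_t i = 0; i + 1 < keys.size(); ++i) result.insert(keys[i]);
  return result;
}
```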
Whatever we do needs to ensure that the InsertMany* tests cover the whole range of load factors for implementations that rehash at different points.