hashtable-benchmarks
support non-deterministic rehash
Currently the benchmark assumes that the rehash point of a given implementation is determined only by size(), so all of the sets constructed by GenerateSets
will have the same size. This is not true for the hash table implementations at https://github.com/skarupke/flat_hash_map, which should be included in the benchmark. Functions that use Transpose
are the ones that will take a bit of work to fix.
It will give you sets that are one element from growing, so it is mostly correct... I am not entirely sure how to account for tables that decide when to grow based on data layout rather than size. Since we are feeding it random bytes, we should achieve near-optimal spreading of data across the table; I think that would put it close to the table's theoretical max load factor.
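For reference, here is a minimal sketch of the size-driven growth threshold being discussed, assuming a standard unordered_set-style API (bucket_count() / max_load_factor()); it is an illustration, not code from the benchmark:

```cpp
#include <cstddef>

// For a table that rehashes purely as a function of size(), the largest size
// it can hold without growing follows directly from bucket_count() and
// max_load_factor(), so sets built the same way all stop at the same size.
// Tables that also grow when a probe sequence gets too long (e.g. the
// skarupke ones) can rehash earlier, making this only an upper bound.
template <class Set>
std::size_t MaxSizeBeforeGrowth(const Set& s) {
  return static_cast<std::size_t>(s.bucket_count() * s.max_load_factor());
}
```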
That's also what I would expect, but I see it with both ska::bytell_hash_set and ska::flat_hash_set when trying to run the benchmark with them.
What do you mean by "see it"? Do you mean "observe significant instability in results" or just "observe that it can change sizes slightly"?
I observe that in dbg builds the assert(res.front().size() == res.back().size()) in GenerateSets fails. In opt builds the benchmark crashes.
Oh, yeah I understand now. We might just need to make it slightly more tolerant of this sort of thing...
I think it will be reasonable to just use the minimum size during Transpose, which isn't much work (I've already done it).
We still need to fix while (env->KeepRunningBatch(s.size() * s.front().size() - kOpsPerKey)) to count properly.
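A rough sketch of both of those fixes, assuming Transpose produces per-position "columns" with one key from each set; the helper below is illustrative and not the repo's code:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sketch only: "Set" is any unordered-set-like container.  Using the minimum
// size across sets tolerates tables that rehashed at different points; the
// extra keys in the larger sets are simply never exercised.
template <class Set>
std::vector<std::vector<std::uint64_t>> Transpose(const std::vector<Set>& sets) {
  std::size_t min_size = sets.front().size();
  for (const Set& s : sets) min_size = std::min(min_size, s.size());

  std::vector<std::vector<std::uint64_t>> columns(min_size);
  for (const Set& s : sets) {
    std::size_t i = 0;
    for (auto it = s.begin(); i < min_size; ++it, ++i) columns[i].push_back(*it);
  }
  return columns;
}

// The batch count then has to be based on the keys actually exercised, i.e.
// s.size() * min_size (keeping whatever kOpsPerKey adjustment the real loop
// applies) rather than s.size() * s.front().size().
```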
Another way of fixing this is to use Shuffle instead of Transpose. Instead of returning keys we can return a vector<pair<uint32, uint32>> where v.first is the set index and v.second is a random key in that set. We can create this set by accumulating all pairs and shuffling them. It won't be as deterministic as the current approach but because we use a lot of sets and a lot of keys it should work well enough.
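A sketch of that alternative (the names and the 32-bit key type are assumptions for illustration, not taken from the repo):

```cpp
#include <algorithm>
#include <cstdint>
#include <random>
#include <utility>
#include <vector>

// Sketch of the proposed Shuffle: collect a (set index, key) pair for every
// key of every set, whatever its size, then shuffle the whole vector.
template <class Set>
std::vector<std::pair<std::uint32_t, std::uint32_t>> Shuffle(
    const std::vector<Set>& sets, std::mt19937& rng) {
  std::vector<std::pair<std::uint32_t, std::uint32_t>> pairs;
  for (std::uint32_t set_index = 0; set_index < sets.size(); ++set_index) {
    for (std::uint32_t key : sets[set_index]) {
      pairs.emplace_back(set_index, key);
    }
  }
  std::shuffle(pairs.begin(), pairs.end(), rng);
  return pairs;
}
```

The benchmark loop would then pick the set via pair.first and probe it with pair.second, so nothing depends on all sets ending up with the same size.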
Another issue is that the early-rehash behavior depends on key insertion order, so changes to GenerateSet will also be required for Density::kMax. Some ways to fix it would be to reset the RNG and replay the same sequence of keys when trying to produce the almost-at-rehash map, or to store the keys in a vector so they can be replayed.
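A sketch of the store-and-replay option, assuming only an unordered_set-style insert()/size()/bucket_count() interface (the helper name and min_size parameter are made up for illustration):

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Sketch only: builds a table that is one insertion away from its next
// rehash by recording every inserted key and replaying all but the one that
// triggered the growth.  Because the replay uses the exact same keys in the
// exact same order, it works whether the growth decision is driven by
// size() or by key layout.  Assumes min_size > 0.
template <class Set>
Set BuildAlmostAtRehash(std::mt19937_64& rng, std::size_t min_size) {
  std::vector<std::uint64_t> keys;
  Set probe;
  std::size_t buckets_at_min = 0;
  while (true) {
    keys.push_back(rng());
    probe.insert(keys.back());
    if (probe.size() == min_size) buckets_at_min = probe.bucket_count();
    if (probe.size() > min_size && probe.bucket_count() != buckets_at_min) {
      break;  // this last insertion forced a rehash
    }
  }
  // Replay everything except the key that caused the rehash.
  Set result;
  for (std::size_t i = 0; i + 1 < keys.size(); ++i) result.insert(keys[i]);
  return result;
}
```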
Whatever we do needs to ensure that the InsertMany* tests cover the whole range of load factors for implementations that rehash at different points.