
Add option for non-uniform key distribution

Open terrorfisch opened this issue 3 years ago • 8 comments

I suggest adding an option to use a Zipf distribution to sample the keys. It might give a more realistic workload for heavy-contention applications, as many real-world datasets follow it.

The rand_distr crate has an implementation that is not yet released.
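For a sense of what skewed key sampling looks like, here is a minimal, self-contained sketch of a Zipf sampler built on a precomputed inverse-CDF table. This is illustration only: the `rand_distr` and `zipf` crates provide properly optimized implementations, and the parameters below (1000 keys, exponent 1.07) are arbitrary.

```rust
// Samples rank i in 1..=n with probability proportional to 1 / i^s,
// by binary-searching a precomputed cumulative distribution table.
struct Zipf {
    cdf: Vec<f64>, // cumulative probabilities; cdf[n-1] == 1.0
}

impl Zipf {
    fn new(n: usize, s: f64) -> Self {
        let weights: Vec<f64> = (1..=n).map(|i| 1.0 / (i as f64).powf(s)).collect();
        let total: f64 = weights.iter().sum();
        let mut acc = 0.0;
        let cdf = weights
            .iter()
            .map(|w| {
                acc += w / total;
                acc
            })
            .collect();
        Zipf { cdf }
    }

    /// Map a uniform draw u in [0, 1) to a 1-based rank.
    fn sample(&self, u: f64) -> usize {
        self.cdf.partition_point(|&c| c < u) + 1
    }
}

fn main() {
    let zipf = Zipf::new(1000, 1.07);
    // Rank 1 owns a far larger slice of the CDF than rank 1000,
    // which is exactly the "hot key" skew real datasets exhibit.
    let p_first = zipf.cdf[0];
    let p_last = zipf.cdf[999] - zipf.cdf[998];
    println!("p(rank 1) = {:.4}, p(rank 1000) = {:.6}", p_first, p_last);
    assert!(p_first > 100.0 * p_last);
}
```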

terrorfisch avatar Sep 15 '21 19:09 terrorfisch

It's a great suggestion, though I'd say zipf is probably a better choice for benchmarking as it's both stable and very fast. Would be happy to take a look at a PR!

jonhoo avatar Sep 17 '21 00:09 jonhoo

I'll work on it when I find the time :)

The zipf crate requires rand 0.8, which changed SmallRng and its seed size on 64-bit systems. Would you rather change the public seed type in bustle to u64 and use SeedableRng::seed_from_u64, or stick explicitly with the previous PRNG, Mcg128Xsl64?

terrorfisch avatar Sep 17 '21 14:09 terrorfisch

Do I understand correctly that mix and the prefill assume the keys are unique?

terrorfisch avatar Sep 17 '21 15:09 terrorfisch

> I'll work on it when I find the time :)

Couldn't ask for anything more :)

> The zipf crate requires rand 0.8, which changed SmallRng and its seed size on 64-bit systems. Would you rather change the public seed type in bustle to u64 and use SeedableRng::seed_from_u64, or stick explicitly with the previous PRNG, Mcg128Xsl64?

Hmm, that's a good question. I'm okay with changing the seed type if that's what rand has done, though it is vital that the per-thread randomizer remain very fast since we're benchmarking super high-performance systems:

https://github.com/jonhoo/bustle/blob/5109b8de7f30277df6082cb1ec2f5f0e188baf16/src/lib.rs#L428-L429

It could be that zipf randomization is too slow to run on-line, and we end up just benchmarking the random number generation, and that we actually need a pre-phase just to generate random numbers when using zipf.
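The pre-phase idea could look roughly like the following hypothetical sketch (not bustle's code). All distribution draws happen before the clock starts, so the timed section measures only the table work; `sample_index` is a stand-in xorshift* generator for whichever distribution is in use.

```rust
use std::time::Instant;

// Stand-in PRNG (xorshift*); bustle itself uses a rand SmallRng.
// Returns an index in 0..n.
fn sample_index(state: &mut u64, n: usize) -> usize {
    *state ^= *state >> 12;
    *state ^= *state << 25;
    *state ^= *state >> 27;
    (state.wrapping_mul(0x2545F4914F6CDD1D) % n as u64) as usize
}

fn main() {
    let ops = 1_000_000;
    let mut state = 0x9E3779B97F4A7C15u64;

    // Pre-phase: materialize all the (possibly expensive) random draws.
    let indices: Vec<usize> = (0..ops).map(|_| sample_index(&mut state, 1024)).collect();

    // Timed phase: only table operations remain (a trivial sum stands in here).
    let start = Instant::now();
    let checksum: usize = indices.iter().sum();
    println!("{} ops (checksum {}) in {:?}", ops, checksum, start.elapsed());
}
```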

> Do I understand correctly that mix and the prefill assume the keys are unique?

The prefill definitely assumes that:

https://github.com/jonhoo/bustle/blob/5109b8de7f30277df6082cb1ec2f5f0e188baf16/src/lib.rs#L352

From a skim to bring the logic back into cache, I believe mix also assumes it, since it relies only on the indices into the key list to determine whether a given key should be present or not:

https://github.com/jonhoo/bustle/blob/5109b8de7f30277df6082cb1ec2f5f0e188baf16/src/lib.rs#L447

This illustrates a shortcoming of the current benchmark design, which is that each thread really operates on its own distinct set of keys, so two threads never access the same key. Or if they do, the code would panic :sweat_smile: At least unless I'm missing something on my re-reading of the code.
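To make the failure mode concrete, here is a toy model of that index-based bookkeeping (assumed semantics, not bustle's actual code): whether `keys[i]` should be present is derived purely from where `i` falls relative to the insert and erase cursors, so a duplicated key occupies two indices whose expected states can contradict each other.

```rust
// keys[..erase_cursor] have been erased; keys[erase_cursor..insert_cursor]
// were inserted and not yet erased; keys[insert_cursor..] are untouched.
fn expected_present(i: usize, erase_cursor: usize, insert_cursor: usize) -> bool {
    erase_cursor <= i && i < insert_cursor
}

fn main() {
    let (erase, insert) = (1, 3);

    // With unique keys the key -> index mapping is one-to-one, so this works:
    // keys[0] erased, keys[1..3] present, keys[3..] untouched.
    assert!(!expected_present(0, erase, insert));
    assert!(expected_present(2, erase, insert));

    // With a duplicate, the same key sits at two indices that disagree.
    let dup_keys = [10u64, 20, 10, 40];
    let i_a = 0; // claims key 10 should be absent (erased)
    let i_b = 2; // claims key 10 should be present
    assert_ne!(
        expected_present(i_a, erase, insert),
        expected_present(i_b, erase, insert)
    );
    println!("duplicate key {} has contradictory expected states", dup_keys[0]);
}
```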

It's definitely going to be tricky to add Zipf here since you don't have an obvious cheap way to determine whether a key should be present or not. I wonder whether you need a much more sophisticated prefill that doesn't generate the keys and operations separately, but actually generates a full operational log with associated keys for each thread — and it could keep track of what the current state should be and therefore bake in what the valid operations and expected outcomes are. That should work for uniform operation as well.
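A hedged sketch of what such an operational-log generator might look like (the names and operation set are illustrative, not bustle's API): the generator tracks the expected table state while emitting operations, so repeated keys, as Zipf skew guarantees, still get valid operations with known outcomes baked in.

```rust
use std::collections::HashSet;

#[derive(Debug, PartialEq)]
enum Op {
    Insert(u64), // key expected absent: insert must report it as new
    Remove(u64), // key expected present: remove must succeed
}

// Generate a per-thread operation log from a (possibly skewed, repeating)
// key stream, tracking what should currently be in the table. A real
// generator would also pick operations according to the configured mix.
fn generate_log(keys: &[u64]) -> Vec<Op> {
    let mut present = HashSet::new();
    keys.iter()
        .map(|&k| {
            if present.insert(k) {
                Op::Insert(k)
            } else {
                present.remove(&k);
                Op::Remove(k)
            }
        })
        .collect()
}

fn main() {
    // Hot keys repeat, and the log still knows each step's expected outcome.
    let keys = [1, 1, 2, 1, 3, 2];
    let log = generate_log(&keys);
    assert_eq!(log[0], Op::Insert(1));
    assert_eq!(log[1], Op::Remove(1));
    assert_eq!(log[3], Op::Insert(1)); // key 1 validly re-inserted after removal
    println!("{:?}", log);
}
```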

jonhoo avatar Sep 19 '21 01:09 jonhoo

The question is whether we need to check correctness at all when benchmarking. Currently the tests could fail because there is a very small chance of duplicate keys.

> It could be that zipf randomization is too slow to run on-line, and we end up just benchmarking the random number generation, and that we actually need a pre-phase just to generate random numbers when using zipf.

I would instead generate the keys themselves with Zipf. I hacked together a small abstraction over the keys and will create a PR if I find the time.

terrorfisch avatar Sep 22 '21 08:09 terrorfisch

The challenge with Zipf is that the chance of duplicate keys is actually very high :sweat_smile: In reality, these checks aren't there so much to verify the backing implementation, but rather to make sure that we have correctly tracked what state the backing data structure is in. If we haven't, then the mix would end up being wrong, and the benchmark numbers unreliable. For example, say we didn't error in these cases, and a bug in the generator meant all removes targeted keys that were never inserted: most backends would get far better performance, but only because we'd be measuring the wrong thing!

jonhoo avatar Sep 24 '21 00:09 jonhoo

One could track the number of successful and failed operations and view the operation mixture as a rough target. On a sufficiently large key set, it should be possible to tune the key and operation distributions in a way that hits the target.

I'd argue that it is more important for a meaningful benchmark to measure the performance under real contention than hitting the correct operation ratio. Although that probably depends on the application.

terrorfisch avatar Oct 01 '21 09:10 terrorfisch

I worry that, especially once you introduce key skew, you'd end up deviating so far from the target that the results no longer measure what the mix is intended to measure. But if you can provide a rationale for why that isn't likely to be the case (ideally with a benchmark to demonstrate), I'm definitely keen to change my mind!

jonhoo avatar Oct 03 '21 18:10 jonhoo