Introduce Overflow & Displacement tracking.
Changes:
- Introduce Overflow Trackers, with Cargo features to select the desired variant.
- Introduce Displacements, conditional on the Overflow Tracker variant tracking removals.
- Adjust insertion/removal of items in RawTable to properly track overflow and displacement.
- Adjust find in RawTable to short-circuit the probe sequence when overflow tracking ensures there is no need to probe further.
- OF NOTE: enforce group alignment.
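For illustration, the variants could sit behind a common trait along the lines below. This is a hypothetical sketch: only `may_have_overflowed` and the variant names come from this PR. The trait name `OverflowTracker`, the other methods, and the semantics assumed here for `bloom-1-u8` (one bit selected by the element's control byte) and `counter-u8` (a saturating count) are assumptions, and `hybrid` is omitted.

```rust
/// Hypothetical per-group overflow tracker interface (names other than
/// `may_have_overflowed` are assumptions made for this sketch).
trait OverflowTracker: Copy + Default {
    /// Whether the variant can forget overflows, i.e. tracks removals
    /// (and therefore needs per-element displacements).
    const TRACKS_REMOVALS: bool;
    /// Records that an element with control byte `h2` overflowed past this group.
    fn add_overflow(&mut self, h2: u8);
    /// Forgets one recorded overflow; a no-op unless `TRACKS_REMOVALS`.
    fn remove_overflow(&mut self, h2: u8);
    /// `false` guarantees no element with this `h2` overflowed past this
    /// group, so a lookup may stop probing here.
    fn may_have_overflowed(&self, h2: u8) -> bool;
}

/// Assumed `none` variant: never allows cutting a probe sequence short.
#[derive(Clone, Copy, Default)]
struct NoTracker;

impl OverflowTracker for NoTracker {
    const TRACKS_REMOVALS: bool = false;
    fn add_overflow(&mut self, _h2: u8) {}
    fn remove_overflow(&mut self, _h2: u8) {}
    fn may_have_overflowed(&self, _h2: u8) -> bool {
        true
    }
}

/// Assumed `bloom-1-u8`: one bit per `h2 % 8`; bits are never cleared,
/// so removals cannot be tracked. False positives only cost extra probing.
#[derive(Clone, Copy, Default)]
struct BloomU8(u8);

impl OverflowTracker for BloomU8 {
    const TRACKS_REMOVALS: bool = false;
    fn add_overflow(&mut self, h2: u8) {
        self.0 |= 1u8 << (h2 % 8);
    }
    fn remove_overflow(&mut self, _h2: u8) {}
    fn may_have_overflowed(&self, h2: u8) -> bool {
        self.0 & (1u8 << (h2 % 8)) != 0
    }
}

/// Assumed `counter-u8`: saturating count of elements that overflowed past
/// this group; supports removals.
#[derive(Clone, Copy, Default)]
struct CounterU8(u8);

impl OverflowTracker for CounterU8 {
    const TRACKS_REMOVALS: bool = true;
    fn add_overflow(&mut self, _h2: u8) {
        self.0 = self.0.saturating_add(1);
    }
    fn remove_overflow(&mut self, _h2: u8) {
        debug_assert!(self.0 > 0, "removing an overflow that was never recorded");
        // A saturated counter is sticky: its true count is no longer known.
        if self.0 != u8::MAX {
            self.0 -= 1;
        }
    }
    fn may_have_overflowed(&self, _h2: u8) -> bool {
        self.0 != 0
    }
}
```

The bloom variant never forgets, which is why only a removal-tracking variant such as the counter would need displacements.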
Motivation:
Overflow tracking allows cutting a probe sequence short, which may speed up lookups, particularly unsuccessful ones, which otherwise must keep probing until they reach a group containing an empty slot.
Providing multiple variants makes it easy to test and benchmark each of them, and thus to pick the right one... or to pick none at all.
The groups are now forcibly aligned because overflow tracking is performed on a per-group basis, which does not work with "floating" groups, i.e. groups that may start at any bucket index.
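A minimal sketch of what group alignment means for the probe sequence, assuming a power-of-two bucket count of at least `GROUP_WIDTH` and hashbrown-style triangular probing in whole-group strides. `AlignedProbeSeq` and `group_index` are hypothetical names, not the types used in this PR. Because every probed position is a multiple of `GROUP_WIDTH`, each probe step maps to exactly one group, and hence to exactly one overflow tracker; an unaligned start would straddle two groups.

```rust
/// Assumed group width (16 control bytes, as with SSE2 groups).
const GROUP_WIDTH: usize = 16;

/// Hypothetical group-aligned probe sequence.
struct AlignedProbeSeq {
    pos: usize,
    stride: usize,
    bucket_mask: usize,
}

impl AlignedProbeSeq {
    fn new(hash: u64, bucket_mask: usize) -> Self {
        let buckets = bucket_mask + 1;
        debug_assert!(buckets.is_power_of_two() && buckets >= GROUP_WIDTH);
        // Round the starting position down to a group boundary.
        let pos = (hash as usize) & bucket_mask & !(GROUP_WIDTH - 1);
        AlignedProbeSeq { pos, stride: 0, bucket_mask }
    }

    /// Index of the group (and of its overflow tracker) currently probed.
    fn group_index(&self) -> usize {
        self.pos / GROUP_WIDTH
    }

    /// Triangular probing: strides grow by one group each step, so the
    /// position stays a multiple of GROUP_WIDTH and every group is visited
    /// exactly once per cycle.
    fn move_next(&mut self) {
        self.stride += GROUP_WIDTH;
        self.pos = (self.pos + self.stride) & self.bucket_mask;
    }
}
```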
Design:
Overflow trackers and displacements are tacked onto the end of the allocation, and accesses to them are kept to a minimum to limit their performance impact.
In particular:
- An element which does not overflow on insertion need not trigger a write to any overflow tracker, nor to its displacement.
- Only if removals are tracked is the displacement read on removal.
- Only if removals are tracked and the displacement is non-0 are overflow trackers written to on removal.
This follows the philosophy of "You Don't Pay For What You Don't Use", and keeps the overhead as low as possible.
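The rules above can be illustrated with a toy table. This is a hypothetical sketch, not the `RawTable` changes themselves: it assumes the counter variant (one saturating `u8` per group), a per-slot `u8` displacement measured in groups, linear probing over groups instead of triangular probing, and an identity hash. Lookups here rely solely on the trackers to terminate, without the usual stop at an empty slot.

```rust
/// Toy group size; real groups hold 8 or 16 control bytes.
const GROUP_WIDTH: usize = 4;

/// Hypothetical toy table: one overflow counter per group and one
/// displacement (in groups) per slot.
struct ToyTable {
    slots: Vec<Option<u64>>,
    overflow: Vec<u8>,
    displacement: Vec<u8>,
}

impl ToyTable {
    fn new(groups: usize) -> Self {
        // Displacements must fit in a u8.
        assert!(groups > 0 && groups <= 256);
        ToyTable {
            slots: vec![None; groups * GROUP_WIDTH],
            overflow: vec![0; groups],
            displacement: vec![0; groups * GROUP_WIDTH],
        }
    }

    fn groups(&self) -> usize {
        self.overflow.len()
    }

    /// Group probed at step `d` for `key` (identity hash, linear probing).
    fn group_at(&self, key: u64, d: usize) -> usize {
        let home = (key % self.groups() as u64) as usize;
        (home + d) % self.groups()
    }

    fn slots_of(g: usize) -> std::ops::Range<usize> {
        g * GROUP_WIDTH..(g + 1) * GROUP_WIDTH
    }

    /// Inserts `key` (no duplicate check); returns false if the table is full.
    fn insert(&mut self, key: u64) -> bool {
        for d in 0..self.groups() {
            let g = self.group_at(key, d);
            if let Some(i) = Self::slots_of(g).find(|&i| self.slots[i].is_none()) {
                self.slots[i] = Some(key);
                // Only an overflowing insertion writes trackers and displacement.
                if d > 0 {
                    for k in 0..d {
                        let og = self.group_at(key, k);
                        self.overflow[og] = self.overflow[og].saturating_add(1);
                    }
                    self.displacement[i] = d as u8;
                }
                return true;
            }
        }
        false
    }

    /// Finds `key`, stopping as soon as a group reports no overflow past it.
    fn find(&self, key: u64) -> Option<usize> {
        for d in 0..self.groups() {
            let g = self.group_at(key, d);
            if let Some(i) = Self::slots_of(g).find(|&i| self.slots[i] == Some(key)) {
                return Some(i);
            }
            if self.overflow[g] == 0 {
                return None;
            }
        }
        None
    }

    fn remove(&mut self, key: u64) -> bool {
        let Some(i) = self.find(key) else { return false };
        self.slots[i] = None;
        // Removals read the displacement; only a non-zero one writes trackers.
        let disp = self.displacement[i] as usize;
        if disp != 0 {
            for k in 0..disp {
                let og = self.group_at(key, k);
                // A saturated counter is sticky: its true count is unknown.
                if self.overflow[og] != u8::MAX {
                    self.overflow[og] -= 1;
                }
            }
            self.displacement[i] = 0;
        }
        true
    }
}
```

In this sketch, keys that land in their home group never touch `overflow` or `displacement`, neither on insertion nor on removal.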
Benchmarks:
Methodology: each variant was benchmarked 3 times, and for each benchmark the best result was picked. All results were then normalized against the current master for ease of comparison.
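The methodology amounts to the arithmetic below, sketched under the assumption that the measured quantity is time per iteration, so that "best" means the minimum and a positive percentage means slower than master; the sign convention is not stated in this PR, and the function names are hypothetical.

```rust
/// Best-of-N: with timings, the best run is assumed to be the fastest one.
fn best_of(runs: &[f64]) -> f64 {
    runs.iter().copied().fold(f64::INFINITY, f64::min)
}

/// Change of a variant relative to master, in percent (master = 100%).
fn percent_vs_master(master: f64, variant: f64) -> f64 {
    (variant / master - 1.0) * 100.0
}
```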
| Benchmark | master | none | bloom-1-u8 | bloom-1-u16 | counter-u8 | hybrid |
|---|---|---|---|---|---|---|
| clone_from_large | 100% (+/-19.77%) | +0.00% (+/-0.20%) | +0.17% (+/-0.10%) | +0.00% (+/-0.20%) | -0.94% (+/-0.00%) | +1.18% (+/-0.18%) |
| clone_from_small | 100% (+/-6.82%) | +0.00% (+/-0.07%) | +2.27% (+/-0.20%) | +2.27% (+/-0.04%) | +0.00% (+/-0.25%) | +0.00% (+/-0.05%) |
| clone_large | 100% (+/-8.86%) | +0.00% (+/-0.09%) | +1.24% (+/-0.14%) | -0.66% (+/-0.07%) | -0.86% (+/-0.09%) | -1.04% (+/-0.07%) |
| clone_small | 100% (+/-9.09%) | +0.00% (+/-0.09%) | +3.64% (+/-0.05%) | +1.82% (+/-0.07%) | +0.00% (+/-0.07%) | +1.82% (+/-0.04%) |
| grow_insert_ahash_highbits | 100% (+/-4.54%) | +0.00% (+/-0.05%) | +0.24% (+/-0.03%) | -0.65% (+/-0.00%) | -0.51% (+/-0.05%) | +2.29% (+/-0.00%) |
| grow_insert_ahash_random | 100% (+/-0.02%) | +0.00% (+/-0.00%) | +2.83% (+/-0.00%) | +0.88% (+/-0.00%) | +0.53% (+/-0.00%) | +1.58% (+/-0.00%) |
| grow_insert_ahash_serial | 100% (+/-0.01%) | +0.00% (+/-0.00%) | +0.85% (+/-0.05%) | +0.22% (+/-0.00%) | +1.46% (+/-0.00%) | +4.13% (+/-0.00%) |
| grow_insert_std_highbits | 100% (+/-0.00%) | +0.00% (+/-0.00%) | +0.81% (+/-0.00%) | +1.54% (+/-0.00%) | +0.14% (+/-0.00%) | +0.93% (+/-0.00%) |
| grow_insert_std_random | 100% (+/-1.61%) | +0.00% (+/-0.02%) | +4.05% (+/-0.00%) | +2.37% (+/-0.00%) | +3.96% (+/-0.00%) | +3.10% (+/-0.00%) |
| grow_insert_std_serial | 100% (+/-0.00%) | +0.00% (+/-0.00%) | +4.50% (+/-0.00%) | +3.71% (+/-0.00%) | +1.83% (+/-0.00%) | +5.21% (+/-0.00%) |
| insert_ahash_highbits | 100% (+/-0.01%) | +0.00% (+/-0.00%) | +2.64% (+/-0.00%) | +1.21% (+/-0.00%) | +2.07% (+/-0.00%) | +1.45% (+/-0.00%) |
| insert_ahash_random | 100% (+/-0.01%) | +0.00% (+/-0.00%) | +6.36% (+/-0.00%) | +0.48% (+/-0.00%) | +0.62% (+/-0.00%) | +0.38% (+/-0.00%) |
| insert_ahash_serial | 100% (+/-3.56%) | +0.00% (+/-0.04%) | +5.62% (+/-0.00%) | +5.34% (+/-0.00%) | -0.12% (+/-0.00%) | +0.20% (+/-0.00%) |
| insert_erase_ahash_highbits | 100% (+/-4.64%) | +0.00% (+/-0.05%) | +2.98% (+/-0.05%) | +3.52% (+/-0.00%) | +3.19% (+/-0.04%) | +7.18% (+/-0.00%) |
| insert_erase_ahash_random | 100% (+/-0.01%) | +0.00% (+/-0.00%) | +2.59% (+/-0.00%) | +3.44% (+/-0.00%) | +2.80% (+/-0.00%) | +4.72% (+/-0.03%) |
| insert_erase_ahash_serial | 100% (+/-0.01%) | +0.00% (+/-0.00%) | +0.50% (+/-0.06%) | +0.83% (+/-0.00%) | +5.17% (+/-0.00%) | +3.54% (+/-0.02%) |
| insert_erase_std_highbits | 100% (+/-0.01%) | +0.00% (+/-0.00%) | +2.06% (+/-0.00%) | +2.07% (+/-0.00%) | +0.14% (+/-0.00%) | +0.40% (+/-0.03%) |
| insert_erase_std_random | 100% (+/-0.01%) | +0.00% (+/-0.00%) | -0.06% (+/-0.00%) | +0.84% (+/-0.00%) | -1.83% (+/-0.00%) | +0.95% (+/-0.00%) |
| insert_erase_std_serial | 100% (+/-1.97%) | +0.00% (+/-0.02%) | +4.26% (+/-0.00%) | +4.75% (+/-0.00%) | -0.75% (+/-0.00%) | +2.14% (+/-0.00%) |
| insert_std_highbits | 100% (+/-0.00%) | +0.00% (+/-0.00%) | +0.35% (+/-0.00%) | -0.69% (+/-0.00%) | -1.61% (+/-0.04%) | -1.21% (+/-0.00%) |
| insert_std_random | 100% (+/-0.00%) | +0.00% (+/-0.00%) | -2.34% (+/-0.00%) | -0.57% (+/-0.00%) | -0.69% (+/-0.00%) | +0.45% (+/-0.00%) |
| insert_std_serial | 100% (+/-2.18%) | +0.00% (+/-0.02%) | -2.24% (+/-0.00%) | -2.86% (+/-0.05%) | +0.69% (+/-0.00%) | +1.62% (+/-0.00%) |
| iter_ahash_highbits | 100% (+/-10.23%) | +0.00% (+/-0.10%) | +3.41% (+/-0.12%) | -1.46% (+/-0.07%) | -0.32% (+/-0.11%) | -0.97% (+/-0.06%) |
| iter_ahash_random | 100% (+/-3.57%) | +0.00% (+/-0.04%) | +1.95% (+/-0.08%) | -0.97% (+/-0.06%) | -0.65% (+/-0.07%) | -0.81% (+/-0.05%) |
| iter_ahash_serial | 100% (+/-8.93%) | +0.00% (+/-0.09%) | +2.60% (+/-0.09%) | -0.97% (+/-0.06%) | -0.81% (+/-0.04%) | -0.49% (+/-0.05%) |
| iter_std_highbits | 100% (+/-4.52%) | +0.00% (+/-0.05%) | +2.42% (+/-0.09%) | -0.48% (+/-0.06%) | +0.65% (+/-0.13%) | -0.16% (+/-0.06%) |
| iter_std_random | 100% (+/-5.47%) | +0.00% (+/-0.05%) | -0.16% (+/-0.12%) | -0.80% (+/-0.07%) | +0.64% (+/-0.08%) | +0.32% (+/-0.06%) |
| iter_std_serial | 100% (+/-6.44%) | +0.00% (+/-0.06%) | +1.77% (+/-0.07%) | +0.64% (+/-0.08%) | +1.93% (+/-0.02%) | +0.16% (+/-0.05%) |
| lookup_ahash_highbits | 100% (+/-4.26%) | +0.00% (+/-0.04%) | +4.47% (+/-0.12%) | +1.63% (+/-0.10%) | -1.20% (+/-0.07%) | +1.02% (+/-0.07%) |
| lookup_ahash_random | 100% (+/-5.24%) | +0.00% (+/-0.05%) | +8.50% (+/-0.08%) | +7.26% (+/-0.09%) | -0.50% (+/-0.05%) | +7.41% (+/-0.13%) |
| lookup_ahash_serial | 100% (+/-4.51%) | +0.00% (+/-0.05%) | +8.28% (+/-0.05%) | +6.62% (+/-0.07%) | +0.25% (+/-0.14%) | +8.25% (+/-0.13%) |
| lookup_fail_ahash_highbits | 100% (+/-7.58%) | +0.00% (+/-0.08%) | +10.95% (+/-0.18%) | +7.62% (+/-0.03%) | +1.89% (+/-0.05%) | +9.13% (+/-0.06%) |
| lookup_fail_ahash_random | 100% (+/-7.33%) | +0.00% (+/-0.07%) | +13.83% (+/-0.16%) | +9.87% (+/-0.08%) | -0.34% (+/-0.05%) | +12.93% (+/-0.12%) |
| lookup_fail_ahash_serial | 100% (+/-6.37%) | +0.00% (+/-0.06%) | +7.33% (+/-0.05%) | +11.93% (+/-0.20%) | +1.36% (+/-0.06%) | +10.31% (+/-0.05%) |
| lookup_fail_std_highbits | 100% (+/-7.78%) | +0.00% (+/-0.08%) | +3.68% (+/-0.06%) | +5.35% (+/-0.03%) | +0.60% (+/-0.05%) | +4.09% (+/-0.05%) |
| lookup_fail_std_random | 100% (+/-5.59%) | +0.00% (+/-0.06%) | +5.37% (+/-0.11%) | +6.13% (+/-0.04%) | +1.06% (+/-0.00%) | +5.11% (+/-0.08%) |
| lookup_fail_std_serial | 100% (+/-4.02%) | +0.00% (+/-0.04%) | +1.58% (+/-0.06%) | +4.38% (+/-0.11%) | +0.55% (+/-0.00%) | +3.10% (+/-0.05%) |
| lookup_std_highbits | 100% (+/-3.36%) | +0.00% (+/-0.03%) | +5.24% (+/-0.00%) | +7.26% (+/-0.00%) | +1.65% (+/-0.00%) | +4.80% (+/-0.09%) |
| lookup_std_random | 100% (+/-2.47%) | +0.00% (+/-0.02%) | +3.76% (+/-0.03%) | +3.32% (+/-0.06%) | +3.57% (+/-0.11%) | +3.22% (+/-0.06%) |
| lookup_std_serial | 100% (+/-9.09%) | +0.00% (+/-0.09%) | +8.38% (+/-0.04%) | +7.50% (+/-0.08%) | +7.86% (+/-0.09%) | +8.46% (+/-0.09%) |
| rehash_in_place | 100% (+/-0.01%) | +0.00% (+/-0.00%) | +2.49% (+/-0.00%) | -1.66% (+/-0.00%) | +1.48% (+/-0.00%) | +5.18% (+/-0.00%) |
| insert | 100% (+/-0.01%) | +0.00% (+/-0.00%) | +0.25% (+/-0.11%) | -1.51% (+/-0.07%) | +4.53% (+/-0.13%) | +2.96% (+/-0.00%) |
| insert_unique_unchecked | 100% (+/-6.95%) | +0.00% (+/-0.07%) | -5.59% (+/-0.08%) | -10.45% (+/-0.06%) | -0.36% (+/-0.16%) | -4.54% (+/-0.05%) |
Remarks:
- The `none` variant is completely neutral, which means that enforcing group alignment did not affect performance.
- The other variants show some promise, but the results vary quite a bit depending on micro-optimizations. For example, aggressive (`#[inline(always)]`) inlining of key methods seemed to help, but I am not sure whether `may_have_overflowed` should be inlined, since calls to it are expected to be rare.
- I do not know whether the benchmarks suffer from high probe counts. Overflow tracking only helps by cutting probe sequences short, and is thus pure overhead when little or no probing takes place.
In any case, with the scaffolding in place, it should at least be possible to experiment further if there is any interest.