zig
zig copied to clipboard
hash_map.zig: Pass `self` by value and less pointer-int conversion
- Pass self via value instead of reference in numerous places throughout
HashMapUnmangedandHashMap(orignally proposed in #19770) - Replaced some instances of
@intFromPtrand@ptrFromIntwith@ptrCastand@alignCast, and pointer arithmetic where appropriate
With this, the only remaining instance of pointer-int conversion in hash_map.zig is in HashMapUnmanaged.removeByPtr, which easily be able to be eliminated once pointer subtraction is supported.
Would be great to run some benchmarks before/after, to make sure this doesn't regress performance. My understanding is that this should improve performance in the ideal Zig, but also that the current version of Zig isn't quiet ideal yet.
@matklad see https://github.com/ziglang/zig/pull/19770#issuecomment-2079952526:
If this was done originally because the compiler was choosing to pass the hashmap by value but it was actually faster to pass by reference then that sounds like a compiler bug and we should fix that instead of using a workaround. The standard library is the last place I would expect to see compiler bug workarounds.
and
The above analysis is accurate -- such workarounds are effectively banned in
std. There have been (and continue to be) bugs around PRO (Parameter Reference Optimization), but working around them instdis considered bad form.
Yes, I am aware of that context. The expected value of benchmarking here is still quiet high:
- benchmark not showing significant perf differences is a baseline result, but still is a useful confirmation that the compiler works as intended.
- benchmark showing significant improvement would be surprising, welcomed, but not really actionable.
- benchmark showing a significant regression would be surprising and actionable: hash maps are in general super-hot code, so this would put more priority on fixing the compiler to deal with at least this case more optimally.
Not saying that we shouldn't land this without a benchmark, or that we should block this on compiler improvements should the results be disappointing! Just that with a benchmark this change would be even more valuable than it already is: if it is a performance regression, chasing it after the fact would be quite hard, but also important, because this is hash-map specifically, and not some random cold code!
Alright, I've finally gotten around to doing some benchmarks. If you want to run the benchmark on your own machine, all of the code is at this repo, and you can run the benchmark with zig build run. With that out of the way, looking at the benchmarks I've run on my machine, the difference between passing by pointer and value seems to be negligible to the point that it's hard to tell which one is faster.
To be comprehensive, here is my benchmark results for ReleaseFast mode:
benchmark runs total time time/run (avg ± σ) (min ... max) p75 p99 p995
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
ptr-arg hashmap: Call contains (unmanaged) 1024 840.895ms 821.187us ± 41.79us (771.046us ... 1.033ms) 835.314us 977.317us 1.005ms
val-arg hashmap: Call contains (unmanaged) 1024 840.774ms 821.068us ± 54.162us (771.046us ... 1.337ms) 837.752us 1.028ms 1.111ms
ptr-arg hashmap: Call contains (managed) 1024 867.909ms 847.567us ± 66.465us (771.049us ... 1.129ms) 873.053us 1.067ms 1.1ms
val-arg hashmap: Call contains (managed) 1024 860.379ms 840.214us ± 64.813us (771.048us ... 1.247ms) 866.001us 1.046ms 1.106ms
ptr-arg hashmap: Call capacity (unmanaged) 1024 696.319ms 679.999us ± 44.924us (642.543us ... 1.149ms) 680.966us 877.051us 895.442us
val-arg hashmap: Call capacity (unmanaged) 1024 709.52ms 692.89us ± 115.045us (642.543us ... 2.01ms) 680.642us 1.327ms 1.475ms
ptr-arg hashmap: Call capacity (managed) 1024 864.335ms 844.078us ± 331.547us (642.544us ... 2.381ms) 771.53us 1.827ms 2.03ms
val-arg hashmap: Call capacity (managed) 1024 757.228ms 739.48us ± 152.851us (642.547us ... 2.165ms) 732.957us 1.457ms 1.902ms
ptr-arg hashmap: Call count (unmanaged) 1024 143.425ms 140.063us ± 17.142us (128.528us ... 273.527us) 135.506us 220.129us 241.11us
val-arg hashmap: Call count (unmanaged) 1024 150.1ms 146.582us ± 23.613us (135.463us ... 382.999us) 143.232us 271.365us 307.654us
ptr-arg hashmap: Call count (managed) 1024 865.575ms 845.288us ± 112.199us (771.048us ... 1.891ms) 857.128us 1.467ms 1.76ms
val-arg hashmap: Call count (managed) 1024 843.41ms 823.642us ± 48.515us (771.048us ... 1.39ms) 835.33us 971.274us 1.135ms
ptr-arg hashmap: Call get (unmanaged) 1024 7.415s 7.242ms ± 290.368us (6.943ms ... 8.522ms) 7.387ms 8.193ms 8.269ms
val-arg hashmap: Call get (unmanaged) 1024 7.497s 7.321ms ± 348.766us (6.943ms ... 8.837ms) 7.506ms 8.391ms 8.683ms
ptr-arg hashmap: Call get (managed) 1024 7.596s 7.418ms ± 303.219us (7.072ms ... 8.814ms) 7.584ms 8.498ms 8.593ms
val-arg hashmap: Call get (managed) 1024 7.64s 7.461ms ± 337.37us (7.072ms ... 8.898ms) 7.664ms 8.504ms 8.63ms
ptr-arg hashmap: Many calls (unmanaged) 1024 8.281s 8.087ms ± 358.786us (7.586ms ... 9.798ms) 8.281ms 9.114ms 9.324ms
val-arg hashmap: Many calls (unmanaged) 1024 8.251s 8.058ms ± 368.044us (7.586ms ... 9.502ms) 8.297ms 9.192ms 9.219ms
ptr-arg hashmap: Many calls (managed) 1024 9.095s 8.881ms ± 427.16us (8.357ms ... 10.752ms) 9.103ms 10.162ms 10.265ms
val-arg hashmap: Many calls (managed) 1024 9.167s 8.953ms ± 369.686us (8.357ms ... 10.503ms) 9.149ms 10.083ms 10.202ms
And here is my benchmark results for Debug mode:
benchmark runs total time time/run (avg ± σ) (min ... max) p75 p99 p995
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
ptr-arg hashmap: Call contains (unmanaged) 1024 7.902s 7.717ms ± 438.753us (7.11ms ... 12.57ms) 7.828ms 9.485ms 9.824ms
val-arg hashmap: Call contains (unmanaged) 1024 7.803s 7.62ms ± 298.93us (7.148ms ... 11.868ms) 7.693ms 8.445ms 8.691ms
ptr-arg hashmap: Call contains (managed) 1024 9.104s 8.891ms ± 666.825us (8.109ms ... 16.428ms) 9.044ms 11.423ms 11.9ms
val-arg hashmap: Call contains (managed) 1024 8.828s 8.621ms ± 592.865us (8.112ms ... 15.346ms) 8.79ms 11.01ms 12.502ms
ptr-arg hashmap: Call capacity (unmanaged) 1024 2.805s 2.739ms ± 414.185us (2.313ms ... 5.698ms) 2.727ms 4.843ms 5.287ms
val-arg hashmap: Call capacity (unmanaged) 1024 2.728s 2.665ms ± 241.557us (2.313ms ... 4.915ms) 2.709ms 3.839ms 4.145ms
ptr-arg hashmap: Call capacity (managed) 1024 8.984s 8.773ms ± 389.197us (8.223ms ... 12.681ms) 8.916ms 10.193ms 11.243ms
val-arg hashmap: Call capacity (managed) 1024 9.204s 8.988ms ± 588.288us (8.21ms ... 13.33ms) 9.158ms 11.257ms 11.629ms
ptr-arg hashmap: Call count (unmanaged) 1024 2.495s 2.437ms ± 187.103us (2.153ms ... 4.69ms) 2.489ms 3.179ms 3.55ms
val-arg hashmap: Call count (unmanaged) 1024 2.478s 2.42ms ± 167.383us (2.12ms ... 3.41ms) 2.466ms 3.003ms 3.227ms
ptr-arg hashmap: Call count (managed) 1024 9.129s 8.915ms ± 537.586us (8.113ms ... 12.996ms) 9.135ms 10.888ms 11.219ms
val-arg hashmap: Call count (managed) 1024 9.061s 8.849ms ± 432.427us (8.157ms ... 11.116ms) 9.07ms 10.366ms 10.511ms
ptr-arg hashmap: Call get (unmanaged) 1024 8.828s 8.621ms ± 350.495us (8.222ms ... 11.044ms) 8.784ms 9.912ms 10.255ms
val-arg hashmap: Call get (unmanaged) 1024 9.03s 8.819ms ± 456.875us (8.223ms ... 12.453ms) 9.004ms 10.3ms 11.177ms
ptr-arg hashmap: Call get (managed) 1024 12.507s 12.214ms ± 517.908us (11.543ms ... 18.533ms) 12.364ms 13.943ms 14.856ms
val-arg hashmap: Call get (managed) 1024 12.413s 12.122ms ± 611.512us (11.377ms ... 16.251ms) 12.378ms 14.292ms 15.061ms
ptr-arg hashmap: Many calls (unmanaged) 1024 19.053s 18.607ms ± 827.405us (17.717ms ... 23.062ms) 19.015ms 21.421ms 22.219ms
val-arg hashmap: Many calls (unmanaged) 1024 19.445s 18.989ms ± 1.099ms (17.742ms ... 29.752ms) 19.379ms 22.988ms 25.257ms
ptr-arg hashmap: Many calls (managed) 1024 37.847s 36.96ms ± 2.908ms (34.378ms ... 83.93ms) 37.64ms 44.634ms 50.221ms
val-arg hashmap: Many calls (managed) 1024 37.67s 36.788ms ± 1.363ms (34.591ms ... 64.526ms) 37.262ms 40.792ms 42.835ms