poop
add an option to show ratio instead of percent delta (possibly by default)
I think percentages are harder to comprehend than ratios, especially when the delta is large. An example:
Benchmark 1 (376 runs): zig-out/bench/ReleaseSafe/zimalloc/create-destroy-loop-native-ReleaseSafe:
  measurement             mean ± σ                min … max         outliers  delta
  wall_time           26.565ms ± 3.741ms     25.269ms … 52.593ms    18 ( 5%)  0%
  peak_rss                 16M ± 1K               16M … 16M         90 (24%)  0%
  cpu_cycles          29460307 ± 352152      27881614 … 34087924    25 ( 7%)  0%
  instructions        68245274 ± 3           68245252 … 68245299     8 ( 2%)  0%
  cache_references     1905677 ± 11890        1885623 … 2050296      4 ( 1%)  0%
  cache_misses           35904 ± 994            34424 … 51464        7 ( 2%)  0%
  branch_misses          18101 ± 75             18032 … 19280       16 ( 4%)  0%

Benchmark 2 (21 runs): zig-out/bench/ReleaseSafe/gpa/create-destroy-loop-native-ReleaseSafe:
  measurement             mean ± σ                min … max         outliers  delta
  wall_time          499.521ms ± 16.373ms   486.703ms … 547.971ms    2 (10%)  💩+1780.3% ± 8.6%
  peak_rss                 30M ± 2K               30M … 30M          0 ( 0%)  💩+ 89.5% ± 0.0%
  cpu_cycles        1436695570 ± 36769236  1385230017 … 1548137633   4 (19%)  💩+4776.7% ± 12.4%
  instructions       443694437 ± 8150479    433293521 … 465060803    2 (10%)  💩+550.1% ± 1.2%
  cache_references    51072489 ± 227383      50490604 … 51378709     1 ( 5%)  💩+2580.0% ± 1.2%
  cache_misses        21754058 ± 27034       21713445 … 21806249     0 ( 0%)  💩+60490.3% ± 7.5%
  branch_misses        4283745 ± 190618       4059269 … 4782911      1 ( 5%)  💩+23565.4% ± 104.1%

Benchmark 3 (60 runs): zig-out/bench/ReleaseSafe/mesh/create-destroy-loop-native-ReleaseSafe:
  measurement             mean ± σ                min … max         outliers  delta
  wall_time          168.705ms ± 17.803ms   149.247ms … 226.071ms    1 ( 2%)  💩+535.1% ± 7.6%
  peak_rss                 17M ± 2K               17M … 17M          0 ( 0%)  💩+ 8.0% ± 0.0%
  cpu_cycles         483128706 ± 36179722   453889331 … 625311678    6 (10%)  💩+1539.9% ± 12.3%
  instructions       657956378 ± 7          657956360 … 657956401    2 ( 3%)  💩+864.1% ± 0.0%
  cache_references     5418462 ± 3474959      3555473 … 26208274     5 ( 8%)  💩+184.3% ± 18.3%
  cache_misses          563791 ± 85035         512236 … 1009351      9 (15%)  💩+1470.3% ± 23.8%
  branch_misses        1009697 ± 704651        823842 … 4319352     11 (18%)  💩+5478.0% ± 391.1%
Here is the same benchmark run with the worst one first:
Benchmark 1 (10 runs): zig-out/bench/ReleaseSafe/gpa/create-destroy-loop-native-ReleaseSafe:
  measurement             mean ± σ                min … max         outliers  delta
  wall_time          501.108ms ± 12.548ms   487.794ms … 516.363ms    0 ( 0%)  0%
  peak_rss                 30M ± 1K               30M … 30M          2 (20%)  0%
  cpu_cycles        1483284321 ± 13871550  1451769864 … 1497658793   2 (20%)  0%
  instructions       440695501 ± 8102579    432198649 … 459702251    0 ( 0%)  0%
  cache_references    51048638 ± 241437      50737084 … 51402025     0 ( 0%)  0%
  cache_misses        21761058 ± 32075       21698123 … 21794997     0 ( 0%)  0%
  branch_misses        4199931 ± 180500       4013353 … 4614098      0 ( 0%)  0%

Benchmark 2 (31 runs): zig-out/bench/ReleaseSafe/mesh/create-destroy-loop-native-ReleaseSafe:
  measurement             mean ± σ                min … max         outliers  delta
  wall_time          161.333ms ± 13.266ms    143.47ms … 192.347ms    0 ( 0%)  ⚡- 67.8% ± 1.9%
  peak_rss                 17M ± 2K               17M … 17M          0 ( 0%)  ⚡- 43.0% ± 0.0%
  cpu_cycles         461790403 ± 7477095    451871827 … 478106740    0 ( 0%)  ⚡- 68.9% ± 0.5%
  instructions       657956376 ± 5          657956369 … 657956387    0 ( 0%)  💩+ 49.3% ± 0.7%
  cache_references     3920322 ± 257185       3555405 … 4573300      0 ( 0%)  ⚡- 92.3% ± 0.4%
  cache_misses          518445 ± 4596          510041 … 528130       0 ( 0%)  ⚡- 97.6% ± 0.1%
  branch_misses         824224 ± 250           823674 … 824660       0 ( 0%)  ⚡- 80.4% ± 1.5%

Benchmark 3 (147 runs): zig-out/bench/ReleaseSafe/zimalloc/create-destroy-loop-native-ReleaseSafe:
  measurement             mean ± σ                min … max         outliers  delta
  wall_time           34.088ms ± 12.053ms     24.99ms … 55.173ms     0 ( 0%)  ⚡- 93.2% ± 1.5%
  peak_rss                 16M ± 2K               16M … 16M         38 (26%)  ⚡- 47.2% ± 0.0%
  cpu_cycles          29127566 ± 811043      27831022 … 30678771     0 ( 0%)  ⚡- 98.0% ± 0.1%
  instructions        68245275 ± 2           68245272 … 68245280     3 ( 2%)  ⚡- 84.5% ± 0.3%
  cache_references     1916213 ± 34330        1887064 … 2306377      1 ( 1%)  ⚡- 96.2% ± 0.1%
  cache_misses           36561 ± 922            35153 … 39289        1 ( 1%)  ⚡- 99.8% ± 0.0%
  branch_misses          18107 ± 57             18030 … 18373        3 ( 2%)  ⚡- 99.6% ± 0.7%
I think something like this is much easier to grok:
Benchmark 1 (376 runs): zig-out/bench/ReleaseSafe/zimalloc/create-destroy-loop-native-ReleaseSafe:
  measurement             mean ± σ                min … max         outliers  ratio
  wall_time           26.565ms ± 3.741ms     25.269ms … 52.593ms    18 ( 5%)  1x
  peak_rss                 16M ± 1K               16M … 16M         90 (24%)  1x
  cpu_cycles          29460307 ± 352152      27881614 … 34087924    25 ( 7%)  1x
  instructions        68245274 ± 3           68245252 … 68245299     8 ( 2%)  1x
  cache_references     1905677 ± 11890        1885623 … 2050296      4 ( 1%)  1x
  cache_misses           35904 ± 994            34424 … 51464        7 ( 2%)  1x
  branch_misses          18101 ± 75             18032 … 19280       16 ( 4%)  1x

Benchmark 2 (21 runs): zig-out/bench/ReleaseSafe/gpa/create-destroy-loop-native-ReleaseSafe:
  measurement             mean ± σ                min … max         outliers  ratio
  wall_time          499.521ms ± 16.373ms   486.703ms … 547.971ms    2 (10%)  💩18.803x ± 0.086
  peak_rss                 30M ± 2K               30M … 30M          0 ( 0%)  💩 1.895x ± 0.000
  cpu_cycles        1436695570 ± 36769236  1385230017 … 1548137633   4 (19%)  💩48.767x ± 0.124
  instructions       443694437 ± 8150479    433293521 … 465060803    2 (10%)  💩6.501x ± 0.012
  cache_references    51072489 ± 227383      50490604 … 51378709     1 ( 5%)  💩26.800x ± 0.012
  cache_misses        21754058 ± 27034       21713445 … 21806249     0 ( 0%)  💩605.903x ± 0.075
  branch_misses        4283745 ± 190618       4059269 … 4782911      1 ( 5%)  💩236.654x ± 1.041

Benchmark 3 (60 runs): zig-out/bench/ReleaseSafe/mesh/create-destroy-loop-native-ReleaseSafe:
  measurement             mean ± σ                min … max         outliers  ratio
  wall_time          168.705ms ± 17.803ms   149.247ms … 226.071ms    1 ( 2%)  💩6.351x ± 0.076
  peak_rss                 17M ± 2K               17M … 17M          0 ( 0%)  💩 1.080x ± 0.000
  cpu_cycles         483128706 ± 36179722   453889331 … 625311678    6 (10%)  💩16.399x ± 0.123
  instructions       657956378 ± 7          657956360 … 657956401    2 ( 3%)  💩9.641x ± 0.000
  cache_references     5418462 ± 3474959      3555473 … 26208274     5 ( 8%)  💩2.843x ± 0.183
  cache_misses          563791 ± 85035         512236 … 1009351      9 (15%)  💩15.703x ± 0.238
  branch_misses        1009697 ± 704651        823842 … 4319352     11 (18%)  💩55.780x ± 3.911
(I didn't properly convert the numbers on the confidence intervals to a ratio, so they'll be a bit off)
The ratio becomes even easier to read, relative to the delta, if you also drop some of the less significant figures. In that case the ratio needs fewer digits than the delta to represent the same performance difference (assuming we don't want to use scientific notation for the delta).
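To make the conversion concrete, here is a minimal Python sketch (not poop's actual code, which is written in Zig). It assumes the delta column is computed as (mean - reference mean) / reference mean, so that ratio = 1 + delta / 100. The function names `delta_percent_to_ratio` and `format_ratio` are hypothetical, and the formatter rounds rather than truncates.

```python
import math


def delta_percent_to_ratio(delta_percent: float) -> float:
    """Convert a percent delta relative to the reference into a ratio.

    Assumes delta = (mean - ref_mean) / ref_mean * 100, so +100% means 2x.
    """
    return 1.0 + delta_percent / 100.0


def format_ratio(ratio: float, sig_figs: int = 3) -> str:
    """Format a positive ratio with roughly `sig_figs` significant figures.

    Fixed-point notation is used on purpose so that large ratios never
    fall back to scientific notation.
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    magnitude = math.floor(math.log10(ratio))   # 0 for 1..9.99, 1 for 10..99.9, ...
    decimals = max(sig_figs - 1 - magnitude, 0)  # digits kept after the decimal point
    return f"{ratio:.{decimals}f}x"


# Deltas taken from the benchmark output above.
for delta in (1780.3, 60490.3, -67.8):
    print(f"{delta:+.1f}% -> {format_ratio(delta_percent_to_ratio(delta))}")
```

With three significant figures, +1780.3% becomes 18.8x, +60490.3% becomes 606x, and -67.8% becomes 0.322x, which illustrates how the ratio needs fewer digits than the delta.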
I agree about the hyperfine-style 'times faster' / 'times slower' being easier to understand. My suggestion would be to change the header to 'times faster/slower' instead of 'ratio', since the 'times faster/slower' part is necessary context for what something like ⚡2.0x means.
'faster/slower' only works for the wall-time measurement. I think I'd just go for showing the ratio measurement / reference, so it will be good/green if it's significantly < 1, bad/red if it's significantly > 1, and gray when it's not significantly different from 1. My example wasn't great, as everything was worse than the reference.
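As a rough illustration of that colouring rule, the Python sketch below classifies a ratio together with the half-width of its confidence interval. It assumes "significantly different from 1" means the interval ratio ± half_width excludes 1. That is a simplification chosen for the example and not necessarily the significance test poop uses. The names `Verdict` and `classify_ratio` are hypothetical.

```python
from enum import Enum


class Verdict(Enum):
    GOOD = "green"    # significantly below the reference
    BAD = "red"       # significantly above the reference
    NEUTRAL = "gray"  # not distinguishable from the reference


def classify_ratio(ratio: float, half_width: float) -> Verdict:
    """Classify ratio = measurement / reference, where lower is better.

    Assumption: the difference is significant when the whole interval
    [ratio - half_width, ratio + half_width] lies on one side of 1.
    """
    if ratio + half_width < 1.0:
        return Verdict.GOOD
    if ratio - half_width > 1.0:
        return Verdict.BAD
    return Verdict.NEUTRAL


print(classify_ratio(0.322, 0.019))   # interval entirely below 1
print(classify_ratio(18.803, 0.086))  # interval entirely above 1
print(classify_ratio(1.02, 0.05))     # interval straddles 1
```

Because every measurement in the table is lower-is-better, the same rule can be applied to all rows without a per-metric notion of "faster" or "slower".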