containers
containers copied to clipboard
Optimize IntMap.alter using unboxed sums.
Use ptrEq
and unboxed sums (available in GHC >= 8.2) to track whether or not the map
required modification. If not skip rebuilding a spine and return the original map.
Benchmark results:
Before:
benchmarking alter
time 881.1 μs (862.0 μs .. 899.2 μs)
0.996 R² (0.994 R² .. 0.998 R²)
mean 854.9 μs (841.8 μs .. 872.5 μs)
std dev 49.55 μs (38.01 μs .. 69.97 μs)
variance introduced by outliers: 48% (moderately inflated)
After: (41% speedup)
benchmarking alter
time 513.1 μs (506.5 μs .. 519.6 μs)
0.998 R² (0.997 R² .. 0.999 R²)
mean 517.8 μs (511.7 μs .. 528.5 μs)
std dev 27.05 μs (17.74 μs .. 47.52 μs)
variance introduced by outliers: 45% (moderately inflated)
I'll add more fine grained benchmarks if I get the chance tomorrow. Sending out for feedback on the code.
So, as is always the case, using unboxed sums makes some operations faster, and others a lot slower :(
Before:
benchmarking alter absent
time 283.1 μs (262.8 μs .. 298.1 μs)
0.982 R² (0.977 R² .. 0.991 R²)
mean 270.9 μs (262.3 μs .. 283.0 μs)
std dev 32.15 μs (24.01 μs .. 45.31 μs)
variance introduced by outliers: 84% (severely inflated)
benchmarking alter insert
time 277.1 μs (273.3 μs .. 282.3 μs)
0.997 R² (0.995 R² .. 0.999 R²)
mean 275.7 μs (271.7 μs .. 281.7 μs)
std dev 16.49 μs (11.93 μs .. 27.52 μs)
variance introduced by outliers: 56% (severely inflated)
benchmarking alter update
time 280.3 μs (276.7 μs .. 284.3 μs)
0.998 R² (0.998 R² .. 0.999 R²)
mean 280.5 μs (277.7 μs .. 284.5 μs)
std dev 11.21 μs (8.620 μs .. 15.42 μs)
variance introduced by outliers: 36% (moderately inflated)
benchmarking alter delete
time 282.5 μs (274.8 μs .. 289.7 μs)
0.996 R² (0.995 R² .. 0.998 R²)
mean 281.1 μs (276.0 μs .. 290.9 μs)
std dev 21.75 μs (12.40 μs .. 34.62 μs)
variance introduced by outliers: 68% (severely inflated)
benchmarking alter delete absent
time 243.8 μs (240.8 μs .. 246.9 μs)
0.998 R² (0.997 R² .. 0.999 R²)
mean 241.7 μs (239.1 μs .. 245.8 μs)
std dev 11.08 μs (8.402 μs .. 14.84 μs)
variance introduced by outliers: 43% (moderately inflated)
After:
benchmarking alter absent (-16%)
time 237.9 μs (234.0 μs .. 241.8 μs)
0.997 R² (0.995 R² .. 0.999 R²)
mean 237.0 μs (232.9 μs .. 246.5 μs)
std dev 20.16 μs (7.155 μs .. 32.95 μs)
variance introduced by outliers: 74% (severely inflated)
benchmarking alter insert (+290%)
time 805.2 μs (783.5 μs .. 828.1 μs)
0.990 R² (0.979 R² .. 0.996 R²)
mean 822.7 μs (799.7 μs .. 872.8 μs)
std dev 110.8 μs (59.63 μs .. 192.9 μs)
variance introduced by outliers: 84% (severely inflated)
benchmarking alter update (+250%)
time 705.5 μs (699.2 μs .. 712.7 μs)
0.997 R² (0.995 R² .. 0.999 R²)
mean 703.5 μs (695.1 μs .. 716.0 μs)
std dev 34.66 μs (25.56 μs .. 49.30 μs)
variance introduced by outliers: 41% (moderately inflated)
benchmarking alter delete (+255%)
time 720.1 μs (684.8 μs .. 766.0 μs)
0.987 R² (0.978 R² .. 0.998 R²)
mean 684.8 μs (674.2 μs .. 702.9 μs)
std dev 46.01 μs (28.89 μs .. 72.45 μs)
variance introduced by outliers: 57% (severely inflated)
benchmarking alter delete absent (-8%)
time 225.4 μs (223.0 μs .. 227.6 μs)
0.999 R² (0.998 R² .. 0.999 R²)
mean 225.1 μs (222.4 μs .. 231.9 μs)
std dev 13.72 μs (6.353 μs .. 26.03 μs)
variance introduced by outliers: 58% (severely inflated)
It looks like in cases where we need to modify the spine anyways the extra pattern matches (despite being unboxed sums) on every Bin
have quite a bit of overhead.
Take a look at https://github.com/treeowl/containers/commit/6f6fa02c84a4839413015bf7d52f8dc18bc33157. That's not very well organized yet, and doesn't deal with the lazy version, but it shows how the strict version can be made more eager, how pattern synonyms can cut down dramatically on the noise, and how we can "unbox" the Maybe
s consumed and produced by the passed function. Feel free to incorporate those ideas into your branch if you wish.
We need to benchmark the unboxed sums version against one that just uses pointer equality on each recursive call.
Thanks for the info David! I've been realllllly busy the last week, sorry I haven't had time to move this forward as quickly as I'd like. I'm hoping to get around to this sometime this week.
What's the status here, @m-renaud?
In light of https://github.com/haskell/containers/issues/538 it would be nice to make progress on this.
Oh geeze it's been a while since I've looked at this, and realistically I won't have substantial time to look into this and incorporate the suggestion's from the comment above in the near future.
I will try to look at it shortly. No guarantees.
I've marked this PR as a draft to clarify that it's not ready for final review / merge.
Shall I try to take this up and finish it - anyone know what there is to do left?
I don't remember,, sorry. Lost track of this one. Worth trying!
Shall I try to take this up and finish it
That would be great! :)
anyone know what there is to do left?
I think the minimum is to fix any strictness issues such as https://github.com/haskell/containers/pull/523#discussion_r165466312.
If the benchmarks show good improvements at this stage, I think this should be good to merge.
Adopting some of noise reduction measures suggested in https://github.com/treeowl/containers/commit/6f6fa02c84a4839413015bf7d52f8dc18bc33157 probably wouldn't hurt though.
Regarding further optimization ideas, I think those could be left for further work.
What is the status on this one? Are these benchmarks numbers still the same on GHC 9.2?
What is the status on this one? Are these benchmarks numbers still the same on GHC 9.2?
Has something changed there?
One can only hope, that's why I'm asking :)
I haven't tested recently. Do you have time to try?
No unfortunately I'm quite committed on Haddock and other stuff for the time being. How's the maintainer structure for containers
going btw? Need any help on this front?