More efficient fromList for sets and maps
Originally proposed and implemented for IntMap in #653 by @treeowl.
fromList is repeated insert. insert goes down the tree to the insertion position, then back up to the root.
The idea is that from the insertion position we go to the next insertion position directly instead of up to the root.
This means that the total time of fromList is the sum of the distances on the tree between adjacent elements. Clusters of adjacent elements will take linear time to insert. If the list is ascending or descending, the entire function is linear time. The worst case remains the same as before, we may still need to go up to the root for every element.
For IntSet/IntMap, that is all there is to it. For Set/Map, we have a problem that repeatedly inserting at a position will make the tree unbalanced. To fix this, it should be possible to go up towards the root restoring balance until there are no unbalanced nodes, then going to the next insertion position as usual.
Here is a comparison of a few implementations: https://gist.github.com/meooow25/e6a85fe75367d9826145f1e77b6ba062
-
current:foldl' (flip insert) empty -
new (original): Patched version of #653 -
new (alternate): Another version I wrote of the new algorithm -
new (fusible): Fusible version of the new algorithm
current
fromList:asc 52.9 μs ± 5.1 μs, 480 KB alloc
fromList:asc:fusion 49.7 μs ± 3.1 μs, 480 KB alloc
fromList:asc:sparse 101 μs ± 9.5 μs, 863 KB alloc
fromList:asc:sparse:fusion 95.2 μs ± 6.7 μs, 863 KB alloc
fromList:random 441 μs ± 25 μs, 1.5 MB alloc
fromList:random:fusion 436 μs ± 42 μs, 1.5 MB alloc
fromList:newWorstCase 831 μs ± 57 μs, 6.9 MB alloc
new (original)
fromList:asc 17.2 μs ± 678 ns, 5.0 KB alloc, 0.32x
fromList:asc:fusion 31.9 μs ± 1.5 μs, 293 KB alloc, 0.64x
fromList:asc:sparse 49.8 μs ± 4.9 μs, 224 KB alloc, 0.49x
fromList:asc:sparse:fusion 65.3 μs ± 5.4 μs, 512 KB alloc, 0.69x
fromList:random 445 μs ± 22 μs, 1.2 MB alloc, 1.01x
fromList:random:fusion 477 μs ± 42 μs, 1.5 MB alloc, 1.10x
fromList:newWorstCase 1.04 ms ± 99 μs, 6.8 MB alloc, 1.25x
new (alternate)
fromList:asc 13.5 μs ± 666 ns, 7.0 KB alloc, 0.26x
fromList:asc:fusion 29.9 μs ± 2.9 μs, 295 KB alloc, 0.60x
fromList:asc:sparse 47.3 μs ± 4.0 μs, 288 KB alloc, 0.47x
fromList:asc:sparse:fusion 66.4 μs ± 5.9 μs, 576 KB alloc, 0.70x
fromList:random 437 μs ± 22 μs, 1.2 MB alloc, 0.99x
fromList:random:fusion 457 μs ± 43 μs, 1.5 MB alloc, 1.05x
fromList:newWorstCase 1.06 ms ± 96 μs, 6.8 MB alloc, 1.27x
new (fusible)
fromList:asc 16.6 μs ± 1.1 μs, 5.6 KB alloc, 0.31x
fromList:asc:fusion 4.89 μs ± 362 ns, 5.5 KB alloc, 0.10x
fromList:asc:sparse 51.5 μs ± 3.2 μs, 352 KB alloc, 0.51x
fromList:asc:sparse:fusion 50.2 μs ± 2.8 μs, 352 KB alloc, 0.53x
fromList:random 427 μs ± 27 μs, 2.4 MB alloc, 0.97x
fromList:random:fusion 421 μs ± 24 μs, 2.4 MB alloc, 0.97x
fromList:newWorstCase 1.29 ms ± 49 μs, 14 MB alloc, 1.55x
- The new algorithm is much better on asc inputs. This is an asymptotically better algorithm, so the difference will only increase with larger sizes. Of course, the same applies to desc inputs or other inputs having some beneficial order in them.
- The fusible version is better that non-fusing versions on the asc case, but interesting barely any better on the
random:fusioncase. The cost of building the map dominates the list allocations. -
newWorstCaserepresents the worst case for the new algorithm where it does more work compared to the current algorithm. However, 1. this is a pathological case, and 2. this is only a constant-factor increase, so I think it's okay to accept it in exchange for the other gains.
I think it's looking good enough to go with the new (fusible) version, I will make a PR for it.