containers icon indicating copy to clipboard operation
containers copied to clipboard

More efficient fromList for sets and maps

Open meooow25 opened this issue 1 year ago • 1 comments

Originally proposed and implemented for IntMap in #653 by @treeowl.

fromList is repeated insert. insert goes down the tree to the insertion position, then back up to the root.
The idea is that from the insertion position we go to the next insertion position directly instead of up to the root.

This means that the total time of fromList is the sum of the distances on the tree between adjacent elements. Clusters of adjacent elements will take linear time to insert. If the list is ascending or descending, the entire function is linear time. The worst case remains the same as before, we may still need to go up to the root for every element.

For IntSet/IntMap, that is all there is to it. For Set/Map, we have a problem that repeatedly inserting at a position will make the tree unbalanced. To fix this, it should be possible to go up towards the root restoring balance until there are no unbalanced nodes, then going to the next insertion position as usual.

meooow25 avatar Mar 30 '25 14:03 meooow25

Here is a comparison of a few implementations: https://gist.github.com/meooow25/e6a85fe75367d9826145f1e77b6ba062

  • current: foldl' (flip insert) empty
  • new (original): Patched version of #653
  • new (alternate): Another version I wrote of the new algorithm
  • new (fusible): Fusible version of the new algorithm
current
  fromList:asc                52.9 μs ± 5.1 μs, 480 KB alloc
  fromList:asc:fusion         49.7 μs ± 3.1 μs, 480 KB alloc
  fromList:asc:sparse         101  μs ± 9.5 μs, 863 KB alloc
  fromList:asc:sparse:fusion  95.2 μs ± 6.7 μs, 863 KB alloc
  fromList:random             441  μs ±  25 μs, 1.5 MB alloc
  fromList:random:fusion      436  μs ±  42 μs, 1.5 MB alloc
  fromList:newWorstCase       831  μs ±  57 μs, 6.9 MB alloc
new (original)
  fromList:asc                17.2 μs ± 678 ns, 5.0 KB alloc, 0.32x
  fromList:asc:fusion         31.9 μs ± 1.5 μs, 293 KB alloc, 0.64x
  fromList:asc:sparse         49.8 μs ± 4.9 μs, 224 KB alloc, 0.49x
  fromList:asc:sparse:fusion  65.3 μs ± 5.4 μs, 512 KB alloc, 0.69x
  fromList:random             445  μs ±  22 μs, 1.2 MB alloc, 1.01x
  fromList:random:fusion      477  μs ±  42 μs, 1.5 MB alloc, 1.10x
  fromList:newWorstCase       1.04 ms ±  99 μs, 6.8 MB alloc, 1.25x
new (alternate)
  fromList:asc                13.5 μs ± 666 ns, 7.0 KB alloc, 0.26x
  fromList:asc:fusion         29.9 μs ± 2.9 μs, 295 KB alloc, 0.60x
  fromList:asc:sparse         47.3 μs ± 4.0 μs, 288 KB alloc, 0.47x
  fromList:asc:sparse:fusion  66.4 μs ± 5.9 μs, 576 KB alloc, 0.70x
  fromList:random             437  μs ±  22 μs, 1.2 MB alloc, 0.99x
  fromList:random:fusion      457  μs ±  43 μs, 1.5 MB alloc, 1.05x
  fromList:newWorstCase       1.06 ms ±  96 μs, 6.8 MB alloc, 1.27x
new (fusible)
  fromList:asc                16.6 μs ± 1.1 μs, 5.6 KB alloc, 0.31x
  fromList:asc:fusion         4.89 μs ± 362 ns, 5.5 KB alloc, 0.10x
  fromList:asc:sparse         51.5 μs ± 3.2 μs, 352 KB alloc, 0.51x
  fromList:asc:sparse:fusion  50.2 μs ± 2.8 μs, 352 KB alloc, 0.53x
  fromList:random             427  μs ±  27 μs, 2.4 MB alloc, 0.97x
  fromList:random:fusion      421  μs ±  24 μs, 2.4 MB alloc, 0.97x
  fromList:newWorstCase       1.29 ms ±  49 μs,  14 MB alloc, 1.55x
  • The new algorithm is much better on asc inputs. This is an asymptotically better algorithm, so the difference will only increase with larger sizes. Of course, the same applies to desc inputs or other inputs having some beneficial order in them.
  • The fusible version is better that non-fusing versions on the asc case, but interesting barely any better on the random:fusion case. The cost of building the map dominates the list allocations.
  • newWorstCase represents the worst case for the new algorithm where it does more work compared to the current algorithm. However, 1. this is a pathological case, and 2. this is only a constant-factor increase, so I think it's okay to accept it in exchange for the other gains.

I think it's looking good enough to go with the new (fusible) version, I will make a PR for it.

meooow25 avatar Apr 06 '25 14:04 meooow25