
Speed up fromList for IntMap

Open treeowl opened this issue 6 years ago • 32 comments

Make fromList and fromListWithKey for IntMap smarter. Rather than rebuilding the path from the root for each element, insert as many elements as possible into each subtree before backing out.
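The same locality idea can be illustrated, much more crudely, with the public API alone: split the input into maximal strictly ascending runs, build each run in one bottom-up pass, and union the runs. This is only a hedged sketch for intuition, not the implementation in this PR; ascRuns and fromListRuns are made-up names.

```haskell
import qualified Data.IntMap.Strict as IM
import Data.List (foldl')

-- Split a key/value list into maximal runs of strictly increasing keys.
ascRuns :: [(Int, a)] -> [[(Int, a)]]
ascRuns []       = []
ascRuns (x : xs) = go x [] xs
  where
    go cur acc [] = [reverse (cur : acc)]
    go cur@(k, _) acc (kv@(k', _) : rest)
      | k' > k    = go kv (cur : acc) rest
      | otherwise = reverse (cur : acc) : go kv [] rest

-- Build each run bottom-up and combine. Later runs win on duplicate
-- keys (union is left-biased), matching fromList's last-value-wins
-- semantics.
fromListRuns :: [(Int, a)] -> IM.IntMap a
fromListRuns =
  foldl' (\acc run -> IM.union (IM.fromDistinctAscList run) acc) IM.empty
    . ascRuns
```

The PR avoids the intermediate maps and unions by inserting directly into the subtree under construction, but the payoff is the same: runs of nearby keys are handled without walking back to the root for every element.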

treeowl avatar Jul 03 '19 01:07 treeowl

@jwaldmann, your review/benchmarking would be greatly appreciated.

treeowl avatar Jul 03 '19 01:07 treeowl

Interestingly, it seems that the new fromList implementation is faster than fromAscList. Surprisingly (to me), it's sometimes much faster. So if we replace fromList, we should probably define fromAscList = fromList unless and until someone finds a faster implementation of fromAscList.

treeowl avatar Jul 03 '19 22:07 treeowl

Hmm..... I have the feeling that part of the trouble with fromAscList is that fromDistinctAscList is (probably) less efficient than it could be. It uses an explicit stack in a rather confusing fashion. If we could make it use the GHC reduction stack instead, that might cut allocation and speed things up while also making the code easier to read. But it's so convoluted that I can't really work out the control flow....

treeowl avatar Jul 04 '19 02:07 treeowl

Hi. I was reading your code and did some experiments.

Code: the description should make it clearer that insertSome/insertMany insert elements only as long as they fit, stopping at the first element that does not fit (and returning everything after it). I had to scan the source to confirm this. The wording of your comment would also allow the interpretation that all of the list is scanned for elements that fit. Of course we should not do that. Or should we? My first reaction is "that can become quadratic", but I am not sure.
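For what it's worth, the two readings differ in the same way that span and partition do on plain lists; the stop-at-first-misfit behaviour is the span-like one:

```haskell
import Data.List (partition)

-- span stops at the first element failing the predicate;
-- partition scans the entire list for all elements that satisfy it.
stopAtFirst, scanWhole :: ([Int], [Int])
stopAtFirst = span even [2, 4, 5, 6, 8]       -- ([2,4],[5,6,8])
scanWhole   = partition even [2, 4, 5, 6, 8]  -- ([2,4,6,8],[5])
```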

Experiment: since the algorithm does not have look-ahead, it can be fooled. In the following, I am using

easy 3 = [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]
hard 3 = [0,8,1,9,2,10,3,11,4,12,5,13,6,14,7,15]

This is a ghci session on Data.IntMap.Internal (using your version of fromList). Measurements are totally unscientific ...

easy e = [0 :: Int ..2^(e+1)-1]
m [] ys = ys ; m (x:xs) ys = x : m ys xs
hard e = m [0 :: Int ..2^e-1] [2^e .. 2^(e+1)-1]

:set +s

size $ fromList $ zip (easy 18) $ repeat ()  -- 7 seconds
size $ fromList $ zip (hard 18) $ repeat () -- 27 seconds

-- compare to naive fromList implementation:
fromList0 = Foldable.foldl' (\m (k,v) -> insert k v m) empty

size $ fromList0 $ zip (easy 18) $ repeat () -- 16 seconds
size $ fromList0 $ zip (hard 18) $ repeat () -- 17 seconds

-- compare to from*AscList
size $ fromAscList $ zip (easy 18) $ repeat () -- 6 sec
size $ fromDistinctAscList  $ zip (easy 18) $ repeat () -- 5 sec

Do you want to export insertAll? Of course insertAll t kvs = union t (fromList kvs), but it may be faster.
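A naive spelling of insertAll in terms of the public API, against which an exported version could be checked (insertAllNaive is a made-up name for illustration):

```haskell
import qualified Data.IntMap.Strict as IM
import Data.List (foldl')

-- Naive insertAll: repeated insertion, nothing clever.
insertAllNaive :: IM.IntMap a -> [(Int, a)] -> IM.IntMap a
insertAllNaive = foldl' (\m (k, v) -> IM.insert k v m)
```

Note that the two spellings only agree when the inserted keys are disjoint from those already in the map: insert favours the new value, while union t (fromList kvs) favours t. An exported insertAll would need to document which side wins.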

evens e = [0 :: Int , 2 .. 2^e-1]
odds e = [1 :: Int , 3 .. 2^e-1]

t = fromList $ zip (evens 19) $ repeat ()
size t -- just to evaluate it

size $ insertAll t $ zip (odds 19) $ repeat () -- 5 seconds
size $ union t $ fromList (zip (odds 19) $ repeat ()) -- 6 seconds

jwaldmann avatar Jul 05 '19 14:07 jwaldmann

Thanks, @jwaldmann! If your results are for a -O2-compiled Data.IntMap.Internal, then the bad-case slowdown from 17 seconds to 27 seconds seems a bit unfortunate, though probably tolerable. Do you or @int-e have any ideas for cutting that down? I have to wonder if unproductive insertMany calls are to blame. In that case, we pattern match on the list, then throw away the result of doing so only to match on it again immediately. Perhaps there's some way to avoid that (maybe with unboxed sums)? It's also a bit sketchy to call these insertion functions on trees we just created.

treeowl avatar Jul 06 '19 01:07 treeowl

Oh, and yes, I would prefer to export insertAll, but that will require some mailing list discussion to settle on a name.

treeowl avatar Jul 06 '19 01:07 treeowl

-O2 compiled

They were for ghci; now I compiled them (https://github.com/jwaldmann/containers/tree/intmap-fromList) and get these numbers. The difference between contiguous and interleaved is smaller than with ghci. (Here, "interleaved" is a more descriptive name for [0,2^e,1,2^e+1,..].) For (pseudo-)random data, times are quite high, but they don't differ from the previous implementation.

stack bench --resolver=lts-13.27  containers-tests:intmap-benchmarks --ba "-m pattern fromList"
containers-tests> benchmarks
Running 1 benchmarks...
Benchmark intmap-benchmarks: RUNNING...
benchmarked pointwise/fromList/contiguous
time                 6.568 ms   (6.474 ms .. 6.694 ms)
                     0.997 R²   (0.996 R² .. 0.999 R²)
mean                 6.388 ms   (6.280 ms .. 6.469 ms)
std dev              300.2 μs   (217.6 μs .. 439.2 μs)
variance introduced by outliers: 21% (moderately inflated)

benchmarked pointwise/fromList/interleaved
time                 21.05 ms   (20.77 ms .. 21.32 ms)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 20.28 ms   (19.48 ms .. 20.56 ms)
std dev              986.3 μs   (459.2 μs .. 1.921 ms)
variance introduced by outliers: 19% (moderately inflated)

benchmarked pointwise/fromList/sparse
time                 7.084 ms   (6.953 ms .. 7.241 ms)
                     0.998 R²   (0.996 R² .. 0.999 R²)
mean                 6.603 ms   (6.466 ms .. 6.705 ms)
std dev              352.3 μs   (257.0 μs .. 522.1 μs)
variance introduced by outliers: 25% (moderately inflated)

benchmarked pointwise/fromList/random
time                 34.74 ms   (34.21 ms .. 35.22 ms)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 34.33 ms   (33.80 ms .. 34.79 ms)
std dev              1.024 ms   (663.2 μs .. 1.433 ms)

benchmarked pointwise/fromList_via_foldl_insert/contiguous
time                 8.886 ms   (8.758 ms .. 9.022 ms)
                     0.998 R²   (0.997 R² .. 0.999 R²)
mean                 8.643 ms   (8.529 ms .. 8.733 ms)
std dev              282.4 μs   (207.7 μs .. 389.0 μs)
variance introduced by outliers: 13% (moderately inflated)

benchmarked pointwise/fromList_via_foldl_insert/interleaved
time                 18.84 ms   (18.36 ms .. 19.39 ms)
                     0.997 R²   (0.996 R² .. 0.999 R²)
mean                 18.46 ms   (18.15 ms .. 18.72 ms)
std dev              670.2 μs   (488.1 μs .. 972.3 μs)

benchmarked pointwise/fromList_via_foldl_insert/sparse
time                 9.198 ms   (9.075 ms .. 9.353 ms)
                     0.998 R²   (0.997 R² .. 0.999 R²)
mean                 9.097 ms   (8.987 ms .. 9.184 ms)
std dev              277.5 μs   (200.0 μs .. 376.3 μs)
variance introduced by outliers: 10% (moderately inflated)

benchmarked pointwise/fromList_via_foldl_insert/random
time                 33.30 ms   (32.80 ms .. 33.76 ms)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 33.55 ms   (33.27 ms .. 33.83 ms)
std dev              605.4 μs   (466.4 μs .. 791.3 μs)

Benchmark intmap-benchmarks: FINISH


jwaldmann avatar Jul 06 '19 15:07 jwaldmann

Could you do me a favor and also run some tests with longer lists? I saw much bigger differences for contiguous data in my own testing, where I was using millions of elements.

treeowl avatar Jul 06 '19 16:07 treeowl

One million:

benchmarked 2^20/pointwise/contiguous/fromList                                                          
time                 34.67 ms   (33.36 ms .. 35.76 ms)                                                  
                     0.996 R²   (0.992 R² .. 0.999 R²)                                                  
mean                 33.09 ms   (31.78 ms .. 34.01 ms)                                                  
std dev              2.167 ms   (1.404 ms .. 2.926 ms)                                                  
variance introduced by outliers: 20% (moderately inflated)                                              
 
benchmarking 2^20/pointwise/contiguous/fromList_via_foldl_insert ... took 9.431 s, total 56 iterations
benchmarked 2^20/pointwise/contiguous/fromList_via_foldl_insert
time                 148.2 ms   (140.6 ms .. 154.7 ms)
                     0.998 R²   (0.996 R² .. 1.000 R²)
mean                 162.6 ms   (157.9 ms .. 167.7 ms)
std dev              8.422 ms   (5.956 ms .. 13.63 ms)

benchmarking 2^20/pointwise/interleaved/fromList ... took 12.49 s, total 56 iterations
benchmarked 2^20/pointwise/interleaved/fromList
time                 211.5 ms   (209.6 ms .. 213.4 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 214.7 ms   (213.7 ms .. 215.7 ms)
std dev              1.685 ms   (1.233 ms .. 2.245 ms)

benchmarking 2^20/pointwise/interleaved/fromList_via_foldl_insert ... took 9.429 s, total 56 iterations
benchmarked 2^20/pointwise/interleaved/fromList_via_foldl_insert
time                 148.0 ms   (136.0 ms .. 155.3 ms)
                     0.994 R²   (0.985 R² .. 0.999 R²)
mean                 165.5 ms   (158.6 ms .. 173.5 ms)
std dev              12.84 ms   (9.967 ms .. 18.18 ms)
variance introduced by outliers: 19% (moderately inflated)

benchmarked 2^20/pointwise/sparse/fromList
time                 30.37 ms   (28.97 ms .. 31.55 ms)
                     0.995 R²   (0.991 R² .. 0.998 R²)
mean                 29.69 ms   (28.51 ms .. 30.51 ms)
std dev              2.073 ms   (1.329 ms .. 2.924 ms)
variance introduced by outliers: 25% (moderately inflated)

benchmarking 2^20/pointwise/sparse/fromList_via_foldl_insert ... took 10.16 s, total 56 iterations
benchmarked 2^20/pointwise/sparse/fromList_via_foldl_insert
time                 171.2 ms   (169.8 ms .. 172.9 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 174.0 ms   (173.1 ms .. 175.4 ms)
std dev              1.874 ms   (1.107 ms .. 2.941 ms)

benchmarking 2^20/pointwise/random/fromList ... took 52.66 s, total 56 iterations
benchmarked 2^20/pointwise/random/fromList
time                 972.0 ms   (961.2 ms .. 986.7 ms)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 918.6 ms   (886.5 ms .. 941.2 ms)
std dev              45.42 ms   (24.70 ms .. 63.02 ms)

benchmarking 2^20/pointwise/random/fromList_via_foldl_insert ... took 51.92 s, total 56 iterations
benchmarked 2^20/pointwise/random/fromList_via_foldl_insert
time                 980.2 ms   (948.5 ms .. 1.005 s)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 897.7 ms   (859.0 ms .. 925.6 ms)
std dev              53.87 ms   (37.18 ms .. 68.48 ms)
variance introduced by outliers: 18% (moderately inflated)



jwaldmann avatar Jul 06 '19 18:07 jwaldmann

33 million (2^25):

benchmarking 2^25/pointwise/contiguous/fromList ... took 245.4 s, total 56 iterations
benchmarked 2^25/pointwise/contiguous/fromList
time                 4.108 s    (3.742 s .. 4.314 s)
                     0.992 R²   (0.978 R² .. 0.999 R²)
mean                 3.903 s    (3.583 s .. 4.124 s)
std dev              462.4 ms   (245.2 ms .. 738.7 ms)
variance introduced by outliers: 38% (moderately inflated)

benchmarking 2^25/pointwise/contiguous/fromList_via_foldl_insert ... took 477.9 s, total 56 iterations
benchmarked 2^25/pointwise/contiguous/fromList_via_foldl_insert
time                 8.201 s    (8.084 s .. 8.314 s)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 8.336 s    (8.229 s .. 8.582 s)
std dev              250.5 ms   (104.9 ms .. 427.0 ms)

benchmarking 2^25/pointwise/interleaved/fromList ... took 540.3 s, total 56 iterations
benchmarked 2^25/pointwise/interleaved/fromList
time                 9.415 s    (9.263 s .. 9.603 s)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 9.467 s    (9.364 s .. 9.731 s)
std dev              256.9 ms   (96.10 ms .. 435.1 ms)

benchmarking 2^25/pointwise/interleaved/fromList_via_foldl_insert ... took 501.8 s, total 56 iterations
benchmarked 2^25/pointwise/interleaved/fromList_via_foldl_insert
time                 8.928 s    (8.745 s .. 9.141 s)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 8.585 s    (8.200 s .. 8.792 s)
std dev              436.2 ms   (234.8 ms .. 657.1 ms)

benchmarking 2^25/pointwise/sparse/fromList ... took 275.4 s, total 56 iterations
benchmarked 2^25/pointwise/sparse/fromList
time                 4.879 s    (4.526 s .. 5.229 s)
                     0.995 R²   (0.989 R² .. 0.999 R²)
mean                 4.365 s    (3.827 s .. 4.666 s)
std dev              651.5 ms   (326.7 ms .. 969.3 ms)
variance introduced by outliers: 48% (moderately inflated)

benchmarking 2^25/pointwise/sparse/fromList_via_foldl_insert ... took 486.0 s, total 56 iterations
benchmarked 2^25/pointwise/sparse/fromList_via_foldl_insert
time                 8.918 s    (8.732 s .. 9.150 s)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 8.227 s    (7.912 s .. 8.492 s)
std dev              488.9 ms   (332.2 ms .. 682.6 ms)
variance introduced by outliers: 18% (moderately inflated)

benchmarking 2^25/pointwise/random/fromList ... took 4490 s, total 56 iterations
benchmarked 2^25/pointwise/random/fromList
time                 81.35 s    (80.35 s .. 82.21 s)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 80.59 s    (79.77 s .. 81.26 s)
std dev              1.309 s    (927.2 ms .. 1.838 s)

benchmarking 2^25/pointwise/random/fromList_via_foldl_insert ... took 4491 s, total 56 iterations
benchmarked 2^25/pointwise/random/fromList_via_foldl_insert
time                 81.46 s    (80.88 s .. 82.66 s)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 80.64 s    (79.80 s .. 81.12 s)
std dev              1.050 s    (533.6 ms .. 1.686 s)

jwaldmann avatar Jul 06 '19 18:07 jwaldmann

It really looks like the effects we're seeing are real. The new fromList is much better when the keys are contiguous, strictly increasing, or presumably clustered in general. It's about the same for random keys, and it's significantly but not horribly worse in the interleaved case. I'm convinced it's good enough to use. But I'd be a bit happier if we could reduce the regression in the interleaved case. Any ideas for doing so? fromDistinctAscList passes a bit of extra information around to avoid extra pattern matching on the subtree; can we do something similar? Might it be worth passing something extra back in Inserted?

treeowl avatar Jul 06 '19 18:07 treeowl

Perhaps it makes sense to expose the old version as fromListInterleaved or similar.

3noch avatar Jul 06 '19 20:07 3noch

@3noch I'm not generally a big fan of cluttering the API with one- or two-line functions, especially when anyone likely to need them will almost certainly know how to write them. fromListNaive is nothing more than foldl' (flip (uncurry insert)) empty, and fromListNaiveWithKey isn't much more.
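Spelled out, those one-liners would be (the names are only the suggestions from this thread, not exports):

```haskell
import qualified Data.IntMap.Strict as IM
import Data.List (foldl')

-- Repeated insertion; the last value for a duplicate key wins,
-- matching fromList's documented semantics.
fromListNaive :: [(Int, a)] -> IM.IntMap a
fromListNaive = foldl' (flip (uncurry IM.insert)) IM.empty

-- The WithKey variant: combine duplicates with f key newValue oldValue,
-- matching fromListWithKey's documented semantics.
fromListNaiveWithKey :: (IM.Key -> a -> a -> a) -> [(Int, a)] -> IM.IntMap a
fromListNaiveWithKey f = foldl' (\m (k, v) -> IM.insertWithKey f k v m) IM.empty
```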

treeowl avatar Jul 06 '19 23:07 treeowl

Perhaps it makes sense then to simply describe where fromList is less than ideal performance-wise and show the better-performing version in the docs for that use case.

3noch avatar Jul 07 '19 01:07 3noch

@3noch, yes, definitely. I'm just still hoping there's a way to cut down on the regression....

treeowl avatar Jul 07 '19 01:07 treeowl

Somewhat surprisingly it looks like we can indeed speed things up slightly by avoiding the repeated destruction of the same list, based on the following return type:

data Inserted' a
   = InsertedNil  !(IntMap a)
   | InsertedCons !(IntMap a) !Key a ![(Key,a)]

See https://gist.github.com/int-e/36578cb04d0a187252c366b0b45ddcb6#file-intmapfl-hs-L102-L163 for code and https://gist.github.com/int-e/36578cb04d0a187252c366b0b45ddcb6#file-intmapfl-out for some benchmarks. (Functions: fromList is the current implementation in Data.IntMap; fromList1 is the version from this pull request; fromList2 is using the modified Inserted' type, and fromList3 did some manual inlining that didn't pay off at all.)

But for the most part I believe the slowdown for that adversarial alternating-list case is the price one has to pay for checking, on each level of the tree, whether the new key fits or not, where the naive version simply starts at the root again each time.

int-e avatar Jul 07 '19 23:07 int-e

That is somewhat surprising. I wonder what the Core looks like. I'd have thought the best bet for that sort of thing would've been an unboxed sum as a result, but maybe GHC is doing something clever.... Did you experiment with passing along a key in the map and a mask the way you do for fromAscList?

treeowl avatar Jul 07 '19 23:07 treeowl

But for the most part I believe the slowdown for that adverserial alternating list case is the price one has to pay for checking, on each level of the tree, whether the new key fits or not, where the naive version simply starts at the root again each time.

Yes, there is certainly some inherent arithmetic cost to that. All we can hope to do is minimize additional costs from allocation and pattern matching.

treeowl avatar Jul 08 '19 00:07 treeowl

Did you experiment with passing along a key in the map and a mask the way you do for fromAscList?

No I didn't try that... it's not directly applicable. The point with fromAscList is that we know that keys will never be inserted into subtrees we've already built, so we can summarize the current state into a current prefix (expressed by the combination of a key and a mask). Here, we're always potentially inserting into an existing tree, so I believe it's unavoidable to inspect the actual tree we built and there we will find the prefix (expressed as prefix and mask) anyway.

P.S. the term "prefix" is hopelessly overloaded in this context... a) on a high level, a prefix is just a finite bit string that may be shorter than a word. b) in IntMap, it's also used for a key with the lower bits masked. c) in IntSet there's a split into a "prefix" and a "bitmask"...
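To pin down sense (b), here is a toy version of the masking operation; prefixOf is a made-up name, and the real Data.IntMap.Internal computes the same thing with slightly different bit tricks:

```haskell
import Data.Bits (complement, xor, (.&.))

-- A "prefix" in sense (b): the key with the mask bit and everything
-- below it cleared. The mask is a single set bit.
prefixOf :: Int -> Int -> Int
prefixOf k m = k .&. (complement (m - 1) `xor` m)

-- prefixOf 13 2 == 12   -- 1101 -> 1100
-- prefixOf 13 4 == 8    -- 1101 -> 1000
```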

int-e avatar Jul 08 '19 00:07 int-e

I'm not so sure. In the bad case, we call insertMany on a tree that doesn't fit the next key. So if we keep enough information around to determine that without inspecting the tree, then maybe we can win something.

treeowl avatar Jul 08 '19 00:07 treeowl

That is somewhat surprising. I wonder what the Core looks like. I'd have thought the best bet for that sort of thing would've been an unboxed sum as a result, but maybe GHC is doing something clever....

IIRC GHC's "enter" operation on the STG level actually takes several continuations for multi-constructor datatypes. So the Inserted' values should never be allocated. But I'm not sure, and I was not at all convinced that this idea would pay off when I started to implement it.

int-e avatar Jul 08 '19 00:07 int-e

insertMany' Nil k x kxs' = InsertedNil Nil

doesn't look so great, because it makes insertMany' lazy in the key and list. Since we don't reach that anyway, it might as well force those things.


treeowl avatar Jul 08 '19 00:07 treeowl

insertMany' Nil k x kxs' = InsertedNil Nil doesn't look so great, because it makes insertMany' lazy in the key and list. Since we don't reach that anyway, it might as well force those things.

I changed that, but it had no effect on the running time.

int-e avatar Jul 08 '19 00:07 int-e

Probably got inlined away.... Did you check the CSE between link and branchMask calls? If I'm not mistaken, link makes essentially the same branchMask call you do, but with the arguments flipped.


treeowl avatar Jul 08 '19 00:07 treeowl

Somewhat surprisingly it looks like we can indeed speed things up slightly by avoiding the repeated destruction of the same list, based around the following return type:

data Inserted' a
   = InsertedNil  !(IntMap a)
   | InsertedCons !(IntMap a) !Key a ![(Key,a)]

Btw, we should also consider that the resulting speedup is minuscule and the code becomes more difficult to read as a result.

int-e avatar Jul 08 '19 01:07 int-e

Probably got inlined away.... Did you check the CSE between link and branchMask calls? If I'm not mistaken, link makes essentially the same branchMask call you do, but with the arguments flipped.

No I had not checked that... and no, it did not do the CSE, regardless of the order of arguments to branchMask. I've now added a variant of link that takes a precomputed mask (to avoid inlining link manually), and the result does seem to improve performance a bit for short lists without hurting long ones. But there's hardly any signal here... mostly noise.

linkWithMask :: Mask -> Prefix -> IntMap a -> Prefix -> IntMap a -> IntMap a
linkWithMask m p1 t1 p2 t2
  | zero p1 m = Bin p m t1 t2
  | otherwise = Bin p m t2 t1
  where
    p = mask p1 m
{-# INLINE linkWithMask #-}
[1,1]/fromAscList1a                      mean 19.91 ns  ( +- 14.99 ps  )
[1,1]/fromAscList1a'                     mean 19.38 ns  ( +- 199.5 ps  )
[10,1]/fromAscList1a                     mean 145.6 ns  ( +- 1.948 ns  )
[10,1]/fromAscList1a'                    mean 140.1 ns  ( +- 3.528 ns  )
[-10,51791]/fromAscList1a                mean 160.2 ns  ( +- 2.175 ns  )
[-10,51791]/fromAscList1a'               mean 149.8 ns  ( +- 3.113 ns  )
[100,1]/fromAscList1a                    mean 1.549 μs  ( +- 28.80 ns  )
[100,1]/fromAscList1a'                   mean 1.444 μs  ( +- 23.56 ns  )
[-100,51791]/fromAscList1a               mean 1.614 μs  ( +- 48.31 ns  )
[-100,51791]/fromAscList1a'              mean 1.454 μs  ( +- 46.67 ns  )
[1000,1]/fromAscList1a                   mean 16.24 μs  ( +- 183.1 ns  )
[1000,1]/fromAscList1a'                  mean 15.57 μs  ( +- 167.7 ns  )
[-1000,51791]/fromAscList1a              mean 17.01 μs  ( +- 492.8 ns  )
[-1000,51791]/fromAscList1a'             mean 16.07 μs  ( +- 466.1 ns  )
[10000,1]/fromAscList1a                  mean 232.7 μs  ( +- 2.194 μs  )
[10000,1]/fromAscList1a'                 mean 232.3 μs  ( +- 2.882 μs  )
[-10000,51791]/fromAscList1a             mean 250.8 μs  ( +- 4.353 μs  )
[-10000,51791]/fromAscList1a'            mean 246.1 μs  ( +- 5.896 μs  )
[100000,1]/fromAscList1a                 mean 9.101 ms  ( +- 227.0 μs  )
[100000,1]/fromAscList1a'                mean 9.097 ms  ( +- 240.5 μs  )
[-100000,51791]/fromAscList1a            mean 9.188 ms  ( +- 339.0 μs  )
[-100000,51791]/fromAscList1a'           mean 9.189 ms  ( +- 412.6 μs  )
[1000000,1]/fromAscList1a                mean 94.98 ms  ( +- 12.07 ms  )
[1000000,1]/fromAscList1a'               mean 96.65 ms  ( +- 12.22 ms  )
[-1000000,51791]/fromAscList1a           mean 97.83 ms  ( +- 10.69 ms  )
[-1000000,51791]/fromAscList1a'          mean 97.62 ms  ( +- 10.81 ms  )
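For reference, the refactoring can be reproduced self-contained on a toy version of the structure (simplified types; non-negative keys only, since the real Data.IntMap.Internal treats the sign bit specially; branchMask here is a sketch of the usual highest-differing-bit computation):

```haskell
import Data.Bits (bit, complement, countLeadingZeros, finiteBitSize, xor, (.&.))

type Prefix = Int
type Mask = Int

-- Just enough structure to show the refactoring.
data IntMap a = Nil | Tip !Int a | Bin !Prefix !Mask (IntMap a) (IntMap a)
  deriving (Eq, Show)

-- Highest bit at which two distinct prefixes differ.
branchMask :: Prefix -> Prefix -> Mask
branchMask p1 p2 = bit (finiteBitSize d - 1 - countLeadingZeros d)
  where d = p1 `xor` p2

zero :: Int -> Mask -> Bool
zero i m = i .&. m == 0

mask :: Int -> Mask -> Prefix
mask i m = i .&. (complement (m - 1) `xor` m)

-- The variant taking a precomputed mask, as above.
linkWithMask :: Mask -> Prefix -> IntMap a -> Prefix -> IntMap a -> IntMap a
linkWithMask m p1 t1 p2 t2
  | zero p1 m = Bin p m t1 t2
  | otherwise = Bin p m t2 t1
  where
    p = mask p1 m
{-# INLINE linkWithMask #-}

-- The original link is recovered by computing the mask on the spot.
link :: Prefix -> IntMap a -> Prefix -> IntMap a -> IntMap a
link p1 t1 p2 t2 = linkWithMask (branchMask p1 p2) p1 t1 p2 t2
```

Callers that already know the branch mask (like the fromAscList loop) can call linkWithMask directly and skip the recomputation, while link stays available for everyone else.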

int-e avatar Jul 08 '19 01:07 int-e

Oops, I believe we got #653 and #658 mixed up in the last 3 comments!

int-e avatar Jul 08 '19 01:07 int-e

I'm currently doing a bit of triage on the open PRs.

Could someone please summarize the current state of this PR, and possibly also sketch out what needs to be done until it can be merged?

sjakobi avatar Jul 17 '20 16:07 sjakobi

A lot going on here. I'll attempt a summary.

  • the goal is to speed up fromList (borrowing ideas from fromAscList, which builds the whole map in a bottom-up fashion without deconstructing partial maps) and in particular try to make it no worse than repeated insertions (#288)
  • we lack adequate benchmarks that test fromList under a variety of inputs with different characteristics
  • I have some improvements to the PR code in a gist that could be incorporated, and made some effort to benchmark against various types of lists
  • I believe this is stalled mainly because the code is still worse than before in some cases, and also because this development took place in parallel to improving fromAscList (#658) which was far more successful

int-e avatar Jul 18 '20 11:07 int-e

Thanks, @int-e! :)

Could the benchmarks be merged separately, so they are not lost or bit-rotted, and so they could potentially be used for other attempts at speeding up fromList?

Also, how about incorporating your code improvements into this PR? I guess the easiest way to do that would be if you'd make a PR against @treeowl's branch.

Of course it would also be nice to make progress on the original problem, but since that is currently stalled, let's try to ensure that the value created so far is saved.

sjakobi avatar Jul 18 '20 11:07 sjakobi