containers icon indicating copy to clipboard operation
containers copied to clipboard

IntMap should offer lookup variant for likely successful lookup

Open Boarders opened this issue 2 years ago • 19 comments

Currently lookup is defined to fail early by checking if the prefix mask agrees with the key:

lookup :: Key -> IntMap a -> Maybe a
lookup !k = go
  where
    go (Bin p m l r) | nomatch k p m = Nothing
                     | zero k m  = go l
                     | otherwise = go r
    go (Tip kx x) | k == kx   = Just x
                  | otherwise = Nothing
    go Nil = Nothing

In the original paper they note that if we expect our lookups to be successful then we should instead prioritize testing if the kth bit is zero and failing only when we reach leaves. If I understand correctly, then this would just be the following code:

lookupSucc :: Key -> IntMap a -> Maybe a
lookupSucc !k = go
 where
   go (Bin _p m l r)
                    | zero k m  = go l
                    | otherwise = go r    
   go (Tip kx x) | k == kx   = Just x
                 | otherwise = Nothing
   go Nil = Nothing

Is that correct? It seems to pass the tests in any case. This can certainly lead to large speed ups when we know we are doing many successful lookups. For example, in the lookup benchmark we get the following (this is a best case scenario as the benchmark does not test a map with misses which seems like a bad design for a benchmark):

benchmarked lookup
time                 172.2 μs   (170.3 μs .. 174.1 μs)
                    0.998 R²   (0.994 R² .. 1.000 R²)
mean                 176.6 μs   (175.3 μs .. 181.2 μs)
std dev              7.148 μs   (2.786 μs .. 15.37 μs)

benchmarked lookupSucc
time                 153.0 μs   (151.6 μs .. 154.0 μs)
                    1.000 R²   (1.000 R² .. 1.000 R²)
mean                 152.2 μs   (151.7 μs .. 152.7 μs)
std dev              1.783 μs   (1.306 μs .. 2.362 μs)

Even if this is not quite the correct implementation, containers should probably offer this sort of function to allow users to tune their code to their needs as both variants are worthwhile.

Boarders avatar Aug 16 '21 03:08 Boarders

This sounds like an excellent idea. Could you briefly raise the question of naming on the libraries list? It's even possible we'd want to call it lookup, and name the fast-fail version something else. Did you test for how bad this version is in an often-failing situation?

On Sun, Aug 15, 2021, 11:17 PM Callan McGill @.***> wrote:

Currently lookup is defined to fail early by checking if the prefix mask agrees with the key:

lookup :: Key -> IntMap a -> Maybe a lookup !k = go

where

go (Bin p m l r) | nomatch k p m = Nothing

                 | zero k m  = go l

                 | otherwise = go r

go (Tip kx x) | k == kx   = Just x

              | otherwise = Nothing

go Nil = Nothing

In the original paper they note that if we expect our lookups to be successful then we should instead prioritize testing if the kth bit is zero and failing only when we reach leaves. If I understand correctly then this would just be the following code:

lookupSucc :: Key -> IntMap a -> Maybe a

lookupSucc !k = go

where

go (Bin _p m l r)

                | zero k m  = go l

                | otherwise = go r

go (Tip kx x) | k == kx = Just x

             | otherwise = Nothing

go Nil = Nothing

Is that correct? It seems to pass the tests in any case. This can certainly lead to large speed ups when we know we are doing many successful lookups. For example, in the lookup benchmark we get the following (this is a best case scenario as the benchmark does not test a map with misses which seems like a a bad design):

benchmarked lookup

time 172.2 μs (170.3 μs .. 174.1 μs)

                0.998 R²   (0.994 R² .. 1.000 R²)

mean 176.6 μs (175.3 μs .. 181.2 μs)

std dev 7.148 μs (2.786 μs .. 15.37 μs)

benchmarked lookupSucc

time 153.0 μs (151.6 μs .. 154.0 μs)

                1.000 R²   (1.000 R² .. 1.000 R²)

mean 152.2 μs (151.7 μs .. 152.7 μs)

std dev 1.783 μs (1.306 μs .. 2.362 μs)

Even if this is not quite the correct implementation, containers should consider offering this sort of function to allow users to tune their code to their needs as both variants are worthwhile.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/haskell/containers/issues/794, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAOOF7IQ4E6T2EOIX6TK47TT5B7LNANCNFSM5CG35XSQ . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&utm_campaign=notification-email .

treeowl avatar Aug 16 '21 03:08 treeowl

Here is how it looks on the original benchmark, the same benchmark with 50% hits, with 90% hits and with all failures:

original fast-fail version:
benchmarked lookup
time                 183.5 μs   (182.2 μs .. 185.0 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 182.8 μs   (182.3 μs .. 183.3 μs)
std dev              1.673 μs   (1.292 μs .. 2.405 μs)

benchmarked lookup_half
time                 101.4 μs   (100.4 μs .. 102.9 μs)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 100.9 μs   (100.6 μs .. 101.3 μs)
std dev              1.195 μs   (828.7 ns .. 1.764 μs)

benchmarked lookup_most
time                 169.2 μs   (168.3 μs .. 170.2 μs)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 168.4 μs   (168.0 μs .. 168.9 μs)
std dev              1.558 μs   (1.255 μs .. 2.051 μs)

benchmarked lookup_fails
time                 23.82 μs   (23.70 μs .. 23.97 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 23.95 μs   (23.86 μs .. 24.19 μs)
std dev              447.1 ns   (176.3 ns .. 882.7 ns)
version without fast-fail
benchmarked lookup
time                 152.1 μs   (149.6 μs .. 156.0 μs)
                     0.996 R²   (0.992 R² .. 1.000 R²)
mean                 154.3 μs   (153.0 μs .. 156.3 μs)
std dev              5.418 μs   (3.935 μs .. 7.282 μs)
variance introduced by outliers: 17% (moderately inflated)

benchmarked lookup_half
time                 151.2 μs   (150.2 μs .. 152.6 μs)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 151.7 μs   (151.2 μs .. 152.4 μs)
std dev              1.902 μs   (1.390 μs .. 2.776 μs)

benchmarked lookup_most
time                 143.9 μs   (143.4 μs .. 144.5 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 145.2 μs   (144.8 μs .. 145.9 μs)
std dev              1.829 μs   (1.350 μs .. 2.582 μs)

benchmarked lookup_fails
time                 150.2 μs   (149.0 μs .. 151.7 μs)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 150.5 μs   (149.9 μs .. 151.5 μs)
std dev              2.544 μs   (1.804 μs .. 3.565 μs)

I can definitely ask about naming on the libraries list tomorrow, yes!

Boarders avatar Aug 16 '21 03:08 Boarders

I can't see any good reason that find should not be this variant since it otherwise throws an error.

Boarders avatar Aug 16 '21 03:08 Boarders

Agreed regarding find, but we need to settle the safe versions too.

On Sun, Aug 15, 2021, 11:45 PM Callan McGill @.***> wrote:

I can't see any good reason that find should not be this variant since it otherwise throws an error.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/haskell/containers/issues/794#issuecomment-899191779, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAOOF7LMAHWLUILV5BEHK3TT5CCVLANCNFSM5CG35XSQ . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&utm_campaign=notification-email .

treeowl avatar Aug 16 '21 03:08 treeowl

A name that does not suggest success is query. This could be a fast-failing alternative to lookup (which still suggests success, albeit not as strongly as find).

P.S.: In Agda, we do not use find, but lookup which gives us control over the error, e.g.: https://github.com/agda/agda/blob/c9c20bf3683f9eaf9987b80257369be2a45d452b/src/full/Agda/Utils/Graph/AdjacencyMap/Unidirectional.hs#L564

andreasabel avatar Aug 18 '21 06:08 andreasabel

findMaybe?

nomeata avatar Aug 18 '21 08:08 nomeata

Could y'all also please comment on whether you think lookup should continue to be fast-failing or not?

treeowl avatar Aug 18 '21 08:08 treeowl

Maybe lookup should get a HasCallStack constraint to distinguish call sites, keep stats on whether failure is frequent, persist that data transparently in /var/run/… and pick the right implementation based on that.

nomeata avatar Aug 18 '21 08:08 nomeata

Could y'all also please comment on whether you think lookup should continue to be fast-failing or not?

The usual dilemma: (1) some apps out there might have been tested to perform well with the present fast-failing lookup, (2) other apps might rather use lookup with success expectation and benefit from a switch to non-fast-failing. Without a field study, we wouldn't know.

In the absence of real-world data, I can only recommend the conservative path: Leave lookup as it is, but fork it into 2 new variants (e.g. findMaybe and query, or lookupShouldSucceed and lookupMayFail) that bring the issue to the attention of the user. Possibly deprecate lookup, but then it is only deprecated for IntMap, where we strive for a uniform interface for all the collections, and there, lookup does not split... (complications, sigh!).

andreasabel avatar Aug 18 '21 10:08 andreasabel

We might be overthinking this. The performance difference isn't so great that this change would be worse to some than other likely-optimizations that in the past were simply introduced in prior versions. So making a change should be fine I believe. This does not help us in deciding whether there to make that change or not. Gut feeling says yes: it makes the run time of the function more uniform in general.

nomeata avatar Aug 18 '21 13:08 nomeata

Here is how it looks on the original benchmark, the same benchmark with 50% hits, with 90% hits and with all failures:

How did you set up the “all failures” test?

I expect that an “all failures (but uniform distribution of keys)” will behave very different from “all failures (but existing keys are far away from the queries)”. In fact, I expect for usages where the keys of the failed lookups are “within” the range of the existing values, the new code is actually faster for the failed lookups too.

That makes me wonder if it is justifiable to just use new code always?

nomeata avatar Aug 23 '21 11:08 nomeata

@nomeata I did not do anything sensible. The current benchmark has keys 12..2^12 so I just shifted them all by 2^12. In general the IntMap benchmarks could do with some attention - I'll see if I can add something a bit more reasonable in this change.

Boarders avatar Aug 23 '21 13:08 Boarders

@nomeata Your suspicion was correct, if we use lookup on lots of near-misses then the new version is faster.

Boarders avatar Sep 09 '21 16:09 Boarders

@sjakobi When making these changes it would be nice to know how it impacts GHC as a real-world consumer of IntMap, do you have any advice on how to do that?

Boarders avatar Sep 09 '21 16:09 Boarders

JFTR, conversation has continued at https://github.com/haskell/containers/pull/800

nomeata avatar Sep 16 '21 09:09 nomeata

@sjakobi When making these changes it would be nice to know how it impacts GHC as a real-world consumer of IntMap, do you have any advice on how to do that?

Yes, that would be very good to check! Unfortunately I'm not aware of any good documentation on doing this – although it should definitely exist, either in the GHC wiki or even in this repo.

The basic process is that you update the libraries/containers submodule in GHC to use your branch and then build some sample modules, comparing allocations and timings against a build with an unmodified GHC. Cabal is a popular sample package to test, because it's directly available in libraries/Cabal and doesn't have additional dependencies.

I've previously done a tiny bit of testing for #340: See https://github.com/haskell/containers/pull/340#issuecomment-610400875

@doyougnu has done more testing and may have better advice: https://github.com/haskell/containers/pull/340#issuecomment-886565961

sjakobi avatar Sep 16 '21 17:09 sjakobi

comparing allocations and timings

allocations will be useless here, I expect, as (laziness aside) both functions should allocate the same (the return Just/Nothing). Also make sure to change find as well, I assume GHC uses that a lot as well. Timing is always quite noisy though. For this one, I’d consider a comparison of instruction count a good first indication, though.

nomeata avatar Sep 16 '21 17:09 nomeata

Ben Gamari gave me some ideas for what to do in order to measure a cabal build, just waiting on GHC to re-build.

Boarders avatar Sep 16 '21 17:09 Boarders

You can read through the commits in this branch to get a step-by-step guide on altering the boot libs (although I'm adding two boot libs in that branch). Instead of adding a boot lib you would just be changing its ref in $GHC_ROOT/.gitmodules and updating the submodule.

You'll want to mimic this commit: https://gitlab.haskell.org/doyougnu/ghc/-/commit/297fe09edbfab7c5a358b0485090a4ed8121f427

And these docs will be useful: https://gitlab.haskell.org/ghc/ghc/-/wikis/working-conventions/git/submodules

If you run into trouble I can test it out sometime next week or continue to offer advice here.

I'm +1 for this change and would be interested in observing of ghc reacts to this increase in lookup misses.

doyougnu avatar Sep 16 '21 17:09 doyougnu