containers icon indicating copy to clipboard operation
containers copied to clipboard

improve benchmarks for Data.IntMap

Open jwaldmann opened this issue 5 years ago • 9 comments

Benchmarks should

  • use several sets of data (currently: just one, contiguous keys [1 .. 2^12])
  • test bulk operations (union, intersection) - currently, they don't? https://github.com/haskell/containers/blob/master/containers-tests/benchmarks/IntMap.hs

NB: these bulk ops are the main reason for IntMap? if we only operate by-element, we could use hashmaps?

jwaldmann avatar Jul 04 '19 20:07 jwaldmann

Here's a potential source for inspiration: https://gist.github.com/int-e/36578cb04d0a187252c366b0b45ddcb6#file-intmapfal-hs-L20-L45

int-e avatar Jul 06 '19 19:07 int-e

yeah, also https://github.com/jwaldmann/containers/blob/intmap-fromList/containers-tests/benchmarks/IntMap.hs

jwaldmann avatar Jul 07 '19 00:07 jwaldmann

@jwaldmann That looks like a nice improvement! Why don't you simply make a PR?

sjakobi avatar Jul 29 '19 23:07 sjakobi

"Why don't you.." - because it's a drastic change that should be discussed first? Current benchmark:

defaultMain
        [ bench "lookup" $ whnf (lookup keys) m ...

my proposal

  defaultMain $ do
    e <- [ 10, 15 .. 25 ]
    return $ bgroup ("2^" <> show e)
      [ bulk
        [ ("contiguous/overlapping", [1..2^e], [1..2^e]) ...

jwaldmann avatar Jul 31 '19 08:07 jwaldmann

@jwaldmann I guess a PR would be the perfect platform for that discussion! :)

sjakobi avatar Jul 31 '19 10:07 sjakobi

test bulk operations (union, intersection) - currently, they don't?

These are tested in the set-operations-intmap benchmark, which also uses a variety of data sets.

gereeter avatar Dec 28 '19 09:12 gereeter

For reference: In #653, there's some performance work on fromList[WithKey] that needs better benchmarks with more realistic inputs.

sjakobi avatar Aug 14 '20 12:08 sjakobi

For reference: In #653, there's some performance work on fromList[WithKey] that needs better benchmarks with more realistic inputs.

Also related: https://github.com/haskell/containers/issues/652

sjakobi avatar Aug 14 '20 13:08 sjakobi

Regarding the problem of realistic inputs for fromList and friends: How about using e.g. splitmix to generate them randomly? Certainly that's not very realistic for many applications, but it adds another data point.

prettyprinter has a benchmark that is similarly based on randomly generated data: https://github.com/quchen/prettyprinter/blob/ab2c09419cca51fcc37760e71ef6861d26753e94/prettyprinter/bench/LargeOutput.hs#L173-L181

sjakobi avatar Aug 14 '20 13:08 sjakobi