criterion icon indicating copy to clipboard operation
criterion copied to clipboard

first benchmark is expensive

Open jwaldmann opened this issue 2 years ago • 5 comments

I expect identical output for the three benchmarks, but the first one (consistently) needs double time. (Possibly related: #60)

import Criterion.Main

main = defaultMain
  [ bench "1"   $ nf length [1 .. 10^7 :: Int ]
  , bench "2"   $ nf length [1 .. 10^7 :: Int ]
  , bench "3"   $ nf length [1 .. 10^7 :: Int ]
  ]


$ cabal run
benchmarking 1
time                 38.37 ms   (38.26 ms .. 38.50 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 38.67 ms   (38.53 ms .. 38.97 ms)
std dev              387.9 μs   (201.4 μs .. 535.6 μs)

benchmarking 2
time                 20.21 ms   (20.13 ms .. 20.31 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 20.10 ms   (20.03 ms .. 20.17 ms)
std dev              175.3 μs   (144.9 μs .. 234.5 μs)

benchmarking 3
time                 20.77 ms   (20.43 ms .. 21.09 ms)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 20.24 ms   (20.14 ms .. 20.41 ms)
std dev              306.0 μs   (225.8 μs .. 397.1 μs)

For criterion-1.6.0.0, with ghc-8.10.7 .. ghc-9.6.1

jwaldmann avatar Mar 26 '23 15:03 jwaldmann

My guess is that sharing is the problem. Do things improve if you write it like this?

main = defaultMain
  [ bench "1"   $ nf length [1 .. 10^7 + 3 :: Int ]
  , bench "2"   $ nf length [1 .. 10^7 + 2 :: Int ]
  , bench "3"   $ nf length [1 .. 10^7 + 1 :: Int ]
  ]

treeowl avatar Mar 26 '23 17:03 treeowl

no change, see #274

jwaldmann avatar Mar 26 '23 17:03 jwaldmann

Also, same behaviour when I write it like this

main = do
  let {-# noinline xs #-}
      xs = [1 .. 10^7 :: Int ]
  print $ sum xs -- to make sure it's evaluated
  defaultMain
    [ bench "1" $ nf length xs
    , bench "2" $ nf length xs
    , bench "3" $ nf length xs
    ]

jwaldmann avatar Mar 26 '23 20:03 jwaldmann

@treeowl I have the feeling that your analysis might be right. But why would your example reduce sharing? It still has 10^7 as a common subexpression, which might get shared.

@jwaldmann also, your example has the same expression nf length xs three times, couldn't it be that this common expression is shared?

(I might be totally wrong about the kind of optimizations that may be at play here.)

turion avatar Jan 27 '24 14:01 turion

I think I can reproduce, but the effect is not as strong as with your machine:

benchmarking 1
time                 44.11 ms   (43.78 ms .. 44.49 ms)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 44.64 ms   (44.35 ms .. 45.42 ms)
std dev              897.5 μs   (361.0 μs .. 1.594 ms)

benchmarking 2
time                 39.32 ms   (38.54 ms .. 40.16 ms)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 38.66 ms   (38.24 ms .. 39.24 ms)
std dev              1.025 ms   (750.5 μs .. 1.409 ms)

benchmarking 3
time                 38.11 ms   (37.18 ms .. 38.88 ms)
                     0.998 R²   (0.994 R² .. 0.999 R²)
mean                 40.92 ms   (39.81 ms .. 42.66 ms)
std dev              2.730 ms   (1.797 ms .. 3.797 ms)
variance introduced by outliers: 25% (moderately inflated)

I changed the code a bit to try and avoid the problem:

main = defaultMain
  [ bench "1"   $ nf length [1 * 10^7 .. 2 * 10^7 :: Int ]
  , bench "2"   $ nf length [3 * 10^7 .. 4 * 10^7 :: Int ]
  , bench "3"   $ nf length [5 * 10^7 .. 6 * 10^7 :: Int ]
  ]

But still the effect is visible:

benchmarking 1
time                 44.98 ms   (44.13 ms .. 45.82 ms)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 44.50 ms   (44.32 ms .. 44.82 ms)
std dev              456.8 μs   (272.8 μs .. 704.0 μs)

benchmarking 2
time                 38.68 ms   (37.49 ms .. 40.20 ms)
                     0.998 R²   (0.996 R² .. 1.000 R²)
mean                 38.19 ms   (37.84 ms .. 38.79 ms)
std dev              956.1 μs   (507.1 μs .. 1.306 ms)

benchmarking 3
time                 36.98 ms   (35.67 ms .. 37.71 ms)
                     0.997 R²   (0.989 R² .. 1.000 R²)
mean                 38.67 ms   (37.92 ms .. 40.42 ms)
std dev              2.141 ms   (829.3 μs .. 3.583 ms)
variance introduced by outliers: 19% (moderately inflated)

So I guess my comments about sharing are incomplete.

turion avatar Jan 27 '24 14:01 turion