criterion
criterion copied to clipboard
first benchmark is expensive
I expect identical output for the three benchmarks, but the first one (consistently) needs double time. (Possibly related: #60)
import Criterion.Main
main = defaultMain
[ bench "1" $ nf length [1 .. 10^7 :: Int ]
, bench "2" $ nf length [1 .. 10^7 :: Int ]
, bench "3" $ nf length [1 .. 10^7 :: Int ]
]
$ cabal run
benchmarking 1
time 38.37 ms (38.26 ms .. 38.50 ms)
1.000 R² (1.000 R² .. 1.000 R²)
mean 38.67 ms (38.53 ms .. 38.97 ms)
std dev 387.9 μs (201.4 μs .. 535.6 μs)
benchmarking 2
time 20.21 ms (20.13 ms .. 20.31 ms)
1.000 R² (1.000 R² .. 1.000 R²)
mean 20.10 ms (20.03 ms .. 20.17 ms)
std dev 175.3 μs (144.9 μs .. 234.5 μs)
benchmarking 3
time 20.77 ms (20.43 ms .. 21.09 ms)
0.999 R² (0.999 R² .. 1.000 R²)
mean 20.24 ms (20.14 ms .. 20.41 ms)
std dev 306.0 μs (225.8 μs .. 397.1 μs)
For criterion-1.6.0.0, with ghc-8.10.7 .. ghc-9.6.1
My guess is that sharing is the problem. Do things improve if you write it like this?
main = defaultMain
[ bench "1" $ nf length [1 .. 10^7 + 3 :: Int ]
, bench "2" $ nf length [1 .. 10^7 + 2 :: Int ]
, bench "3" $ nf length [1 .. 10^7 + 1 :: Int ]
]
no change, see #274
Also, same behaviour when I write it like this
main = do
let {-# noinline xs #-}
xs = [1 .. 10^7 :: Int ]
print $ sum xs -- to make sure it's evaluated
defaultMain
[ bench "1" $ nf length xs
, bench "2" $ nf length xs
, bench "3" $ nf length xs
]
@treeowl I have the feeling that your analysis might be right. But why would your example reduce sharing? It still has 10^7 as a common subexpression, which might get shared.
@jwaldmann also, your example has the same expression nf length xs three times, couldn't it be that this common expression is shared?
(I might be totally wrong about the kind of optimizations that may be at play here.)
I think I can reproduce, but the effect is not as strong as with your machine:
benchmarking 1
time 44.11 ms (43.78 ms .. 44.49 ms)
1.000 R² (0.999 R² .. 1.000 R²)
mean 44.64 ms (44.35 ms .. 45.42 ms)
std dev 897.5 μs (361.0 μs .. 1.594 ms)
benchmarking 2
time 39.32 ms (38.54 ms .. 40.16 ms)
0.999 R² (0.998 R² .. 1.000 R²)
mean 38.66 ms (38.24 ms .. 39.24 ms)
std dev 1.025 ms (750.5 μs .. 1.409 ms)
benchmarking 3
time 38.11 ms (37.18 ms .. 38.88 ms)
0.998 R² (0.994 R² .. 0.999 R²)
mean 40.92 ms (39.81 ms .. 42.66 ms)
std dev 2.730 ms (1.797 ms .. 3.797 ms)
variance introduced by outliers: 25% (moderately inflated)
I changed the code a bit to try and avoid the problem:
main = defaultMain
[ bench "1" $ nf length [1 * 10^7 .. 2 * 10^7 :: Int ]
, bench "2" $ nf length [3 * 10^7 .. 4 * 10^7 :: Int ]
, bench "3" $ nf length [5 * 10^7 .. 6 * 10^7 :: Int ]
]
But still the effect is visible:
benchmarking 1
time 44.98 ms (44.13 ms .. 45.82 ms)
0.999 R² (0.999 R² .. 1.000 R²)
mean 44.50 ms (44.32 ms .. 44.82 ms)
std dev 456.8 μs (272.8 μs .. 704.0 μs)
benchmarking 2
time 38.68 ms (37.49 ms .. 40.20 ms)
0.998 R² (0.996 R² .. 1.000 R²)
mean 38.19 ms (37.84 ms .. 38.79 ms)
std dev 956.1 μs (507.1 μs .. 1.306 ms)
benchmarking 3
time 36.98 ms (35.67 ms .. 37.71 ms)
0.997 R² (0.989 R² .. 1.000 R²)
mean 38.67 ms (37.92 ms .. 40.42 ms)
std dev 2.141 ms (829.3 μs .. 3.583 ms)
variance introduced by outliers: 19% (moderately inflated)
So I guess my comments about sharing are incomplete.