
tune! (and thus @benchmark) doesn't setup between evals?

Opened by Sacha0

The following works fine

A = Array(SymTridiagonal(fill(2, 5), ones(5)))
b = @benchmarkable Base.LinAlg.chol!(x) setup=(x = Hermitian(copy($A), :U))
warmup(b)
run(b)

but injecting tune! between warmup and run, as @benchmark does, i.e.

A = Array(SymTridiagonal(fill(2, 5), ones(5)))
b = @benchmarkable Base.LinAlg.chol!(x) setup=(x = Hermitian(copy($A), :U))
warmup(b)
tune!(b)
run(b)

causes chol! to throw a PosDefException from within tune!

...
julia> tune!(b)
ERROR: Base.LinAlg.PosDefException(4)
 in _chol!(::Array{Float64,2}, ::Type{UpperTriangular}) at ./linalg/cholesky.jl:55
 in chol!(::Hermitian{Float64,Array{Float64,2}}) at ./linalg/cholesky.jl:124
 in ##core#429(::Hermitian{Float64,Array{Float64,2}}) at ./<missing>:0
 in ##sample#430(::BenchmarkTools.Parameters) at /Users/sacha/.julia/v0.6/BenchmarkTools/src/execution.jl:248
 in #_lineartrial#19(::Int64, ::Array{Any,1}, ::Function, ::BenchmarkTools.Benchmark{Symbol("##benchmark#428")}, ::BenchmarkTools.Parameters) at /Users/sacha/.julia/v0.6/BenchmarkTools/src/execution.jl:51
 in _lineartrial(::BenchmarkTools.Benchmark{Symbol("##benchmark#428")}, ::BenchmarkTools.Parameters) at /Users/sacha/.julia/v0.6/BenchmarkTools/src/execution.jl:43
 in #lineartrial#20(::Array{Any,1}, ::Function, ::BenchmarkTools.Benchmark{Symbol("##benchmark#428")}, ::BenchmarkTools.Parameters) at /Users/sacha/.julia/v0.6/BenchmarkTools/src/execution.jl:59
 in #tune!#22(::Bool, ::String, ::Array{Any,1}, ::Function, ::BenchmarkTools.Benchmark{Symbol("##benchmark#428")}, ::BenchmarkTools.Parameters) at /Users/sacha/.julia/v0.6/BenchmarkTools/src/execution.jl:114
 in tune!(::BenchmarkTools.Benchmark{Symbol("##benchmark#428")}) at /Users/sacha/.julia/v0.6/BenchmarkTools/src/execution.jl:114

which indicates that setup isn't occurring between samples in tune!. Thoughts? Thanks!

Sacha0 · Oct 19 '16

Actually, your original issue title was correct, and this is intended behavior. It's one of the weaker parts of the package, though - a tricky scenario where it's easy to run into pitfalls.

Basically, when you call tune!, it ramps up the number of evals per sample, so you end up performing multiple chol! operations on the same input matrix, which is why the error is thrown. This is because the setup and teardown phases don't get run for every evaluation, only for every sample, as mentioned in the relevant section of the manual. Otherwise, we'd have to change our execution strategy to measure the setup/teardown cost and subtract it from the final result (a process which I'm afraid would be very error-prone).
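
To make the failure mode concrete, here is roughly what happens when two evals share a single setup (an illustrative sketch using the same Julia 0.6 code as above):

A = Array(SymTridiagonal(fill(2, 5), ones(5)))
x = Hermitian(copy(A), :U)  # setup: runs once per sample, not once per eval
Base.LinAlg.chol!(x)        # first eval: factors x in place, overwriting x.data
Base.LinAlg.chol!(x)        # second eval: x.data now holds the factor, not the original
                            #   matrix, so this throws PosDefException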

The solution for situations like these is to make your core expression take long enough that using only one eval/sample is justified (I'm not sure I documented this solution... it definitely should be in the docs). I admit it's not a great solution; it's just the only reasonable one I've thought of so far.
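
For example (illustrative; the value of N is arbitrary), you could enlarge the problem until a single evaluation is slow enough that tuning settles on one eval per sample:

N = 1000
A = Array(SymTridiagonal(fill(2, N), ones(N)))
b = @benchmarkable Base.LinAlg.chol!(x) setup=(x = Hermitian(copy($A), :U))
tune!(b)  # with a sufficiently slow core expression, tuning should pick evals = 1
run(b)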

Note that the setup/teardown phases are executed in the sample function, which is used by tune! and the rest of the execution functions:

julia> b = @benchmarkable sin(1) setup=(println("hello"))
BenchmarkTools.Benchmark{Symbol("##benchmark#277")}(BenchmarkTools.Parameters(5.0,10000,1,0.0,true,false,0.05,0.01))

julia> BenchmarkTools.sample(b)
hello
(16.985971943887776,0.0,0,0)

julia> BenchmarkTools.tune!(b)
hello
hello
hello
⋮ # repeats many times

jrevels · Oct 19 '16

Interesting. The eval/sample distinction is not what tripped me up. (I read just before posting the issue that setup/teardown occur only once per sample, not once per eval; hence, on rereading the title/post, I realized my terminology mistake and edited it immediately.)

Rather, I assumed that tune! would default to one eval per sample at least for a first sample or two (per BenchmarkTools.DEFAULT_PARAMETERS.evals = 1), and not test with an increased number of evals per sample if the eval time was well away from the timing resolution. But I'm guessing my intuition for tune!'s implementation is completely off? :)

Thanks and best!

Sacha0 · Oct 19 '16

not test with an increased number of evals per sample if the eval time was well away from the timing resolution.

That's actually the correct intuition, but tune! isn't that clever yet. I've had it written down in my notes for a while to play around with truncating the tuning trial if the first timed sample is in the 100 microsecond or higher range. I never got around to it though.
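
For what it's worth, a rough sketch of that idea (hypothetical code, not part of BenchmarkTools; maybe_tune! and the 100 µs cutoff are illustrative):

# Take one evals = 1 sample first; only run the full tuning trial when the
# single-eval time is short enough that timer resolution actually matters.
function maybe_tune!(b)
    t, = BenchmarkTools.sample(b)  # time of one sample in ns, as in the example above
    if t >= 100_000                # >= 100 microseconds: keep evals = 1, skip tuning
        b.params.evals = 1
    else
        tune!(b)
    end
    return b
end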

jrevels · Oct 19 '16

To check my understanding, either avoiding tune! or benchmarking e.g.

N = 10
A = Hermitian(Array(SymTridiagonal(fill(2, N), ones(N)))); C = copy(A)
@benchmark (copy!($C.data, $A.data); Base.LinAlg.chol!($C))

are the right approaches for now? Thanks for the discussion (and the amazing package)!

Sacha0 · Oct 19 '16

Unfortunately, yeah. Note that if you happen to know the overhead for the extra operation (in this case, copy!), you can use the overhead keyword to subtract it from the final results automatically:

julia> @benchmark copy!($C.data, $A.data) # looks like the overhead is ~30ns
BenchmarkTools.Trial:
  samples:          10000
  evals/sample:     995
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  0.00 bytes
  allocs estimate:  0
  minimum time:     27.161 ns (0.00% GC)
  median time:      27.833 ns (0.00% GC)
  mean time:        27.871 ns (0.00% GC)
  maximum time:     42.444 ns (0.00% GC)

julia> @benchmark (copy!($C.data, $A.data); Base.LinAlg.chol!($C)) overhead=30
BenchmarkTools.Trial:
  samples:          10000
  evals/sample:     8
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  64.00 bytes
  allocs estimate:  3
  minimum time:     3.568 μs (0.00% GC)
  median time:      3.579 μs (0.00% GC)
  mean time:        3.588 μs (0.00% GC)
  maximum time:     11.957 μs (0.00% GC)

jrevels · Oct 20 '16

Perfect --- thanks again!

Sacha0 · Oct 20 '16

Do we have, or can we get, an option to force evals/sample = 1 with setup?

This is essential for every destructive benchmark. There are quite a few destructive base benchmarks that would need to be marked, at least sort!. I fear that some day some of the destructive base benchmarks will get two evals per sample and then produce pure bogus data.

Noise, overhead, and no support for good benchmarks of fast in-place operations are still better than producing bogus results.

chethega · Nov 11 '18

Do we have, or can we get, an option to force evals/sample = 1

Yes, this has existed the entire time: see the evals keyword argument.
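
For example, using sort! from above (the vector size is illustrative):

b = @benchmarkable sort!(x) setup=(x = rand(1000)) evals=1
run(b)  # setup re-runs before every sample, and each sample performs exactly one eval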

jrevels · Nov 12 '18

Yes, this has existed the entire time: see the evals keyword argument.

Thanks! I looked at BaseBenchmarks; currently, we tune! all of the @benchmarkable benchmarks, which appears to override even explicitly set evals. This is bad.

Is there a way to set this parameter such that it does not get tuned away? Then, I could just go through the destructive benchmarks, and fix evals = 1. Alternatively, nonempty setup could imply evals = 1 (this would, in fact, be my favorite solution).

For example, I think every single one of https://github.com/JuliaCI/BaseBenchmarks.jl/search?q=setup&unscoped_q=setup requires evals = 1. We can only hope that tune! decides on evals=1, given current CPU and testset sizes; otherwise many BaseBenchmarks have been broken since forever.

chethega · Nov 12 '18

which appears to override even explicitly set evals. This is bad.

Ouch, it would definitely make sense to not override explicitly set evals.

My naive way of implementing this would be to actually initialize evals as nothing by default. Then, any benchmarkable with a non-nothing evals field gets skipped during tuning.

This also has the benefit of preventing somebody from running a benchmarkable without explicitly tuning or setting evals (usage which is basically buggy, but not currently prevented in any programmatic way).
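
A hedged sketch of how that could look (illustrative only; the real Parameters type currently stores evals as an Int, and tune_if_needed! is a made-up name):

# Illustrative only: evals defaults to `nothing`, meaning "never set by the user".
function tune_if_needed!(b)
    if b.params.evals === nothing
        tune!(b)  # tune only benchmarks whose evals was not set explicitly
    end
    return b
end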

Alternatively, nonempty setup could imply evals = 1 (this would, in fact, be my favorite solution).

Hmm, this may not be a bad idea, but seems a bit more subtle...

jrevels · Nov 13 '18