BenchmarkTools.jl
tune! (and thus @benchmark) doesn't setup between evals?
The following works fine
A = Array(SymTridiagonal(fill(2, 5), ones(5)))
b = @benchmarkable Base.LinAlg.chol!(x) setup=(x = Hermitian(copy($A), :U))
warmup(b)
run(b)
but injecting tune! between warmup and run as in @benchmark, i.e.
A = Array(SymTridiagonal(fill(2, 5), ones(5)))
b = @benchmarkable Base.LinAlg.chol!(x) setup=(x = Hermitian(copy($A), :U))
warmup(b)
tune!(b)
run(b)
causes chol! to throw a PosDefException from within tune!
...
julia> tune!(b)
ERROR: Base.LinAlg.PosDefException(4)
in _chol!(::Array{Float64,2}, ::Type{UpperTriangular}) at ./linalg/cholesky.jl:55
in chol!(::Hermitian{Float64,Array{Float64,2}}) at ./linalg/cholesky.jl:124
in ##core#429(::Hermitian{Float64,Array{Float64,2}}) at ./<missing>:0
in ##sample#430(::BenchmarkTools.Parameters) at /Users/sacha/.julia/v0.6/BenchmarkTools/src/execution.jl:248
in #_lineartrial#19(::Int64, ::Array{Any,1}, ::Function, ::BenchmarkTools.Benchmark{Symbol("##benchmark#428")}, ::BenchmarkTools.Parameters) at /Users/sacha/.julia/v0.6/BenchmarkTools/src/execution.jl:51
in _lineartrial(::BenchmarkTools.Benchmark{Symbol("##benchmark#428")}, ::BenchmarkTools.Parameters) at /Users/sacha/.julia/v0.6/BenchmarkTools/src/execution.jl:43
in #lineartrial#20(::Array{Any,1}, ::Function, ::BenchmarkTools.Benchmark{Symbol("##benchmark#428")}, ::BenchmarkTools.Parameters) at /Users/sacha/.julia/v0.6/BenchmarkTools/src/execution.jl:59
in #tune!#22(::Bool, ::String, ::Array{Any,1}, ::Function, ::BenchmarkTools.Benchmark{Symbol("##benchmark#428")}, ::BenchmarkTools.Parameters) at /Users/sacha/.julia/v0.6/BenchmarkTools/src/execution.jl:114
in tune!(::BenchmarkTools.Benchmark{Symbol("##benchmark#428")}) at /Users/sacha/.julia/v0.6/BenchmarkTools/src/execution.jl:114
which indicates that setup isn't occurring between evals in tune!. Thoughts? Thanks!
Actually, your original issue title was correct, and this is intended behavior. It's one of the weaker parts of the package, though: a tricky scenario where it's easy to run into pitfalls.
Basically, when you call tune!, you start performing multiple chol! operations on the same input matrix, which is why the error is thrown. This is because the setup and teardown phases don't get run for every evaluation, just for every sample, as mentioned in the relevant section of the manual. Otherwise, we'd have to change our execution strategy to measure the setup/teardown cost and subtract it from the final result (a process which I'm afraid would be very error-prone).
The solution for situations like these is to make your core expression take long enough that only using one eval/sample is justified (I'm not sure I documented this solution... it definitely should be in the docs). I admit that it's not a great solution; it's just the only reasonable one I've thought of so far.
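Schematically, each sample does roughly the following (a simplified sketch of the execution strategy, not the actual generated code in execution.jl):
# Simplified sketch of how one sample executes: setup and teardown run once
# per sample, while the core expression runs `evals` times in a row on the
# same x. With evals > 1, the second chol! call sees the matrix already
# overwritten by the first, hence the PosDefException.
function one_sample_sketch(A, evals)
    x = Hermitian(copy(A), :U)        # setup: once per sample
    start = time_ns()
    for _ in 1:evals
        Base.LinAlg.chol!(x)          # core expression: `evals` times per sample
    end
    elapsed = (time_ns() - start) / evals
    # teardown (if any) would run here, once per sample
    return elapsed
end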
Note that the setup/teardown phases are executed in the sample function, which is used by tune! and the rest of the execution functions:
julia> b = @benchmarkable sin(1) setup=(println("hello"))
BenchmarkTools.Benchmark{Symbol("##benchmark#277")}(BenchmarkTools.Parameters(5.0,10000,1,0.0,true,false,0.05,0.01))
julia> BenchmarkTools.sample(b)
hello
(16.985971943887776,0.0,0,0)
julia> BenchmarkTools.tune!(b)
hello
hello
hello
⋮ # repeats many times
Interesting. The eval/sample distinction is not what tripped me up. (I read just before posting the issue that setup/teardown occur only for each sample, not for each eval; hence I realized my terminology mistake on rereading the title/post and edited immediately.)
Rather, I assumed that tune! would default to one eval per sample at least for a first sample or two (per BenchmarkTools.DEFAULT_PARAMETERS.evals = 1), and not test with an increased number of evals per sample if the eval time was well away from the timing resolution. But I'm guessing my intuition for tune!'s implementation is completely off? :)
Thanks and best!
not test with an increased number of evals per sample if the eval time was well away from the timing resolution.
That's actually the correct intuition, but tune! isn't that clever yet. I've had it written down in my notes for a while to play around with truncating the tuning trial if the first timed sample is in the 100 microsecond or higher range. I never got around to it though.
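For reference, a rough sketch of that early-exit idea (purely hypothetical, not part of BenchmarkTools; the function name and threshold are illustrative):
# Hypothetical sketch of the early-exit heuristic mentioned above:
# take one sample at the default evals = 1, and only run the full tuning
# trial if that sample is faster than ~100 μs.
const SLOW_SAMPLE_NS = 100_000        # ~100 μs, the threshold mentioned above

function maybe_skip_tuning!(b)
    t = first(BenchmarkTools.sample(b))   # time (ns) of a single sample
    if t >= SLOW_SAMPLE_NS
        b.params.evals = 1                # eval time dwarfs timer resolution
        return b
    end
    return tune!(b)                       # otherwise run the normal tuning trial
end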
To check my understanding, either avoiding tune! or benchmarking e.g.
N = 10
A = Hermitian(Array(SymTridiagonal(fill(2, N), ones(N)))); C = copy(A)
@benchmark (copy!($C.data, $A.data); Base.LinAlg.chol!($C))
are the right approaches for now? Thanks for the discussion (and the amazing package)!
Unfortunately, yeah. Note that if you happen to know the overhead for the extra operation (in this case, copy!), you can use the overhead keyword to subtract it from the final results automatically:
julia> @benchmark copy!($C.data, $A.data) # looks like the overhead is ~30ns
BenchmarkTools.Trial:
samples: 10000
evals/sample: 995
time tolerance: 5.00%
memory tolerance: 1.00%
memory estimate: 0.00 bytes
allocs estimate: 0
minimum time: 27.161 ns (0.00% GC)
median time: 27.833 ns (0.00% GC)
mean time: 27.871 ns (0.00% GC)
maximum time: 42.444 ns (0.00% GC)
julia> @benchmark (copy!($C.data, $A.data); Base.LinAlg.chol!($C)) overhead=30
BenchmarkTools.Trial:
samples: 10000
evals/sample: 8
time tolerance: 5.00%
memory tolerance: 1.00%
memory estimate: 64.00 bytes
allocs estimate: 3
minimum time: 3.568 μs (0.00% GC)
median time: 3.579 μs (0.00% GC)
mean time: 3.588 μs (0.00% GC)
maximum time: 11.957 μs (0.00% GC)
Perfect, thanks again!
Do we have, or can we get, an option to force evals/sample = 1 with setup?
This is essential for every destructive benchmark. There are quite a few destructive base benchmarks that would need to be marked, at least sort!. I fear that some day some of the destructive base benchmarks will get two evals and then produce pure bogus data.
Noise + overhead + no support for good benchmarks of fast in-place operations is still better than producing bogus results.
Do we have, or can we get, an option to force evals/sample = 1
Yes, this has existed the entire time: see the evals keyword argument.
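For the chol! example from earlier, that would look something like this (a sketch reusing the A defined above):
# Pinning evals to 1 so setup runs before every evaluation of the
# destructive chol! call.
b = @benchmarkable Base.LinAlg.chol!(x) setup=(x = Hermitian(copy($A), :U)) evals=1
warmup(b)
run(b)    # run directly; calling tune! would re-tune evals (see the discussion below)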
Yes, this has existed the entire time: see the evals keyword argument.
Thanks! I looked at BaseBenchmarks; currently, we tune! all the @benchmarkable benchmarks, which appears to override even explicitly set evals. This is bad.
Is there a way to set this parameter such that it does not get tuned away? Then I could just go through the destructive benchmarks and fix evals = 1. Alternatively, a nonempty setup could imply evals = 1 (this would, in fact, be my favorite solution).
For example, I think every single one of https://github.com/JuliaCI/BaseBenchmarks.jl/search?q=setup&unscoped_q=setup requires evals = 1. We can only hope that tune! decides on evals=1, given current CPU and testset sizes; otherwise many BaseBenchmarks have been broken since forever.
which appears to override even explicitly set evals. This is bad.
Ouch, it would definitely make sense to not override explicitly set evals.
My naive way of implementing this would be to actually initialize evals as nothing by default. Then, any benchmarkable with a non-nothing evals field gets skipped during tuning.
This also has the benefit of preventing somebody from running a benchmarkable without explicitly tuning or setting evals (usage which is basically buggy, but not currently prevented in any programmatic way).
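Something like the following, where the nothing sentinel and the skip logic are purely hypothetical and not actual BenchmarkTools code:
# Sketch of the proposal above: evals would default to nothing, and tuning
# would skip any benchmark whose evals was explicitly set by the user.
function tune_unless_pinned!(b)
    if b.params.evals === nothing    # never tuned and never set explicitly
        tune!(b)
    end                              # otherwise respect the user's evals
    return b
end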
Alternatively, nonempty setup could imply evals = 1 (this would, in fact, be my favorite solution).
Hmm, this may not be a bad idea, but seems a bit more subtle...