BenchmarkTools.jl
`@btime` errors because `tune!` does not execute setup
It appears `tune!` does not execute `setup` before every run, leading to errors in certain cases. See below for a MWE:
```julia
using BenchmarkTools
function f!(x::AbstractVector)
length(x) == 2 || error("setup not correctly executed")
push!(x, randn())
end
```
Then `@benchmarkable` works:
```julia
b = @benchmarkable f!(y) setup=(y=randn(2))
run(b) # works
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 66.000 ns … 32.197 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 79.000 ns ┊ GC (median): 0.00%
Time (mean ± σ): 113.119 ns ± 446.383 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
██▄▅▄▃▃▄▄▄▅▅▅▃▂▁ ▂
███████████████████████████▆▇▇▆▇▆▆▅▄▆▆▅▇▄▅▅▅▄▅▄▅▄▄▆▅▅▅▅▄▄▄▄▃▄ █
66 ns Histogram: log(frequency) by time 374 ns <
Memory estimate: 48 bytes, allocs estimate: 1.
```
But neither `@btime` nor `@benchmark` does:
```julia
@btime f!(y) setup=(y=randn(2)) # errors
ERROR: setup not correctly executed
Stacktrace:
[1] error(s::String)
@ Base ./error.jl:33
[2] f!(x::Vector{Float64})
@ Main ~/Documents/SEA/UpdatableCholeskyFactorizations/doodles/benchmarktools_bug.jl:4
[3] var"##core#593"(y::Vector{Float64})
@ Main ~/.julia/packages/BenchmarkTools/uq9zP/src/execution.jl:479
[4] var"##sample#594"(__params::BenchmarkTools.Parameters)
@ Main ~/.julia/packages/BenchmarkTools/uq9zP/src/execution.jl:487
[5] _lineartrial(b::BenchmarkTools.Benchmark, p::BenchmarkTools.Parameters; maxevals::Int64, kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ BenchmarkTools ~/.julia/packages/BenchmarkTools/uq9zP/src/execution.jl:160
[6] _lineartrial(b::BenchmarkTools.Benchmark, p::BenchmarkTools.Parameters)
@ BenchmarkTools ~/.julia/packages/BenchmarkTools/uq9zP/src/execution.jl:152
[7] #invokelatest#2
@ ./essentials.jl:708 [inlined]
[8] invokelatest
@ ./essentials.jl:706 [inlined]
[9] #lineartrial#46
@ ~/.julia/packages/BenchmarkTools/uq9zP/src/execution.jl:34 [inlined]
[10] lineartrial
@ ~/.julia/packages/BenchmarkTools/uq9zP/src/execution.jl:34 [inlined]
[11] tune!(b::BenchmarkTools.Benchmark, p::BenchmarkTools.Parameters; progressid::Nothing, nleaves::Float64, ndone::Float64, verbose::Bool, pad::String, kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ BenchmarkTools ~/.julia/packages/BenchmarkTools/uq9zP/src/execution.jl:250
[12] tune! (repeats 2 times)
@ ~/.julia/packages/BenchmarkTools/uq9zP/src/execution.jl:249 [inlined]
[13] top-level scope
@ ~/.julia/packages/BenchmarkTools/uq9zP/src/execution.jl:566
```
`@benchmark` yields a similar stack trace, leading me to believe that `tune!` does not call `setup` before each evaluation.
I think you just want to specify `evals=1`; otherwise the expression is evaluated several times per `setup`.
```julia
julia> function f!(x::AbstractVector)
# length(x) == 2 || error("setup not correctly executed")
sleep(length(x)/10) # 100ms per element
push!(x, randn())
end;
julia> @btime f!(y) setup=(y=randn(2);) evals=1;
min 206.132 ms, mean 206.175 ms (6 allocations, 224 bytes)
julia> @btime f!(y) setup=(y=randn(2);) evals=10;
min 655.086 ms, mean 655.086 ms (5 allocations, 185 bytes)
julia> mean(2:11)
6.5
```
With `evals=10`, `setup` runs only once per sample, so the vector grows from length 2 to 11 across the ten evaluations. The average length is `mean(2:11) = 6.5`, so the average sleep per evaluation is 6.5/10 s = 650 ms, matching the ≈ 655 ms reported above instead of the intended ≈ 200 ms.
I've also run into an issue caused by this behavior. I managed to figure out the `evals=1` solution, but it was a pretty opaque bug to track down. It can't be that uncommon to benchmark functions that destroy a required input property, so perhaps it would be good to mention how to deal with them in the README Quick Start?
My example:
```julia
using LinearAlgebra

function randposdef(N)
    A = randn(N, N)
    return Symmetric(A * A' + I)
end
```
```julia
julia> @btime cholesky!(A) setup=(A = randposdef(100)); # evals=1 needed to make it work
ERROR: PosDefException: matrix is not positive definite; Cholesky factorization failed.
Stacktrace:
[1] checkpositivedefinite
@ ~/lib/julia-1.7.2/share/julia/stdlib/v1.7/LinearAlgebra/src/factorization.jl:18 [inlined]
[2] cholesky!(A::Symmetric{Float64, Matrix{Float64}}, ::Val{false}; check::Bool)
@ LinearAlgebra ~/lib/julia-1.7.2/share/julia/stdlib/v1.7/LinearAlgebra/src/cholesky.jl:266
[3] cholesky! (repeats 2 times)
@ ~/lib/julia-1.7.2/share/julia/stdlib/v1.7/LinearAlgebra/src/cholesky.jl:265 [inlined]
[4] var"##core#423"(A::Symmetric{Float64, Matrix{Float64}})
@ Main ~/.julia/packages/BenchmarkTools/7xSXH/src/execution.jl:489
[5] var"##sample#424"(::Tuple{}, __params::BenchmarkTools.Parameters)
@ Main ~/.julia/packages/BenchmarkTools/7xSXH/src/execution.jl:497
[6] _lineartrial(b::BenchmarkTools.Benchmark, p::BenchmarkTools.Parameters; maxevals::Int64, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ BenchmarkTools ~/.julia/packages/BenchmarkTools/7xSXH/src/execution.jl:161
[7] _lineartrial(b::BenchmarkTools.Benchmark, p::BenchmarkTools.Parameters)
@ BenchmarkTools ~/.julia/packages/BenchmarkTools/7xSXH/src/execution.jl:153
[8] #invokelatest#2
@ ./essentials.jl:716 [inlined]
[9] invokelatest
@ ./essentials.jl:714 [inlined]
[10] #lineartrial#46
@ ~/.julia/packages/BenchmarkTools/7xSXH/src/execution.jl:35 [inlined]
[11] lineartrial
@ ~/.julia/packages/BenchmarkTools/7xSXH/src/execution.jl:35 [inlined]
[12] tune!(b::BenchmarkTools.Benchmark, p::BenchmarkTools.Parameters; progressid::Nothing, nleaves::Float64, ndone::Float64, verbose::Bool, pad::String, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ BenchmarkTools ~/.julia/packages/BenchmarkTools/7xSXH/src/execution.jl:251
[13] tune! (repeats 2 times)
@ ~/.julia/packages/BenchmarkTools/7xSXH/src/execution.jl:250 [inlined]
[14] top-level scope
@ ~/.julia/packages/BenchmarkTools/7xSXH/src/execution.jl:576
```
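For completeness, adding `evals=1` (as noted in the comment above) makes this run without error, since `setup` then builds a fresh positive definite matrix before every evaluation; timings are machine-dependent, so the output is omitted here:

```julia
julia> @btime cholesky!(A) setup=(A = randposdef(100)) evals=1;
```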
Actually, if you provided a `setup` you probably wanted to collect statistics for that particular specification, so even in cases where the benchmark doesn't error out, the results could be flawed if the expression mutates its input and `evals != 1`. So maybe the README and docs should recommend `evals=1` for any benchmark that mutates?
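To illustrate that silent failure mode with an example of my own (not from this issue; it only assumes the standard `@btime` keywords): `sort!` mutates its argument, so without `evals=1` every evaluation after the first one within a sample may run on already-sorted data, which skews the statistics even though nothing errors.

```julia
using BenchmarkTools

# Small vector, so tuning is likely to pick evals > 1. setup runs once per
# sample, so evaluations 2, 3, ... of each sample call sort! on a vector
# that is already sorted, which is not the workload we meant to measure.
@btime sort!(v) setup=(v = rand(64));

# Intended specification: one evaluation per setup, so every call to sort!
# sees a freshly generated unsorted vector.
@btime sort!(v) setup=(v = rand(64)) evals=1;
```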
My mental model of what is happening inside `@btime` or `@benchmark` (with `@benchmarkable` one could skip tuning manually):
- The benchmark is defined.
- Tuning happens. During tuning we take a single sample with many evaluations in order to choose the number of evaluations optimally. Since we take only one sample, the `setup` stage happens only once.
- We run the benchmark.
So if we set `evals` manually, the tuning stage becomes meaningless and is therefore skipped, which solves the issue.
In the case above, since the data is overwritten, we need `setup` to run before each evaluation; hence we set `evals=1`, which skips tuning, and during the run each sample consists of a single evaluation preceded by its own `setup`.
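A quick way to see the effect of setting `evals` (a sketch based on my understanding of the API; `tune!` and `b.params` are shown in the BenchmarkTools manual):

```julia
using BenchmarkTools

# Without an explicit evals, tune! picks a (typically large) number of
# evaluations per sample for such a cheap expression.
b1 = @benchmarkable sin(1.0)
tune!(b1)
@show b1.params.evals

# With an explicit evals=1 the parameter stays fixed at 1, and @btime /
# @benchmark reportedly skip the tuning step, so each sample is a single
# evaluation preceded by its own setup.
b2 = @benchmarkable sin(1.0) evals=1
@show b2.params.evals
```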
Could a developer verify that? It might be useful to write this up in the documentation.
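In the meantime, here is a sketch one could run to check the counts empirically (the counter names are mine, and the exact totals may include warmup evaluations):

```julia
using BenchmarkTools

const SETUP_CALLS = Ref(0)  # incremented every time the setup expression runs
const EVAL_CALLS  = Ref(0)  # incremented every time the benchmarked expression runs

bump_setup!() = (SETUP_CALLS[] += 1; nothing)
bump_evals!() = (EVAL_CALLS[] += 1; nothing)

b = @benchmarkable bump_evals!() setup=(bump_setup!()) samples=5 evals=10
run(b)

# If setup runs once per sample while the expression runs evals times per
# sample, we expect roughly 5 setup calls and roughly 50 evaluations
# (plus possibly a warmup).
@show SETUP_CALLS[] EVAL_CALLS[]
```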