PSA: It is possible to use `BenchmarkTools.BenchmarkGroup` with Chairmarks
Simply replacing @benchmarkable with @be suffices, and you don't have to call tune! or run either!
Even running Statistics.median(suite) works - although any custom plotting utilities might need a couple of tweaks :)
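For example, here is a minimal sketch of what this looks like (the group keys and benchmarked expressions are just placeholders):

```julia
# Minimal sketch of the PSA, assuming Chairmarks and BenchmarkTools are both installed.
using BenchmarkTools: BenchmarkGroup
using Chairmarks
using Statistics

suite = BenchmarkGroup()
# The leaves hold Chairmarks results directly; no tune!/run step is needed.
suite["sum"] = @be sum(rand(1000))
suite["sort"] = @be sort(rand(1000))

# BenchmarkGroup maps median over its leaves, and Chairmarks results support it.
Statistics.median(suite)
```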
What? I had no idea. This is lovely :)
Is there a way to take a BenchmarkGroup and "translate" it somehow? Or otherwise run a benchmark suite using Chairmarks instead of BenchmarkTools? Many projects have historically defined their benchmark suite in benchmarks/benchmarks.jl and use BenchmarkTools. So, to measure performance over time, it is perhaps not practical to replace every previous @benchmarkable (especially if there are setup=... arguments in the macro).
I ask specifically in the context of AirspeedVelocity.jl: https://github.com/MilesCranmer/AirspeedVelocity.jl/issues/35
I mean they both have the same syntax, so an operation over all leaf nodes of that dict could easily reassign each @benchmarkable to the result of @be on that same Expr (if you can directly invoke a macro on an Expr).
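A hedged sketch of that idea (the `exprs` dict and its contents are hypothetical; an actual suite stores Benchmark objects rather than their source Exprs):

```julia
# Hypothetical sketch: given the original benchmark source Exprs, splice each one
# into a call to @be and eval it, collecting Chairmarks results at the leaves.
using BenchmarkTools: BenchmarkGroup
using Chairmarks

exprs = Dict("sum" => :(sum(rand(1000))))  # hypothetical stored source Exprs
results = BenchmarkGroup()
for (name, ex) in exprs
    # "Invoking the macro on an Expr" amounts to splicing it in and eval-ing.
    results[name] = eval(:(Chairmarks.@be $ex))
end
```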
I think they have slightly different syntax, no?
Positional argument disambiguation
≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡
setup, teardown, and init are optional and are parsed with that precedence giving these possible forms:
@be f
@be setup f
@be setup f teardown
@be init setup f teardown
whereas @benchmarkable is
help?> @benchmarkable
@benchmarkable <expr to benchmark> [setup=<setup expr>]
Create a Benchmark instance for the given expression. @benchmarkable has similar syntax with @benchmark. See also @benchmark.
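Concretely, roughly equivalent benchmarks in the two syntaxes might look like this (assuming the setup produces the input to the benchmarked function):

```julia
using BenchmarkTools
using Chairmarks

# BenchmarkTools: keyword-style setup; the benchmark is stored for a later run/tune!.
b = @benchmarkable sum(x) setup=(x=rand(1000))

# Chairmarks: positional setup; its result is fed to the benchmarked function,
# and the results come back immediately.
r = @be rand(1000) sum
```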
Ah that's true - I had not used setup much but I guess it could also be translated by way of Expr rewriting
Wait, I'm a bit confused. @benchmarkable returns a benchmark (that you can then execute). Whereas @be appears to actually return the results of benchmarking the expression. Is that correct?
Yep! Creating a benchmark suite with @be instead of @benchmarkable gives you the equivalent of that suite after run.
Oh, but isn't the whole point of @benchmarkable for it to be lazily evaluated, so you can tune! it? And @benchmark for eager evaluation?
That's true, but if running in a non-interactive framework I don't think it really matters?
Even then, the performance difference is substantial enough that it's actually possible to do semi-interactive workflows with Chairmarks.
I guess I just wouldn't consider @benchmarkable -> @be to be a complete solution (maybe this deserves a new issue with a feature request). For example, if I need a single benchmark result I would use @benchmark. But if I want a suite of benchmarks that I can keep around in my REPL in between Revise.jl-ing my library, then the suite is something I would want to re-run. In principle it doesn't seem too bad to add a compatibility layer?
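In the meantime, one possible workaround (not part of either package's API, just a sketch) is to store thunks rather than results, so the suite can be re-run after each Revise reload:

```julia
using Chairmarks

# Store callables instead of results; nothing runs until you ask for it.
suite = Dict(
    "sum"  => () -> @be sum(rand(1000)),
    "sort" => () -> @be sort(rand(1000)),
)

# Re-run the whole suite whenever fresh measurements are wanted.
rerun(suite) = Dict(name => thunk() for (name, thunk) in suite)
results = rerun(suite)
```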
This is the `@benchmarkable` code. The logic doesn't seem too complicated: it just generates a function `samplefunc` that returns the recorded time, allocations, and memory for a particular expression.
"""
@benchmarkable <expr to benchmark> [setup=<setup expr>]
Create a `Benchmark` instance for the given expression. `@benchmarkable`
has similar syntax with `@benchmark`. See also [`@benchmark`](@ref).
"""
macro benchmarkable(args...)
core, setup, teardown, quote_vars, quote_vals, params = benchmarkable_parts(args)
map!(esc, params, params)
# extract any variable bindings shared between the core and setup expressions
setup_vars = isa(setup, Expr) ? collectvars(setup) : []
core_vars = isa(core, Expr) ? collectvars(core) : []
out_vars = filter(var -> var in setup_vars, core_vars)
# generate the benchmark definition
return quote
generate_benchmark_definition(
$__module__,
$(Expr(:quote, out_vars)),
$(Expr(:quote, setup_vars)),
$(Expr(:quote, quote_vars)),
$(esc(Expr(:tuple, Expr.(:quote, quote_vals)...))),
$(esc(Expr(:quote, core))),
$(esc(Expr(:quote, setup))),
$(esc(Expr(:quote, teardown))),
Parameters($(params...)),
)
end
end
# `eval` an expression that forcibly defines the specified benchmark at
# top-level in order to allow transfer of locally-scoped variables into
# benchmark scope.
#
# The double-underscore-prefixed variable names are not particularly hygienic - it's
# possible for them to conflict with names used in the setup or teardown expressions.
# A more robust solution would be preferable.
function generate_benchmark_definition(
eval_module, out_vars, setup_vars, quote_vars, quote_vals, core, setup, teardown, params
)
@nospecialize
corefunc = gensym("core")
samplefunc = gensym("sample")
type_vars = [gensym() for i in 1:(length(quote_vars) + length(setup_vars))]
signature = Expr(:call, corefunc, quote_vars..., setup_vars...)
signature_def = Expr(
:where,
Expr(
:call,
corefunc,
[
Expr(:(::), var, type) for
(var, type) in zip([quote_vars; setup_vars], type_vars)
]...,
),
type_vars...,
)
if length(out_vars) == 0
invocation = signature
core_body = core
elseif length(out_vars) == 1
returns = :(return $(out_vars[1]))
invocation = :($(out_vars[1]) = $(signature))
core_body = :($(core); $(returns))
else
returns = :(return $(Expr(:tuple, out_vars...)))
invocation = :($(Expr(:tuple, out_vars...)) = $(signature))
core_body = :($(core); $(returns))
end
@static if isdefined(Base, :donotdelete)
invocation = :(
let x = $invocation
Base.donotdelete(x)
x
end
)
end
return Core.eval(
eval_module,
quote
@noinline $(signature_def) = begin
$(core_body)
end
@noinline function $(samplefunc)(
$(Expr(:tuple, quote_vars...)), __params::$BenchmarkTools.Parameters
)
$(setup)
__evals = __params.evals
__gc_start = Base.gc_num()
__start_time = time_ns()
__return_val = $(invocation)
for __iter in 2:__evals
$(invocation)
end
__sample_time = time_ns() - __start_time
__gcdiff = Base.GC_Diff(Base.gc_num(), __gc_start)
$(teardown)
__time = max((__sample_time / __evals) - __params.overhead, 0.001)
__gctime = max((__gcdiff.total_time / __evals) - __params.overhead, 0.0)
__memory = Int(Base.fld(__gcdiff.allocd, __evals))
__allocs = Int(
Base.fld(
__gcdiff.malloc +
__gcdiff.realloc +
__gcdiff.poolalloc +
__gcdiff.bigalloc,
__evals,
),
)
return __time, __gctime, __memory, __allocs, __return_val
end
$BenchmarkTools.Benchmark($(samplefunc), $(quote_vals), $(params))
end,
)
end
So perhaps an ext/ChairmarksBenchmarkToolsExt.jl could define a @benchmarkable that still uses benchmarkable_parts to extract the pieces, but uses Chairmarks to actually run the thing? I'm not sure how doable this is. Maybe @LilithHafner could share their thoughts.
Hmm! Yeah, that logic could probably be translated to Chairmarks somehow, perhaps by creating a new macro which stores an Expr object when run (the equivalent of @benchmarkable). It doesn't look like this is hijackable unless we either commit major type piracy by overloading BenchmarkTools.generate_benchmark_definition directly, or offer a function Chairmarks.override_benchmarkable!() to @eval that code in.
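A rough sketch of what such a macro could look like (nothing here is existing Chairmarks API; @be_lazy is made up for illustration):

```julia
using Chairmarks

# Capture the arguments now and run Chairmarks later, loosely like @benchmarkable.
macro be_lazy(args...)
    esc(:(() -> Chairmarks.@be $(args...)))
end

b = @be_lazy sum(rand(1000))  # nothing runs yet
result = b()                  # the benchmark executes here
```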
This should be moved from an issue into the documentation
This PSA is in the docs now. Issues with its behavior should be reported separately. Though as I mention in the docs, this isn't intentionally designed behavior so it may never be seamless.
Thanks again for reporting, @asinghvi17, and thanks for the qualifications and context, @MilesCranmer
Ah, @MilesCranmer, I never responded to your suggestion to use @benchmarkable's API on the frontend and Chairmarks on the backend. That's possible syntactically (i.e. one could write a syntax transformation from BenchmarkTools's input format to Chairmarks's input format), but there would remain semantic holes. Most notably, BenchmarkTools runs the sample function in global scope while Chairmarks runs it in local scope.