PSA: It is possible to use `BenchmarkTools.BenchmarkGroup` with Chairmarks
Simply replacing @benchmarkable with @be suffices, and you don't have to call tune! or run either!
Even running Statistics.median(suite) works - although any custom plotting utilities might need a couple of tweaks :)
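For example, here is a minimal sketch of what this looks like (the group keys and benchmarked expressions are just placeholders):

```julia
# Minimal sketch of the PSA, assuming Chairmarks and BenchmarkTools are both installed.
using BenchmarkTools: BenchmarkGroup
using Chairmarks
using Statistics

suite = BenchmarkGroup()
# The leaves hold Chairmarks results directly; no tune!/run step is needed.
suite["sum"] = @be sum(rand(1000))
suite["sort"] = @be sort(rand(1000))

# BenchmarkGroup maps median over its leaves, and Chairmarks results support it.
Statistics.median(suite)
```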
What? I had no idea. This is lovely :)
Is there a way to take a BenchmarkGroup and "translate" it somehow? Or otherwise run a benchmark suite using Chairmarks instead of BenchmarkTools? Many projects have historically defined their benchmark suite in benchmarks/benchmarks.jl and use BenchmarkTools. So, to measure performance over time, it is perhaps not practical to replace every previous @benchmarkable (especially if there are setup=... arguments in the macro).
I ask specifically in the context of AirspeedVelocity.jl: https://github.com/MilesCranmer/AirspeedVelocity.jl/issues/35
I mean they both have the same syntax, so an operation over all leaf nodes of that dict could easily reassign each @benchmarkable to the result of @be on that same Expr (if you can directly invoke a macro on an Expr).
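A hedged sketch of that idea (the `exprs` dict and its contents are hypothetical; an actual suite stores Benchmark objects rather than their source Exprs):

```julia
# Hypothetical sketch: given the original benchmark source Exprs, splice each one
# into a call to @be and eval it, collecting Chairmarks results at the leaves.
using BenchmarkTools: BenchmarkGroup
using Chairmarks

exprs = Dict("sum" => :(sum(rand(1000))))  # hypothetical stored source Exprs
results = BenchmarkGroup()
for (name, ex) in exprs
    # "Invoking the macro on an Expr" amounts to splicing it in and eval-ing.
    results[name] = eval(:(Chairmarks.@be $ex))
end
```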
I think they have slightly different syntax, no?
Positional argument disambiguation
≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡
setup, teardown, and init are optional and are parsed with that precedence giving these possible forms:
@be f
@be setup f
@be setup f teardown
@be init setup f teardown
whereas @benchmarkable is
help?> @benchmarkable
@benchmarkable <expr to benchmark> [setup=<setup expr>]
Create a Benchmark instance for the given expression. @benchmarkable has similar syntax with @benchmark. See also @benchmark.
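Concretely, roughly equivalent benchmarks in the two syntaxes might look like this (assuming the setup produces the input to the benchmarked function):

```julia
using BenchmarkTools
using Chairmarks

# BenchmarkTools: keyword-style setup; the benchmark is stored for a later run/tune!.
b = @benchmarkable sum(x) setup=(x=rand(1000))

# Chairmarks: positional setup; its result is fed to the benchmarked function,
# and the results come back immediately.
r = @be rand(1000) sum
```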
Ah that's true - I had not used setup much but I guess it could also be translated by way of Expr rewriting
Wait, I'm a bit confused. @benchmarkable returns a benchmark (that you can then execute). Whereas @be appears to actually return the results of benchmarking the expression. Is that correct?
Yep! Creating a benchmark suite with @be instead of @benchmarkable gives you the equivalent of that suite after run.
Oh, but isn't the whole point of @benchmarkable for it to be lazily evaluated, so you can tune! it? And @benchmark for eager evaluation?
That's true, but if running in a non-interactive framework I don't think it really matters?
Even then, the performance difference is substantial enough that it's actually possible to do semi-interactive workflows with Chairmarks.
I guess I just wouldn't consider @benchmarkable -> @be to be a complete solution (maybe this deserves a new issue with a feature request). For example, if I need a single benchmark result I would use @benchmark. But if I want a suite of benchmarks that I can keep around in my REPL in between Revise.jl-ing my library, then the suite is something I would want to re-run. In principle it doesn't seem too bad to add a compatibility layer?
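In the meantime, one possible workaround (not part of either package's API, just a sketch) is to store thunks rather than results, so the suite can be re-run after each Revise reload:

```julia
using Chairmarks

# Store callables instead of results; nothing runs until you ask for it.
suite = Dict(
    "sum"  => () -> @be sum(rand(1000)),
    "sort" => () -> @be sort(rand(1000)),
)

# Re-run the whole suite whenever fresh measurements are wanted.
rerun(suite) = Dict(name => thunk() for (name, thunk) in suite)
results = rerun(suite)
```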
This is the `@benchmarkable` code. The logic doesn't seem too complicated: it just generates a function `samplefunc` that returns the recorded time, allocations, and memory for a particular expression.
"""
@benchmarkable <expr to benchmark> [setup=<setup expr>]
Create a `Benchmark` instance for the given expression. `@benchmarkable`
has similar syntax with `@benchmark`. See also [`@benchmark`](@ref).
"""
macro benchmarkable(args...)
core, setup, teardown, quote_vars, quote_vals, params = benchmarkable_parts(args)
map!(esc, params, params)
# extract any variable bindings shared between the core and setup expressions
setup_vars = isa(setup, Expr) ? collectvars(setup) : []
core_vars = isa(core, Expr) ? collectvars(core) : []
out_vars = filter(var -> var in setup_vars, core_vars)
# generate the benchmark definition
return quote
generate_benchmark_definition(
$__module__,
$(Expr(:quote, out_vars)),
$(Expr(:quote, setup_vars)),
$(Expr(:quote, quote_vars)),
$(esc(Expr(:tuple, Expr.(:quote, quote_vals)...))),
$(esc(Expr(:quote, core))),
$(esc(Expr(:quote, setup))),
$(esc(Expr(:quote, teardown))),
Parameters($(params...)),
)
end
end
# `eval` an expression that forcibly defines the specified benchmark at
# top-level in order to allow transfer of locally-scoped variables into
# benchmark scope.
#
# The double-underscore-prefixed variable names are not particularly hygienic - it's
# possible for them to conflict with names used in the setup or teardown expressions.
# A more robust solution would be preferable.
function generate_benchmark_definition(
eval_module, out_vars, setup_vars, quote_vars, quote_vals, core, setup, teardown, params
)
@nospecialize
corefunc = gensym("core")
samplefunc = gensym("sample")
type_vars = [gensym() for i in 1:(length(quote_vars) + length(setup_vars))]
signature = Expr(:call, corefunc, quote_vars..., setup_vars...)
signature_def = Expr(
:where,
Expr(
:call,
corefunc,
[
Expr(:(::), var, type) for
(var, type) in zip([quote_vars; setup_vars], type_vars)
]...,
),
type_vars...,
)
if length(out_vars) == 0
invocation = signature
core_body = core
elseif length(out_vars) == 1
returns = :(return $(out_vars[1]))
invocation = :($(out_vars[1]) = $(signature))
core_body = :($(core); $(returns))
else
returns = :(return $(Expr(:tuple, out_vars...)))
invocation = :($(Expr(:tuple, out_vars...)) = $(signature))
core_body = :($(core); $(returns))
end
@static if isdefined(Base, :donotdelete)
invocation = :(
let x = $invocation
Base.donotdelete(x)
x
end
)
end
return Core.eval(
eval_module,
quote
@noinline $(signature_def) = begin
$(core_body)
end
@noinline function $(samplefunc)(
$(Expr(:tuple, quote_vars...)), __params::$BenchmarkTools.Parameters
)
$(setup)
__evals = __params.evals
__gc_start = Base.gc_num()
__start_time = time_ns()
__return_val = $(invocation)
for __iter in 2:__evals
$(invocation)
end
__sample_time = time_ns() - __start_time
__gcdiff = Base.GC_Diff(Base.gc_num(), __gc_start)
$(teardown)
__time = max((__sample_time / __evals) - __params.overhead, 0.001)
__gctime = max((__gcdiff.total_time / __evals) - __params.overhead, 0.0)
__memory = Int(Base.fld(__gcdiff.allocd, __evals))
__allocs = Int(
Base.fld(
__gcdiff.malloc +
__gcdiff.realloc +
__gcdiff.poolalloc +
__gcdiff.bigalloc,
__evals,
),
)
return __time, __gctime, __memory, __allocs, __return_val
end
$BenchmarkTools.Benchmark($(samplefunc), $(quote_vals), $(params))
end,
)
end
So perhaps an ext/ChairmarksBenchmarkToolsExt.jl could define a @benchmarkable that still uses benchmarkable_parts to extract the pieces, but uses Chairmarks to actually run the thing? I'm not sure how doable this is. Maybe @LilithHafner could share their thoughts.
Hmm! Yeah, that logic could probably be translated to Chairmarks somehow, perhaps by creating a new macro which stores an Expr object when run (the equivalent of @benchmarkable). It doesn't look like this is hijackable unless we either commit major type piracy by overloading BenchmarkTools.generate_benchmark_definition directly, or offer a function Chairmarks.override_benchmarkable!() to @eval that code in.
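A rough sketch of what such a macro could look like (nothing here is existing Chairmarks API; @be_lazy is made up for illustration):

```julia
using Chairmarks

# Capture the arguments now and run Chairmarks later, loosely like @benchmarkable.
macro be_lazy(args...)
    esc(:(() -> Chairmarks.@be $(args...)))
end

b = @be_lazy sum(rand(1000))  # nothing runs yet
result = b()                  # the benchmark executes here
```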
This should be moved from an issue into the documentation
This PSA is in the docs now. Issues with its behavior should be reported separately. Though as I mention in the docs, this isn't intentionally designed behavior so it may never be seamless.
Thanks again for reporting, @asinghvi17, and thanks for the qualifications and context, @MilesCranmer
Ah, @MilesCranmer, I never responded to your suggestion to use @benchmarkable's API on the frontend and Chairmarks on the backend. That's possible syntactically (i.e. one could write a syntax transformation from BenchmarkTools's input format to Chairmarks's input format), but there would remain semantic holes. Most notably, BenchmarkTools runs the sample function in global scope while Chairmarks runs it in local scope.