BenchmarkTools.jl
Feature Request: `@benchmark f() g()`
Most of the time, the reason we use BenchmarkTools is not that we want to know how fast A is, but rather whether A is faster than B and by how much. In my opinion, a very good addition to BenchmarkTools would be a macro that compares A vs B vs … X directly, instead of us guessing whether one is faster than the others based on their separate statistics. Such a macro could also reduce internal bias (reloading A and B and …, etc.), and running it for a long time should also account for potential machine/OS-wide bias.
And I concur. When benchmarking and optimizing a function, I often define `function_old()` and `function_new()` and check whether the changes to `function_new()` have the runtime impact I expect. Ideally, a benchmarking package lets me perform that comparison correctly, easily, quickly, and precisely. A well-crafted varargs `@benchmark` that supports `@benchmark function_old() function_new()` would be ideal.
This extension also has the potential to help users like me avoid common benchmark-comparison pitfalls like those discussed in the linked Discourse thread.
Take a look at https://juliaci.github.io/BenchmarkTools.jl/dev/manual/#Handling-benchmark-results, especially `judge`.
Perhaps this workflow is common enough to let `@benchmark f() g()` expand to `judge(minimum(@benchmark f()), minimum(@benchmark g()))`?
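For concreteness, here is a minimal sketch of what that expansion could do today with the existing API (`f` and `g` are just placeholder functions):

```julia
using BenchmarkTools

# Placeholder functions standing in for the two things being compared.
f() = sum(rand(1000))
g() = sum(rand(2000))

# Roughly what the proposed `@benchmark f() g()` could expand to today:
trial_f = @benchmark f()
trial_g = @benchmark g()

# `judge` compares two TrialEstimates and classifies the difference as
# an improvement, a regression, or invariant (within a tolerance).
display(judge(minimum(trial_f), minimum(trial_g)))

# `ratio` gives the raw time/memory ratios if you just want the numbers.
display(ratio(minimum(trial_f), minimum(trial_g)))
```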
Additionally, is there some way to take advantage of knowing the primary goal of a benchmark is to compare 2 functions by, for example, randomly alternating samples or blocks of samples?
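To sketch what I mean (this is only a rough manual approximation, not something the package does; `f` and `g` are placeholders): run short benchmarks of each function in alternating blocks so that slow drift in machine/OS state affects both sides roughly equally, then compare the best block from each.

```julia
using BenchmarkTools

f() = sum(rand(1000))   # placeholder: "old" implementation
g() = sum(rand(1000))   # placeholder: "new" implementation

# Run several short benchmarks of f and g in alternating order ("blocks"),
# then compare the best block seen for each side.
nblocks = 5
mins_f = Float64[]
mins_g = Float64[]
for _ in 1:nblocks
    push!(mins_f, time(minimum(@benchmark f() seconds=1)))   # min time, ns
    push!(mins_g, time(minimum(@benchmark g() seconds=1)))   # min time, ns
end
@show minimum(mins_f) / minimum(mins_g)
```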
I'd rather not overcomplicate the `@benchmark` interface.
> Additionally, is there some way to take advantage of knowing the primary goal of a benchmark is to compare 2 functions by, for example, randomly alternating samples or blocks of samples?
Hm, not currently. I don't know whether that would help or hurt; the branch predictor would learn that pattern.
Perhaps a documentation solution, then? When I opened this issue I had already loosely read/closely skimmed the (mercifully short!) manual cover to cover, but I found the `BenchmarkGroup` and `judge` sections a bit intimidating and didn't put together `judge(minimum(@benchmark f()), minimum(@benchmark g()))` as a supported solution to my problem. Nevertheless, I got along fine with things like `(@elapsed f())/(@elapsed g())` until I ran into the tangentially related issue that started this thread.
Improving the docs would be fantastic! If you have the time, maybe you could take a stab at it?
Maybe there's some inspiration we can take from `cargo bench` and how it keeps a record of previous benchmarks? I don't know how it handles modalities when plotting, but as far as I know it generates some web pages that display plots of different runs next to each other.
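For a rough equivalent today, BenchmarkTools' JSON serialization can already keep a record of previous runs to compare against later; a minimal sketch (the file name and `f` are illustrative):

```julia
using BenchmarkTools

f() = sum(rand(1000))   # placeholder for the function under test

# Record a baseline run to disk (BenchmarkTools ships JSON serialization).
baseline_trial = @benchmark f()
BenchmarkTools.save("baseline.json", baseline_trial)

# Later (e.g. after editing `f`), load the record and compare against it.
baseline = BenchmarkTools.load("baseline.json")[1]   # `load` returns a vector
display(judge(minimum(@benchmark f()), minimum(baseline)))
```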