cargo-criterion
Should cargo-criterion support baselines?
I had this idea kicking around that baselines could be replaced with alternate "timelines" (since cargo-criterion will hopefully soon support a full-history chart).
I was never very satisfied with the workflow of Criterion.rs' baselines (and others seem to agree on that - e.g. critcmp largely exists to make up for deficiencies in the workflow supported by baselines). Thing is, I have no idea what sort of workflow would work better.
This will require some design work. Probably won't be available in 1.0.0.
We're currently improving our benchmarking suite at https://github.com/timberio/vector, and we figured we'd like a way to compare benchmarking results over the long run. It may not be exactly what baselines were designed for, but I just wanted to provide some feedback that it's something we're looking for.
This is tricky, and we're looking into the right way to implement it. We're currently thinking about storing the bench data output from the PR in question and master together (i.e. both branches benched on the same system at the same time) for long-term comparison - but we haven't yet figured out how we want to use this data.
I also find myself in a position where I need to compare seemingly arbitrary benchmarks against each other. For instance, I have a suite where the same benching code is run in different "environments" (e.g. tracing on and tracing off), and sometimes we iterate on those layers. I need to compare both two benches against each other, and also two versions of the code against each other.
I.e. the flow looks like this:
```console
$ git checkout master
$ cargo bench
# produces:
# - env1/bench1
# - env1/bench2
# - env2/bench1
# - env2/bench2
# (to be used as base)
$ git checkout mychange
$ cargo bench
# produces:
# - env1/bench1
# - env1/bench2
# - env2/bench1
# - env2/bench2
# (to be used as new)
$ critcmp (or similar)
```
It would make sense to compare `bench1` across `base/env1`, `base/env2`, `new/env1` and `new/env2`, rather than how critcmp does it currently: comparing `env1/bench1` across `base` and `new`.
Does it make sense?
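The cross-environment comparison described above could be sketched roughly like this. This is a Python illustration with made-up benchmark names and timings, not critcmp's actual data model: group results by bench name so each bench lines up across every (version, environment) pair.

```python
from collections import defaultdict

# Hypothetical (run, benchmark-id, estimate-in-ns) tuples; the names and
# numbers are invented for illustration only.
results = [
    ("base", "env1/bench1", 100), ("base", "env1/bench2", 210),
    ("base", "env2/bench1", 130), ("base", "env2/bench2", 250),
    ("new",  "env1/bench1",  90), ("new",  "env1/bench2", 205),
    ("new",  "env2/bench1", 120), ("new",  "env2/bench2", 260),
]

# Group by the bench name (the part after "envN/"), so each group holds
# that bench across every run/environment combination.
groups = defaultdict(dict)
for run, bench_id, ns in results:
    env, bench = bench_id.split("/", 1)
    groups[bench][f"{run}/{env}"] = ns

# Print one comparison line per bench, with each timing's ratio to the
# fastest entry in that group.
for bench, timings in sorted(groups.items()):
    fastest = min(timings.values())
    row = "  ".join(
        f"{key}={ns}ns ({ns / fastest:.2f}x)"
        for key, ns in sorted(timings.items())
    )
    print(f"{bench}: {row}")
```

With this grouping, `bench1` is compared across all four of `base/env1`, `base/env2`, `new/env1` and `new/env2` in one row, which is the comparison the paragraph above asks for.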
I hope this feedback will be helpful. If you'd like to chat - we're at http://chat.vector.dev/
cc @jszwedko
> I was never very satisfied with the workflow of Criterion.rs' baselines (and others seem to agree on that - e.g. critcmp largely exists to make up for deficiencies in the workflow supported by baselines). Thing is, I have no idea what sort of workflow would work better.
So I thought I might just explain my workflow here. Seeing this issue now made me remember an email I got from you that I never responded to. :-( Sorry about that. It slipped down my inbox and I ended up forgetting about it.
I'll do my best to explain my workflow. I've been using this kind of flow for a long time.
So basically, I start off by running all the benchmarks and saving their output. I usually call this `master` or `baseline` or something. It's what I compare all future runs with. Then I'll go and make some changes, run the benchmarks again, and maybe call them `foo`, where `foo` is some short descriptor related to that change, e.g. `simdavx2` or something. I'll then run `critcmp baseline simdavx2` to look at comparisons between them. Then I might hone in on a particular benchmark or set of benchmarks. Then I start running, e.g., `cargo bench memchr/crate/shortinput -- --save-baseline simdavx2-shortinput` and try to tune it. I then use things like `critcmp baseline simdavx2 simdavx2-shortinput` to see the progression of the benchmark over the different attempts. As you might imagine, things can get pretty fluid here. I might want to compare lots of different runs.
But there are other workflows too. Only being able to compare benchmarks with the same name across distinct runs is incredibly limiting. I also want to be able to compare benchmarks within runs. For example, I might have `memchr/crate/shortinput` and `memchr/libc/shortinput`, where the former is my implementation and the latter is something else that I'm trying to match or beat or whatever. But they have different names. With critcmp, I can just do `critcmp baseline -g 'memchr/.*?/(.*)'` and it will do the correct grouping for me.
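My understanding of what `-g` does, sketched in Python: benchmarks whose regex capture groups produce the same key get grouped together, so differently-named benchmarks line up for comparison. The benchmark names here are taken from the example above; treat the grouping logic as an approximation of critcmp's behavior, not its actual implementation.

```python
import re

# Benchmark names from the example above.
names = [
    "memchr/crate/shortinput",
    "memchr/libc/shortinput",
    "memchr/crate/longinput",
    "memchr/libc/longinput",
]

# The same pattern passed to critcmp's -g flag: the lazy .*? skips the
# implementation segment ("crate" or "libc"), and the capture group
# keeps the input name.
pattern = re.compile(r"memchr/.*?/(.*)")

groups = {}
for name in names:
    m = pattern.match(name)
    if m:
        # The concatenated capture groups form the group key, so
        # "memchr/crate/shortinput" and "memchr/libc/shortinput" both
        # land in the "shortinput" group and get compared head-to-head.
        key = "".join(m.groups())
        groups.setdefault(key, []).append(name)

for key, members in sorted(groups.items()):
    print(key, "->", members)
```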
And there's also the presentation aspect:
- I want it to work easily on the CLI with minimal friction.
- When doing comparison, I want the output to be succinct. One line per benchmark.
Happy to answer any questions about my workflow. It's a little hard to describe, so I'd be happy to elaborate on any unclear points or why I didn't use X feature in Criterion. (It is plausible that I didn't know about X, whatever it is. :-))