This is the big feature I've been working on. I would link you to the branch, but my modem is broken and I forgot to bring the code/laptop to work. That also means I can't currently show you pretty pictures to help make the case for this feature.

What I can do is give a quick overview of the design decisions and effects of this feature.

Effects

Generates images (bar charts with error bars) 📊
Extra code to maintain 🐞
Dependency on gnuplot for the plotting feature 😕 (I couldn't find a pure Rust library, so it pipes to whatever program called gnuplot is on the PATH)
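A minimal sketch of what "piping to whatever is called gnuplot on the PATH" can look like using only std::process. The name run_gnuplot is hypothetical and not taken from the branch. It writes the whole script to gnuplot's stdin, closes stdin, and waits for the exit status. If gnuplot isn't installed, spawn fails with NotFound, which is where a friendly "this feature requires gnuplot" message would go.

```rust
use std::io::{self, Write};
use std::process::{Command, Stdio};

/// Hypothetical helper: pipe a generated gnuplot script (plus any inline
/// data) to the program named `program`, looked up on the PATH.
fn run_gnuplot(program: &str, script: &[u8]) -> io::Result<()> {
    let mut child = Command::new(program)
        .stdin(Stdio::piped())
        .spawn()?; // fails with ErrorKind::NotFound if `program` isn't on the PATH
    {
        // Take ownership of stdin so it is dropped (closed) at the end of
        // this scope, which tells gnuplot there is no more input.
        let mut stdin = child.stdin.take().expect("stdin was piped");
        stdin.write_all(script)?;
    }
    let status = child.wait()?;
    if status.success() {
        Ok(())
    } else {
        Err(io::Error::new(
            io::ErrorKind::Other,
            format!("{} exited with {}", program, status),
        ))
    }
}
```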

Design decisions

It currently produces one plot per benchmark name, since different benchmarks can have very different values. But maybe one big plot would also be nice? Should that be an option, or should we always generate it?
Where to put the plots. Right now they go to target/benchcmp by default. Should that be configurable? And what about calling benchcmp from a subdirectory of your project? AFAIK we can't get the project root from an environment variable. Do we go dirty and walk up the directory structure looking for the Cargo.toml file?
Command-line interface: my original plan was to put the existing functionality under a table subcommand and the new functionality under a plot subcommand, but now I'm thinking it might be better to provide a separate cargo-benchplot executable?
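On the subdirectory question, here is a minimal sketch of the "dirty" approach, assuming the nearest directory containing a Cargo.toml counts as the project root. find_project_root is a hypothetical name. Cargo also ships a built-in cargo locate-project command that prints the manifest path as JSON, which could be shelled out to instead.

```rust
use std::path::{Path, PathBuf};

/// Walks up from `start` (e.g. the current directory) and returns the first
/// directory that contains a `Cargo.toml`, or `None` if no ancestor has one.
fn find_project_root(start: &Path) -> Option<PathBuf> {
    start
        .ancestors() // `start` itself, then its parent, and so on up to the root
        .find(|dir| dir.join("Cargo.toml").is_file())
        .map(Path::to_path_buf)
}
```

Inside a workspace this stops at the member crate, whereas cargo puts build output in the workspace root's target directory, so a real version would have to decide which of the two it wants.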

Jul 25 '16 12:07 Apanatshka

Hmmmmmmmm. I'm not as excited about this one, honestly. Not because I think it isn't useful, but because it's a large increase in functionality that I'm not sure I'll be able to maintain. The dependency on gnuplot is also really unfortunate and would, for example, make this tool more difficult to use in other environments like Windows.

I'm also not keen on an N-way comparison. I feel like a much simpler approach is to keep to a pairwise comparison, but permit the inputs to contain more than one sample of each benchmark. (This is how I envision extracting p-values, for example.)
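As an illustration of the multiple-samples idea, here is a sketch of Welch's t statistic for two sets of samples of the same benchmark (say, ns/iter from several runs before and after a change). This is just one standard choice of test, not something either of us has implemented. Turning t into a p-value additionally needs the Student's t CDF with Welch-Satterthwaite degrees of freedom, which std doesn't provide, so that part is left out.

```rust
/// Sample mean and unbiased sample variance (n - 1 denominator).
/// Assumes at least two samples.
fn mean_var(xs: &[f64]) -> (f64, f64) {
    let n = xs.len() as f64;
    let mean = xs.iter().sum::<f64>() / n;
    let var = xs.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (n - 1.0);
    (mean, var)
}

/// Welch's t statistic for two independent sample sets `a` and `b`.
/// Large |t| suggests the two means really differ.
fn welch_t(a: &[f64], b: &[f64]) -> f64 {
    let (ma, va) = mean_var(a);
    let (mb, vb) = mean_var(b);
    (ma - mb) / (va / a.len() as f64 + vb / b.len() as f64).sqrt()
}
```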

With that said, plotting stuff is cool, and I know it's something I'm going to want to do eventually. I would be more in favor of adding something simpler to cargo-benchcmp that facilitates plotting (e.g., emitting something that can be piped to gnuplot), but I'm not sure what the details might look like. This also makes some of your design challenges moot (like where to put images) by putting more control in the user's hands. That way, cargo-benchcmp sticks to being a tool that primarily does simple transformations on micro-benchmark results, which is something I feel comfortable maintaining.
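One possible shape for "emitting something that can be piped to gnuplot": plain whitespace-separated columns instead of images, leaving the script, terminal and output path to the user. This is a sketch; gnuplot_data and the column order (label, ns/iter, variance) are assumptions. Inline data after a plot '-' command is terminated by a line containing just e, so the function appends one.

```rust
use std::fmt::Write;

/// Formats one benchmark's results as whitespace-separated columns
/// (label, ns/iter, variance), followed by gnuplot's "e" end-of-data line.
fn gnuplot_data(rows: &[(&str, u64, u64)]) -> String {
    let mut out = String::new();
    for &(label, ns, var) in rows {
        // Writing into a String cannot fail, so unwrap is fine here.
        writeln!(out, "{} {} {}", label, ns, var).unwrap();
    }
    out.push_str("e\n");
    out
}
```

A user could then put their own script, ending in a plot '-' ... command, in front of this output and pipe both into gnuplot. Labels containing spaces would need quoting.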

Another possibility is pushing the benchmark parsing/comparison code into a library and letting others build tools on top of that. But that feels like a monstrous amount of overkill...

Jul 25 '16 12:07 BurntSushi

Yeah, I don't like the dependency on gnuplot either. But at least there are Windows binaries for gnuplot.

The reason I came up with this at all was that I needed an overview of how different implementation techniques compared on some benchmarks. A pairwise comparison is fine for finding the best implementation for one benchmark, but it's easier to click through some images to get an impression of performance across a benchmark suite.

I now know a little bit more about plotting with gnuplot so I could see how we might go about plotting other things such as a progression of a benchmark over time.

Just generating the gnuplot scripts and leaving the calls to gnuplot to the user is certainly a possibility. But I'm not sure the design challenges become moot; that has to do with details of gnuplot. I'll write more on that tonight or tomorrow. This is taking too much time/concentration and I should be working ^^'

I don't know exactly how much work it would be to split off a library, but that was one of the things that came to mind as I wrote this issue. It really depends on how far we want to take this. Still, I think splitting off a library is something we can always do later, when it's a more obvious choice. With good enough testability, things naturally end up organised in a way that facilitates splitting up.

Jul 25 '16 12:07 Apanatshka

Ok, so more info on the current implementation. I pushed the working code, so you can try out this branch. Usage:

Compares Rust micro-benchmark results.
Usage:
    cargo benchcmp table [options] <file>...
    cargo benchcmp table [options] --by-module <name> <name> <file>...
    cargo benchcmp plot  [options] <file>...
    cargo benchcmp -h | --help
Modes:
    table               outputs a table that compares benchmark results
    table --by-module   takes two extra arguments for the module names
                        compares benchmarks between the two modules
    plot                takes one or more files, and plots a bar-chart for
                        every bench-test it can find multiple of
General options:
    -h, --help          show this help message and exit
    --no-color          suppress coloring of improvements/regressions
    --strip-names <re>  a regex to strip from benchmarks' names
Comparison options:
    --by-module         take two module names before the files and compare
                        those
    --output <file>     write to file instead of stdout
    --variance          show variance
    --threshold <n>     only show comparisons with a percentage change greater
                        than this threshold
    --regressions       show only regressions
    --improvements      show only improvements
Plot command options (requires gnuplot):
    --by <cmp>          plot benchmarks by file or module [default: module]
    --format <fmt>      output formats: eps, svg, pdf, png [default: png]
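A sketch of dispatching the table/plot modes from the usage text, assuming the arguments have already been collected into a slice without the program path. When cargo runs a custom subcommand, cargo benchcmp plot ... starts the cargo-benchcmp binary with benchcmp as its first argument, so that has to be skipped. Mode and parse_mode are hypothetical names; in practice the argument parser would likely handle this.

```rust
/// Which mode the user asked for (hypothetical; mirrors `table` / `plot`).
#[derive(Debug, PartialEq)]
enum Mode {
    Table,
    Plot,
}

/// Picks the mode from the arguments, skipping the leading "benchcmp"
/// that cargo passes when invoked as `cargo benchcmp ...`.
fn parse_mode(args: &[&str]) -> Result<Mode, String> {
    let args = match args.first() {
        Some(&"benchcmp") => &args[1..],
        _ => args,
    };
    match args.first() {
        Some(&"table") => Ok(Mode::Table),
        Some(&"plot") => Ok(Mode::Plot),
        Some(other) => Err(format!("unknown subcommand: {}", other)),
        None => Err("missing subcommand".to_string()),
    }
}
```

A separate cargo-benchplot executable would instead see benchplot as its first argument and wouldn't need the mode match at all.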

Running cargo benchcmp plot aho-corasick, where aho-corasick is the output of benchmarking that crate, gets you the output:

Writing 14 plots to target/benchcmp

along with warnings about tests that are only in one module. Here is one of the generated plots:

[Plot: ac_ten_one_prefix_byte_every_match]
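The one-plot-per-benchmark behaviour, including the warnings about tests found in only one module, boils down to grouping results by name. This is a sketch with a hypothetical Bench type, not the one in the branch. Names that occur only once are returned separately so the caller can warn about them.

```rust
use std::collections::BTreeMap;

/// One parsed benchmark result (hypothetical type for this sketch).
#[derive(Debug, Clone)]
struct Bench {
    module: String, // e.g. "dense"
    name: String,   // e.g. "ac_ten_one_prefix_byte_every_match"
    ns: u64,        // ns/iter
    variance: u64,  // the "+/-" value, drawn as the error bar
}

/// Groups results by benchmark name. Each group with two or more entries
/// becomes one plot (one bar per module); names seen only once are
/// returned separately so the caller can print a warning for them.
fn group_by_name(benches: &[Bench]) -> (BTreeMap<String, Vec<Bench>>, Vec<String>) {
    let mut groups: BTreeMap<String, Vec<Bench>> = BTreeMap::new();
    for b in benches {
        groups.entry(b.name.clone()).or_insert_with(Vec::new).push(b.clone());
    }
    let mut plots = BTreeMap::new();
    let mut lonely = Vec::new();
    for (name, group) in groups {
        if group.len() > 1 {
            plots.insert(name, group);
        } else {
            lonely.push(name);
        }
    }
    (plots, lonely)
}
```

A single big plot, as asked about in the design decisions, would simply skip this grouping step.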

The generated plot script looks something like this:

set terminal png noenhanced
set output 'target/benchcmp/ac_ten_one_prefix_byte_every_match'
set title 'ac_ten_one_prefix_byte_every_match'
set ylabel 'ns/iter'
set boxwidth 0.9
set style data histograms
set style fill solid 1.0
set bars fullwidth
set style fill solid border -1
set style histogram errorbars gap 2 lw 1
unset xtics
set xrange [0.33:0.83]
set ytics border mirror norotate
set yrange [0:1544127e-1]
plot '-' binary endian=little record=1 format='%uint64' using 1:2 title 'dense', '-' binary endian=little record=1 format='%uint64' using 1:2 title 'dense_boxed', '-' binary endian=little record=1 format='%uint64' using 1:2 title 'full', '-' binary endian=little record=1 format='%uint64' using 1:2 title 'full_overlap', '-' binary endian=little record=1 format='%uint64' using 1:2 title 'sparse'
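The script hard-codes set terminal png; the --format option from the usage text could map onto gnuplot terminals roughly like this. The terminal names are gnuplot's own (postscript eps, pdfcairo and so on), but whether each one is available depends on how gnuplot was built, and both function names are hypothetical. Unlike the example script, this sketch also appends a file extension to the output path.

```rust
/// Maps a `--format` value to (gnuplot terminal, file extension).
/// Returns None for formats the usage text doesn't list.
fn terminal_for(format: &str) -> Option<(&'static str, &'static str)> {
    match format {
        "png" => Some(("png", "png")),
        "svg" => Some(("svg", "svg")),
        "eps" => Some(("postscript eps", "eps")),
        "pdf" => Some(("pdfcairo", "pdf")), // needs a cairo-enabled gnuplot
        _ => None,
    }
}

/// Builds the `set terminal` / `set output` lines for one plot.
fn terminal_header(format: &str, out_stem: &str) -> Option<String> {
    let (term, ext) = terminal_for(format)?;
    Some(format!(
        "set terminal {} noenhanced\nset output '{}.{}'\n",
        term, out_stem, ext
    ))
}
```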

After that comes the data (as raw u64 bytes in little endian order), but that's not super interesting I guess.
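For the raw data, std's u64::to_le_bytes gives the little-endian layout directly. This is a sketch that assumes each record is a (ns/iter, variance) pair of u64s to match using 1:2. The exact layout the branch writes isn't shown here, so treat the pairing as an assumption; encode_le is a hypothetical name.

```rust
/// Encodes (ns/iter, variance) pairs as consecutive little-endian u64s,
/// i.e. 16 bytes per record, for the inline binary data after `plot '-' binary`.
fn encode_le(points: &[(u64, u64)]) -> Vec<u8> {
    let mut buf = Vec::with_capacity(points.len() * 16);
    for &(ns, var) in points {
        buf.extend_from_slice(&ns.to_le_bytes());
        buf.extend_from_slice(&var.to_le_bytes());
    }
    buf
}
```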

I think I can work something out where gnuplot runs the script with a command-line flag that lets you set the output file or directory yourself. But as you can see, you normally need to set the output file inside the script itself.
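A sketch of that idea, assuming gnuplot 4.6 or later: gnuplot's -e option runs commands before the script file, and exists() checks whether a variable is defined. The generated script can therefore fall back to target/benchcmp unless the user runs gnuplot -e "outdir='my/plots'" script.gp. script_header is a hypothetical name, and benchmark names containing a single quote would need escaping.

```rust
/// Hypothetical generator for a standalone gnuplot script header whose
/// output directory defaults to target/benchcmp but can be overridden with
///     gnuplot -e "outdir='my/plots'" script.gp
fn script_header(bench_name: &str) -> String {
    // `{{` / `}}` produce literal braces for gnuplot's `if (...) { ... }`;
    // `.` is gnuplot's string concatenation operator.
    format!(
        "if (!exists(\"outdir\")) {{ outdir = 'target/benchcmp' }}\n\
         set terminal png noenhanced\n\
         set output outdir.'/{name}.png'\n\
         set title '{name}'\n",
        name = bench_name
    )
}
```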

Jul 26 '16 09:07 Apanatshka

cargo-benchcmp

N-way comparison with plots
