Add support for cargo bench

Open axelf4 opened this issue 5 years ago • 6 comments

It would be great to support benchmark tests through the command cargo bench available on nightly. Possible use cases include making it easier for others to help optimize code snippets.

axelf4 avatar Oct 29 '18 13:10 axelf4

Benchmarking on a cloud service provider is inherently a flaky proposition. There's no guarantee that the underlying CPU will be at a constant speed during a single benchmark run, much less between two benchmarks run sequentially or two completely separate executions.

shepmaster avatar Oct 29 '18 14:10 shepmaster

That's true. While discrepancy between separate benchmarks doesn't matter for the intended usage, variance in proportional performance sucks. Could running the generated WebAssembly of the benchmark on the client be a viable option? I do recognize that it is infinitely more work.

axelf4 avatar Oct 29 '18 15:10 axelf4

running the generated WebAssembly of the benchmark on the client

That sounds like #374. It's certainly plausible, although I don't know enough of the particulars around what is needed (a high-resolution time source, for example) to know what availability there is.

shepmaster avatar Oct 29 '18 15:10 shepmaster

discrepancy between separate benchmarks doesn't matter for the intended usage

Wouldn't you expect to see things like

#[bench] fn slow_fn(b: &mut Bencher) { /* ... */ }
#[bench] fn fast_fn(b: &mut Bencher) { /* ... */ }

And then people would get surprised when fast_fn is slower than slow_fn? To me, benchmarking is inherently a comparative endeavor: "did it get faster or slower than what it was".

shepmaster avatar Oct 29 '18 15:10 shepmaster

And then people would get surprised when fast_fn is slower than slow_fn? To me, benchmarking is inherently a comparative endeavor: "did it get faster or slower than what it was".

Yes, sorry I wasn't clear enough. I meant that it's important that the percentages stay the same but the actual times may differ.

That sounds like #374.

Definitely, provided that the test crate gets support.

axelf4 avatar Oct 29 '18 16:10 axelf4
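
The point above, that the ratio between two benchmarks matters more than the absolute times, can be sketched without the nightly `test` crate. This is a minimal illustration using only `std::time::Instant`; the helpers `time_it` and `ratio` and the workloads `slow_sum` and `fast_sum` are hypothetical names invented for this example, not part of the playground or of `cargo bench`. On a shared host the printed ratio may still vary between runs, which is exactly the concern raised earlier in the thread.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Run `f` `iters` times and return the total elapsed wall-clock time.
fn time_it<F: FnMut()>(iters: u32, mut f: F) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        f();
    }
    start.elapsed()
}

/// Ratio `a / b` of two durations, the quantity that should stay
/// roughly stable even when absolute times differ between machines.
fn ratio(a: Duration, b: Duration) -> f64 {
    a.as_secs_f64() / b.as_secs_f64()
}

// Hypothetical workloads (assumptions for illustration only).
// Both compute 0 + 1 + ... + (n - 1); `n` must be at least 1 for `fast_sum`.
fn slow_sum(n: u64) -> u64 {
    (0..n).fold(0, |acc, x| acc + black_box(x))
}

fn fast_sum(n: u64) -> u64 {
    n * (n - 1) / 2
}

fn main() {
    let n = 10_000;
    // `black_box` keeps the optimizer from deleting the work being timed.
    let slow = time_it(1_000, || {
        black_box(slow_sum(black_box(n)));
    });
    let fast = time_it(1_000, || {
        black_box(fast_sum(black_box(n)));
    });
    println!(
        "slow: {:?}, fast: {:?}, slow/fast = {:.1}",
        slow,
        fast,
        ratio(slow, fast)
    );
}
```

Reporting only the ratio hides the absolute numbers that differ from host to host, but it does not remove noise caused by the CPU changing speed partway through a run.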

Just as an idea, profiling might be more stable than running the benchmarks, but provide similar feedback. Assuming execution time and things like cache hits are unstable, the total number of assembly instructions executed for the program is a very useful metric.

I am not sure what exactly the backend to this website does, but valgrind --tool=callgrind --dump-instr=yes --collect-jumps=yes --simulate-cache=yes ./target/release/... runs the executable and collects this information. The output could be parsed and used to annotate the assembly display. Execution times would be somewhat meaningless (as with benches), but with few exceptions, a program with fewer total executed instructions will be faster than one with more. And you can easily see where operations are concentrated.

perf can do something similar but it's kind of a huge pain to get working right compared to valgrind, imo.

tgross35 avatar Jul 13 '22 00:07 tgross35
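
As a rough sketch of the "parse the callgrind output" idea above: callgrind writes a text profile containing a `summary:` or `totals:` line whose numbers follow the order of the `events:` header. The code below assumes the first event is `Ir` (instructions read) and compares two runs by instruction count. The function names `total_instructions` and `relative_change` and the sample profile strings are hypothetical, made up for this example; nothing here reflects how the playground backend actually works.

```rust
/// Return the first number on the `summary:` or `totals:` line of a
/// callgrind profile. Assumption: the first event listed is `Ir`.
fn total_instructions(callgrind_out: &str) -> Option<u64> {
    callgrind_out.lines().find_map(|line| {
        let rest = line
            .strip_prefix("summary:")
            .or_else(|| line.strip_prefix("totals:"))?;
        rest.split_whitespace().next()?.parse().ok()
    })
}

/// Relative change in instruction count from `before` to `after`;
/// negative means fewer instructions were executed.
fn relative_change(before: u64, after: u64) -> f64 {
    (after as f64 - before as f64) / before as f64
}

fn main() {
    // Hypothetical, heavily shortened profiles (not real tool output).
    let before = "events: Ir\nfn=main\n1 1200000\ntotals: 1200000\n";
    let after = "events: Ir\nfn=main\n1 900000\ntotals: 900000\n";

    match (total_instructions(before), total_instructions(after)) {
        (Some(b), Some(a)) => println!(
            "before: {} Ir, after: {} Ir, change: {:+.1}%",
            b,
            a,
            relative_change(b, a) * 100.0
        ),
        _ => println!("no summary/totals line found"),
    }
}
```

Instruction counts are far more repeatable than wall-clock times on shared hardware, though they ignore effects such as cache misses unless the cache simulation columns are read as well.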