
Report other benchmark results relative to a baseline

Open Leandros opened this issue 6 years ago • 6 comments

Imagine the following scenario: I've built a new and fancy vector<T> and want to benchmark it. I could now write the benchmarks and get some numbers, but I would have absolutely no idea how good these numbers are. For simplicity, let's assume my benchmark tests push_back by pushing an integer 1M times. My benchmark runs in 15ms. That's just an absolute number; it doesn't give me the slightest idea how this fares against std::vector<T> (for example).

Therefore I'd love to see an equivalent to BENCHMARK, called BENCHMARK_RELATIVE. I would recreate the same benchmark for std::vector<T>, and use it as a baseline.

template<class Vec>
static void BM_PushBack(benchmark::State &s)
{
  Vec v;
  v.reserve(1'000'000);
  for (auto _ : s) {
    v.clear();
    for (int i = 0; i < 1'000'000; ++i)
      v.push_back(i);
  }
}

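// A function template instantiation is not a type, so bind it to a
// function pointer instead of a `using` alias.
constexpr auto BM_PushBackStd = &BM_PushBack<std::vector<int>>;
constexpr auto BM_PushBackCustom = &BM_PushBack<custom::vector<int>>;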

BENCHMARK(BM_PushBackStd);
BENCHMARK_RELATIVE(BM_PushBackStd, BM_PushBackCustom);

The resulting output would look somewhat like this:

2018-08-22 10:46:25
Run on (8 X 4000 MHz CPU s)
CPU Caches:
  L1 Data 32K (x4)
  L1 Instruction 32K (x4)
  L2 Unified 262K (x4)
  L3 Unified 8388K (x1)
----------------------------------------------------------------
Benchmark               Relative      Time      CPU   Iterations
----------------------------------------------------------------
BM_PushBackStd                       30 ms    30 ms           41
BM_PushBackCustom        196.23%     15 ms    15 ms           22

What relative means is up for discussion. In this case I've defined 100% as equal in relative speed; everything below 100% is slower than the baseline, and everything above is faster. 196% means it's 1.96 times as fast as the baseline, or 96% faster.
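To make the definition above concrete, here is a minimal sketch of the arithmetic. The function name `relative_speed_percent` is hypothetical and is not part of google/benchmark or folly; it only assumes that "relative" is the baseline time divided by the contender time, expressed as a percentage.

```cpp
#include <cassert>

// Hypothetical helper for illustration; not part of google/benchmark or folly.
// Returns the contender's speed as a percentage of the baseline's speed:
// 100 means equal, above 100 means the contender is faster than the baseline,
// below 100 means it is slower.
double relative_speed_percent(double baseline_time, double contender_time) {
  assert(contender_time > 0.0);  // guard against division by zero
  return baseline_time / contender_time * 100.0;
}
```

With the rounded times from the table, `relative_speed_percent(30.0, 15.0)` gives 200. A figure such as 196.23% would correspond to the unrounded timings, which the table does not show.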

And credit where credit is due, this idea originally comes from folly's benchmark.h: https://github.com/facebook/folly/blob/master/folly/docs/Benchmark.md

Leandros avatar Aug 22 '18 08:08 Leandros

It can already be done via tools/compare.py filters ./a.out BM_PushBackStd BM_PushBackCustom

LebedevRI avatar Aug 22 '18 08:08 LebedevRI

Sweet. Am I missing something or can I not compare multiple benchmarks to a single baseline?

Leandros avatar Aug 22 '18 11:08 Leandros

> Am I missing something or can I not compare multiple benchmarks to a single baseline?

In a single go - correct.

LebedevRI avatar Aug 22 '18 12:08 LebedevRI

To be honest, I have a forked version that adds exactly that functionality (plus the possibility of comparing multiple benchmarks against a single baseline). @LebedevRI Would you suggest opening a pull request anyway?

ntagliani avatar Sep 06 '18 17:09 ntagliani

I would say this should wait for 'proper' JSON support (https://github.com/google/benchmark/pull/499, v2 branch), and then be implemented on top of that, on the caller's side.
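As a rough illustration of what a caller-side implementation might look like, the sketch below takes timings that have already been collected (parsing the JSON output is left out) and reports every contender relative to one baseline in a single pass. The `Result` and `RelativeRow` types and the `relative_to_baseline` function are hypothetical names chosen for this example; none of them exist in google/benchmark.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical types for illustration; not part of google/benchmark.
struct Result {
  std::string name;
  double time_ms;  // measured time per iteration, in milliseconds
};

struct RelativeRow {
  std::string name;
  double percent;  // 100 = same speed as baseline, above 100 = faster
};

// Compare every contender against a single baseline in one go.
std::vector<RelativeRow> relative_to_baseline(
    const Result& baseline, const std::vector<Result>& contenders) {
  assert(baseline.time_ms > 0.0);
  std::vector<RelativeRow> rows;
  rows.reserve(contenders.size());
  for (const Result& c : contenders) {
    assert(c.time_ms > 0.0);  // guard against division by zero
    rows.push_back({c.name, baseline.time_ms / c.time_ms * 100.0});
  }
  return rows;
}
```

Passing a 30 ms baseline with contenders at 15 ms and 60 ms would yield rows of 200% and 50% respectively, in the same order as the input.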

LebedevRI avatar Sep 06 '18 17:09 LebedevRI

v2 is some way away, so I would say go for it with the enhancement to the tooling for comparing multiple benchmarks.

dmah42 avatar Apr 27 '21 15:04 dmah42