[FR] Automatic computation of the difference between two results

Open · dgazzoni opened this issue 3 years ago · 1 comment

Is your feature request related to a problem? Please describe. I need to benchmark a function that performs an in-place transform on an array. The transform maps values from one range to another, and its output range is not valid input to the function, so the array must be reinitialized before every execution of the code being benchmarked. However, I'm only interested in the cost of the transform function, not the initialization function.

I naturally considered PauseTiming() and ResumeTiming(), but as stated in the documentation and in #179, they have high overhead, which I verified in my scenario, so they're not usable, at least for me. An alternative has been proposed in #1087, but I'd like to suggest another.
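For concreteness, here is a sketch of the pause/resume pattern I ruled out (PauseTiming() and ResumeTiming() are the library's existing calls; the overhead comes from stopping and restarting the timer on every iteration):

static void BM_transform_pause(benchmark::State &state) {
    int in[256], out[256];
    for (auto _ : state) {
        state.PauseTiming();   // stop the clock while reinitializing the input
        init(in);
        state.ResumeTiming();  // restart it just before the code under test
        transform(out, in);
    }
}

BENCHMARK(BM_transform_pause);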

Describe the solution you'd like Suppose I have two functions:

void init(int in[]);
void transform(int out[], const int in[]);

And they are benchmarked together as follows:

static void BM_transform(benchmark::State &state) {
    int in[256], out[256];
    for (auto _ : state) {
        init(in);              // must run every iteration to restore valid input
        transform(out, in);
    }
}

BENCHMARK(BM_transform);

My suggestion would be to split this into two benchmarks:

static void BM_transform_init(benchmark::State &state) {
    int in[256], out[256];    // out is unused here; kept so the two benchmarks
                              // differ only by the transform() call
    for (auto _ : state) {
        init(in);             // measures initialization alone
    }
}

static void BM_transform(benchmark::State &state) {
    int in[256], out[256];
    for (auto _ : state) {
        init(in);
        transform(out, in);   // measures initialization plus transform
    }
}

BENCHMARK(BM_transform_init);
BENCHMARK(BM_transform);

Subtracting the timings of BM_transform_init from those of BM_transform should then give the execution time of transform alone. For example (with made-up numbers), if BM_transform_init reports 40 ns per iteration and BM_transform reports 120 ns, transform alone costs roughly 80 ns.

This can be done manually, but some degree of automation would be desirable. At a minimum, some syntax that accepts two benchmark functions but reports only the difference between them, such as:

BENCHMARK_DELTA(BM_transform_init, BM_transform);

BENCHMARK_DELTA should report a single result, corresponding to the difference in timings between the two versions.

A second alternative would be an API similar to a fixture, but one where SetUp and TearDown are called for each execution of the code being benchmarked. In addition, the library would perform another benchmarking run with SetUp followed directly by TearDown, without the benchmarked code in between (mimicking the BM_transform_init function above), and then calculate and display the difference between the two results as a single result.
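To make the shape of that API concrete, here is a purely hypothetical sketch; PerIterationFixture and BENCHMARK_DELTA_F do not exist in the library, and all such names below are invented for illustration:

// Hypothetical API sketch -- none of these names exist in the library today.
class TransformFixture : public benchmark::PerIterationFixture {
public:
    void SetUp() override { init(in); }   // would run before every timed execution
    void TearDown() override {}           // would run after every timed execution

protected:
    int in[256];
    int out[256];
};

// The library would time SetUp + body + TearDown, separately time
// SetUp + TearDown alone, and report the difference as a single result.
BENCHMARK_DELTA_F(TransformFixture, BM_transform)(benchmark::State &state) {
    for (auto _ : state) {
        transform(out, in);
    }
}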

Describe alternatives you've considered As mentioned above, I tried PauseTiming() and ResumeTiming(), but the high overhead precludes their use in my scenario.

The manual calculation procedure for the difference, mentioned above, does work, but it requires post-processing of the results.

I understand the compare.py script could also be used for this, but again it requires some external post-processing. In my opinion, it would be better if this simple processing could be performed directly by the library.
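One way to approximate the delta in-process today, sketched here under the assumption that the cost of init is stable across iterations, is the library's manual-timing support (UseManualTime() together with state.SetIterationTime()): estimate the init cost up front, then time init plus transform in the loop and report only the difference. Note this still pays two clock reads per iteration, so it addresses the reporting rather than the measurement overhead:

#include <benchmark/benchmark.h>

#include <algorithm>
#include <chrono>

static void BM_transform_delta(benchmark::State &state) {
    int in[256], out[256];

    // Estimate the average cost of init alone (assumes it is stable).
    constexpr int kCalibrationRuns = 1000;
    auto t0 = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < kCalibrationRuns; ++i) init(in);
    auto t1 = std::chrono::high_resolution_clock::now();
    double init_cost =
        std::chrono::duration<double>(t1 - t0).count() / kCalibrationRuns;

    for (auto _ : state) {
        auto start = std::chrono::high_resolution_clock::now();
        init(in);
        transform(out, in);
        auto end = std::chrono::high_resolution_clock::now();
        double elapsed = std::chrono::duration<double>(end - start).count();
        // Report only the estimated transform cost; clamp at zero in case
        // a noisy iteration comes in below the calibrated init cost.
        state.SetIterationTime(std::max(0.0, elapsed - init_cost));
    }
}

BENCHMARK(BM_transform_delta)->UseManualTime();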

dgazzoni · Jul 10 '21 16:07

#1269 provided setup and teardown functionality for each benchmark, but not for each execution. Anything that ran per-execution would have the same overhead issue as PauseTiming/ResumeTiming.

Though if the overhead of pause and resume timing is high, that's a sign your transform method is likely not very expensive. Have you tried running with larger arrays? Doing more work per iteration will also reduce the noise in the benchmark results, even without the pause/resume overhead. A sketch of that suggestion follows.
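Assuming init and transform were generalized to take an explicit length (the init_n and transform_n variants below are hypothetical), the benchmark could sweep sizes with the library's existing Range support:

#include <benchmark/benchmark.h>

#include <cstddef>
#include <vector>

// Hypothetical length-taking variants of the original functions.
void init_n(int in[], size_t n);
void transform_n(int out[], const int in[], size_t n);

static void BM_transform_sized(benchmark::State &state) {
    const size_t n = static_cast<size_t>(state.range(0));
    std::vector<int> in(n), out(n);
    for (auto _ : state) {
        init_n(in.data(), n);
        transform_n(out.data(), in.data(), n);
    }
}

// Sweep from 256 to ~1M elements; Range steps in multiples of 8 by default.
BENCHMARK(BM_transform_sized)->Range(256, 1 << 20);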

dmah42 · Jan 13 '22 15:01