
Freezing on fast benchmarks

Open eupp opened this issue 9 years ago • 10 comments

I have encountered a problem after updating nonius to the latest version of the devel branch. Benchmarks that are supposed to run very fast cause a freeze. For example

#include <memory>

#include <nonius/nonius.h++>
#include <nonius/main.h++>

NONIUS_BENCHMARK("shared_ptr<int> default constructor", [](nonius::chronometer meter)
{
    nonius::storage_for<std::shared_ptr<int>> storage;
    meter.measure([&storage] {
        storage.construct();
    });
});

produces the following output and then freezes:

clock resolution: mean is 16.1415 ns (40960002 iterations)
benchmarking shared_ptr<int> default constructor

I guess the bug appeared after the latest commits in May/June, because the same benchmark ran successfully before.

I compile it with gcc 4.9.3 on Ubuntu 14.04. My machine is an x64 Intel® Core™ i7-4790 CPU @ 3.60GHz × 8 with 16 GB RAM.

eupp avatar Jul 22 '16 16:07 eupp

I am having similar problems. For me, some of the problems were fixed by adding volatile to some variables. It seems that the de-optimization of the return value does not always work properly.

Here is an example of a test that freezes; it benchmarks the performance of random access on a vector of 1000 elements.

#include <algorithm>  // std::generate_n
#include <functional> // std::bind
#include <numeric>    // std::iota (used below)
#include <random>
#include <vector>

constexpr auto benchmark_size = 1000u;

auto make_generator(std::size_t runs)
{
    auto engine = std::default_random_engine{42};
    // the distribution's upper bound is inclusive, so use benchmark_size - 1
    // to keep all generated indices within the vector
    auto dist = std::uniform_int_distribution<unsigned>{0, benchmark_size - 1};
    auto r = std::vector<std::size_t>(runs);
    std::generate_n(r.begin(), runs, std::bind(dist, engine));
    return r;
}

NONIUS_BENCHMARK("std::vector", [] (nonius::chronometer meter)
{
    auto g = make_generator(benchmark_size);
    auto v = std::vector<unsigned>(benchmark_size);
    std::iota(v.begin(), v.end(), 0u);
    meter.measure([&] {
        auto r = 0u;
        for (auto i = 0u; i < benchmark_size; ++i)
            r += v[g[i]];
        return r;
    });
})

The benchmark works as soon as I add a volatile to the result variable in the measurement, as in:

NONIUS_BENCHMARK("std::vector", [] (nonius::chronometer meter)
{
    auto g = make_generator(benchmark_size);
    auto v = std::vector<unsigned>(benchmark_size);
    std::iota(v.begin(), v.end(), 0u);
    meter.measure([&] {
        volatile auto r = 0u; // <-- volatile added here!
        for (auto i = 0u; i < benchmark_size; ++i)
            r += v[g[i]];
        return r;
    });
})

arximboldi avatar Jul 30 '16 11:07 arximboldi

I also get "freezes", in the sense that the benchmark takes very long to complete. For 10 samples that each take 70 ms, Nonius takes 20 seconds to complete, with both the CSV exporter and the default exporter, and with only a single iteration per sample. With 100 samples, it never appears to finish.

RealLitb avatar Dec 06 '17 18:12 RealLitb

@RealLitb can you post the code of your benchmark? I think this problem normally manifests independently of the number of samples. I wonder if there might be some cross-sample interaction in your benchmark that causes each sample to grow in time consumed. A way to verify this would be to try different numbers of samples (with low numbers) and see whether the measurements are similar.
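
For example, something like this (the binary name is just a placeholder; --samples is the runner option mentioned later in this thread):

./my_benchmark --samples=5
./my_benchmark --samples=10
./my_benchmark --samples=20

If the per-sample times stay roughly constant as the sample count grows, that would rule out cross-sample interaction.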

arximboldi avatar Dec 06 '17 21:12 arximboldi

@arximboldi I rechecked, and it actually happens in only one benchmark, which consists of several digital lowpass filters.

I have no idea yet what could be causing this. I run on an ARM Cortex-A9, using less memory than in my other tests. It doesn't cause issues with the other three nonius binaries, but with this one it does. Unfortunately, GDB on the target system immediately crashes when attempting to debug it. I will try to debug it with QEMU on my build host later.

I have attached the main code. Maybe you can spot a mistake right away.


#include <flicker/flicker.h>
#include <flicker/flicker.cc> // yes.. to get the implementations
#include <cmath>

#if SPEED_TEST
#define NONIUS_RUNNER
#include <nonius/nonius_single.h++>
#endif

void generateModulatedHrms(float *values, int N, float dc, float A, float f) {
   float omega = 2 * M_PI * f;
   for(int i = 0; i < N; i++) {
      values[i] = dc + A * sin(omega * i);
   }
}

// just a big-enough value
static const int N = 500000;

template<typename Measurer>
void test(Measurer measurer) {
   // simulate with 150 kHz sample rate
   const float sampleRate = 150000.f;
   const float netFrequency = 50;

   static float values[N];
   generateModulatedHrms(values, N, 230, 20, sampleRate / netFrequency);

   mst::flicker::ActiveConfig<mst::flicker::FlickerFloatTraits> config;
   config.netFrequency = netFrequency;
   config.netVoltage = 230;
   config.sampleRateInHz = sampleRate;

#if MOD_REENTRANT
   flicker_implementation_float state;
#endif
   // then add all samples. only this part is measured by nonius
   measurer([&]{
#if MOD_REENTRANT
      flicker_normalized_float_r(true, config, 0, &state);
#else
      flicker_normalized_float(true, config, 0);
#endif

      for(int i = 0; i < N; i++) {
#if MOD_REENTRANT
         flicker_normalized_float_r(false, config, values[i], &state);
#else
         flicker_normalized_float(false, config, values[i]);
#endif
      }
   });
}

#if SPEED_TEST
NONIUS_BENCHMARK("Benchmark", [](nonius::chronometer meter){
   test([&](auto &&f){meter.measure(f);});
})
#else
int main() {
   test([](auto &&f){f();});
}
#endif

RealLitb avatar Dec 07 '17 08:12 RealLitb

@arximboldi subsequently I debugged the program, and it turns out that Nonius executes the entire benchmark not just twice, as mentioned in the manual, but multiple times. Breaking on the line after the call to generateModulatedHrms, I see that breakpoint hit as often as the line directly after flicker_normalized_float, which is the actual function passed to the chronometer's measure function.

If I understand correctly, it's supposed to execute the whole benchmark a single time when measuring the accuracy of the clock, and another time when doing the actual measurements, and to execute the function passed to meter.measure 100 times (the number passed to --samples) each time. But it seems to execute the complete benchmark 100 times!

RealLitb avatar Dec 07 '17 09:12 RealLitb

If I understand correctly, it's supposed to execute the whole benchmark a single time when measuring the accuracy of the clock, and another time when doing the actual measurements, and to execute the function passed to meter.measure 100 times (the number passed to --samples) each time. But it seems to execute the complete benchmark 100 times!

No, that is not how it works.

Even one sample may execute the benchmark multiple times. The idea is that if your benchmark is very fast, relatively close to the "error" of the clock measurements themselves, then you won't get useful results.

For this reason, the first thing Nonius does is measure the clock accuracy. Then it runs the benchmark multiple times, just to figure out how many times it needs to run it so that the total of these runs takes long enough to be significantly bigger than the clock error (a number K of runs, or sub-measurements). [1]

Then, in the actual measurements, for each sample it will run your benchmark K times.
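
To make that concrete with the number from the first post: with a clock resolution of roughly 16 ns, a measured body that takes, say, 2 ns per run needs a K large enough that K × 2 ns comfortably dwarfs 16 ns, so even a single sample already implies many runs.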

You can receive an int parameter in your benchmark function, through which Nonius will tell you which of these sub-measurements is currently running. This allows you to write the benchmark in a way that decouples the sub-measurements. Most of the time this involves keeping the state in a vector of K elements. I think there are a few examples in the documentation.
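
A minimal sketch of that pattern, along the lines of the documentation examples (assuming I recall the API correctly; meter.runs() reports the K above):

#include <string>
#include <vector>
#include <nonius/nonius.h++>

NONIUS_BENCHMARK("decoupled construct", [](nonius::chronometer meter)
{
    // one pre-allocated slot per sub-measurement, so runs don't share state
    std::vector<nonius::storage_for<std::string>> storage(meter.runs());
    meter.measure([&](int i) {
        // i is the sub-measurement index, 0 .. K-1
        storage[i].construct("thing");
    });
})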

Cheers!

JP

[1] This is just a layman's understanding of it. There is actually some advanced statistical bootstrapping involved that is beyond my knowledge. Ask @rmartinho for details.

-- ∿∿∿ https://sinusoid.al ∿∿∿ https://sinusoid.es

arximboldi avatar Dec 07 '17 11:12 arximboldi

@arximboldi hmm, that is not what the docs say, though. They say:

"There are two types of callable objects that can be provided. The simplest ones take no arguments and just run the user code that needs to be measured. However, if the callable can be called with a nonius::chronometer argument, some advanced features are available. The simple callables are invoked once per run, while the advanced callables are invoked exactly twice: once during the estimation phase, and another time during the execution phase."

The benchmark has a "chronometer" parameter; it can't be called with an "int" argument. The int parameter is given to the measurement function. Citing the docs again:

"A single call to nonius::chronometer::measure performs the actual measurements by invoking the callable object passed in as many times as necessary. Anything that needs to be done outside the measurement can be done outside the call to measure. [...] If it accepts an int parameter, the sequence number of each run will be passed in, starting with 0."

I'm a bit confused now as to the terms. What you seem to refer to as the "benchmark" is what the docs call the callable object passed to the measurement function. And what the docs call the "benchmark" does not appear in your reply at all: the docs say a benchmark function is called "exactly twice" when using the advanced form, as in my case.

RealLitb avatar Dec 07 '17 11:12 RealLitb

I understand that "performs the actual measurements by invoking the callable object passed in as many times as necessary" refers to what you described as figuring out "how many times it needs to run it so that the total of these runs takes long enough to be significantly bigger than the clock error". But the docs say that this runs the callable passed to measure, not the callable passed in as the benchmark.

RealLitb avatar Dec 07 '17 11:12 RealLitb

Yes, you are correct, sorry for the confusion. What I refer to as the benchmark is the measurement function; I did not mention the other one. The real benchmark function, that is, the one mentioned in the docs, is executed once per sample. Then the measurement function is executed K times per sample.

If you want to generate an expensive-to-build fixture that is independent of the samples, you can do so by wrapping your function in one taking a nonius::parameters object. This allows you to parametrize your tests so that you can pass parameters from the command line. The parametrized function is evaluated once whenever Nonius configures the parameters, and it should return a benchmark function. As in:

NONIUS_BENCHMARK("example", [](nonius::parameters params)
{
    // this will be executed once per configuration
    return [=](nonius::chronometer meter) {
        // this will be executed once per sample
        meter.measure([&] (int r) {
            // this will be executed once per measurement
            // (K times per sample)
        });
    };
})

arximboldi avatar Dec 07 '17 11:12 arximboldi

Ah thanks, that clears up the confusion. I suggest clarifying it in the docs, e.g. rephrasing the sentence as "The simple callables are invoked once per run, while the advanced callable is invoked only once if it accepts a nonius::parameters object, or once per sample (and multiple times during the estimation phase) if it accepts a nonius::chronometer object." - if I understood it correctly.

RealLitb avatar Dec 07 '17 12:12 RealLitb