
Benchmark for large object?

Open glennsweeney opened this issue 4 years ago • 2 comments

I'm trying to use Catch2's benchmarking tools to benchmark member functions of a Very Large Object (VLO). The canonical example in the documentation for what I want to do is something like:

BENCHMARK_ADVANCED("construct")(Catch::Benchmark::Chronometer meter) {
    std::vector<Catch::Benchmark::storage_for<std::string>> storage(meter.runs());
    meter.measure([&](int i) { storage[i].construct("thing"); });
};

However, I can't reasonably allocate memory for meter.runs() == 100 VLOs; I run out of system memory. I was able to work around this roughly by setting --benchmark-samples 1 on the CLI, but then, even with multiple iterations, I get none of the useful statistical output.

Currently it is possible to provide setup/teardown code before and after the call to meter.measure(), but there is no way to provide per-run setup/teardown. Is there any way for me to reset the state of my VLO between runs without it being included in the meter.measure() lambda? Some sort of startTimer(), endTimer() pair that could be invoked within the lambda might be ideal.
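For illustration, here is a minimal sketch of what I have in mind; startTimer()/endTimer() and VeryLargeObject are hypothetical and do not exist in Catch2 today:

BENCHMARK_ADVANCED("vlo member fn")(Catch::Benchmark::Chronometer meter) {
    meter.measure([&](int) {
        VeryLargeObject vlo;             // hypothetical type; per-run setup, untimed
        meter.startTimer();              // hypothetical: begin timing this run
        vlo.memberFunctionUnderTest();   // hypothetical member function being benchmarked
        meter.endTimer();                // hypothetical: stop timing; destruction below is untimed
    });
};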

Thanks!

glennsweeney · Jan 08 '21 17:01

I have a similar issue: the function under test has side effects on objects outside the benchmark, changing their state between runs.

One example, though by no means the only one, is streams:

BENCHMARK_ADVANCED("my benchmark")(Catch::Benchmark::Chronometer meter) {
  std::istringstream is{"Some input"};
  meter.measure([]{ return function_under_test(is); });
}

function_under_test(std::istream&) reads input from the stream, advancing its position in the buffer. The next execution of function_under_test() would therefore no longer see the initial contents. Unfortunately, for streams there is no clean way to reset other than seekg, and even that would be included in the benchmark.
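For completeness, the obvious workaround under the current API is to reset the stream inside the measured lambda, accepting that the reset then becomes part of the measurement:

BENCHMARK_ADVANCED("my benchmark (in-lambda reset)")(Catch::Benchmark::Chronometer meter) {
  std::istringstream is{"Some input"};
  meter.measure([&] {
    is.clear();   // clear eofbit/failbit left over from the previous run
    is.seekg(0);  // rewind to the initial contents; included in the timing
    return function_under_test(is);
  });
}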

The primary question I'd ask is: does Catch2 want to support these rather uncommon use cases, or would you recommend other benchmark libraries (e.g. Google Benchmark) instead?

I can think of two solutions:

  1. As Glenn suggested, a timer interface that allows the user to pause/resume or start/stop the timer. I'm not sure whether that would add overhead (e.g. capturing the current time) that has to be subtracted from the final result. This solution would offer the most flexibility, since it could also cover other use cases we haven't thought of.
  2. Have the meter accept user-defined setup and teardown functions, run just before and after the benchmark lambda, respectively. Alternatively, a user could provide their own class type (through a template parameter), whose constructor and destructor serve as setup and teardown, and whose operator() overload (possibly taking the meter) is the function under test (i.e. the benchmark lambda); see the sketch after this list. ~I think that solution can be implemented faster.~
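A rough sketch of what solution 2 could look like, reusing the stream example above; measure_fixture() is hypothetical and not part of Catch2:

struct StreamFixture {
  std::istringstream is{"Some input"};                    // setup in the constructor
  auto operator()() { return function_under_test(is); }   // the timed body
  // teardown would go in the destructor
};

BENCHMARK_ADVANCED("my benchmark (fixture)")(Catch::Benchmark::Chronometer meter) {
  // hypothetical API: per run, construct the fixture, time operator(), then destroy it
  meter.measure_fixture<StreamFixture>();
}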

I have no idea whether any of the suggested solutions would have implications for compiler optimizations or anything else.

jdoubleu · Dec 07 '21 20:12

Is there any progress on this issue? I'm having a similar problem: benchmarking a pool allocator that quickly consumes the entire system memory, resulting in an invalid test:

BENCHMARK_ADVANCED("Allocator::Allocate(Pool::DefaultPoolSize)") (Catch::Benchmark::Chronometer meter) {
	Allocator::CollectGarbage();
	std::vector<Allocation*> storage(meter.runs());

	// May throw on x86 builds, due to 4 GB limit
	try {
		meter.measure([&](int i) {
			return storage[i] = Allocator::Allocate(Pool::DefaultPoolSize);
		});
	} catch(const Except::Allocate&) {
		// Catch here, so that we ensure deallocation
	}

	for (auto& i : storage)
		if (i)
			Allocator::Deallocate(i);
};

In addition to the previous suggestions, it would be great if Catch::Benchmark::Benchmark allowed an alternative to the run count that prepare() computes for runs(), for example letting the user set it manually. Either that, or a more flexible approach to exceptions thrown during Benchmark::run().

AFAIK Catch2 does a warmup to determine the number of runs, but the time a function consumes is not necessarily constrained by the CPU alone. In my case it is constrained by system memory, resulting in an exception that invalidates the entire benchmark.
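For what it's worth, a possible stopgap under the current API is to cap the number of simultaneously live allocations and recycle slots; kMaxLive is an assumed constant, and the Deallocate() inside the lambda is then included in the timing:

BENCHMARK_ADVANCED("Allocator::Allocate (bounded memory)") (Catch::Benchmark::Chronometer meter) {
	constexpr int kMaxLive = 64; // assumed cap, small enough to fit in memory
	std::vector<Allocation*> storage(kMaxLive, nullptr);

	meter.measure([&](int i) {
		auto& slot = storage[i % kMaxLive];
		if (slot)
			Allocator::Deallocate(slot); // recycling cost is measured too
		return slot = Allocator::Allocate(Pool::DefaultPoolSize);
	});

	for (auto& i : storage)
		if (i)
			Allocator::Deallocate(i);
};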

Epixu · Jun 09 '22 11:06