
Support for minimum iteration count.

Open AdamBrouwersHarries opened this issue 6 years ago • 22 comments

At present, it is not possible to specify a minimum number of iterations for a benchmark. This is an issue when the framework decides that a given configuration for a benchmark should only be run a small number of times, fewer than we would like for the benchmark to be stable. The only way to modify the number of iterations is by explicitly specifying Iterations(...) for a given benchmark, or by passing --benchmark_min_time=... to run the benchmarks for a specific length of time.

One possible solution would be a corresponding --benchmark_min_iterations=... flag that could be passed at runtime, which would prevent the framework from running the benchmark for any fewer than N iterations, where N is some number by which we know that the result should be stable.
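
For reference, the only per-benchmark controls available today look roughly like the sketch below (BM_Example is just a placeholder benchmark, not code from our suite):

#include <benchmark/benchmark.h>

// BM_Example is a placeholder; the two registrations below show the existing
// knobs: a fixed iteration count, or a larger time target (the latter can
// also be set globally with --benchmark_min_time=...).
static void BM_Example(benchmark::State& state) {
  int x = 0;
  for (auto _ : state) {
    benchmark::DoNotOptimize(x += 1);
  }
}
BENCHMARK(BM_Example)->Iterations(1000);  // run exactly 1000 iterations
BENCHMARK(BM_Example)->MinTime(2.0);      // run until at least ~2 seconds
BENCHMARK_MAIN();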

AdamBrouwersHarries avatar Dec 17 '18 17:12 AdamBrouwersHarries

We used to have a way to control the minimum iterations directly, but it turns out that it is easy to misconfigure. We found that every use-case could be covered by minimum time or direct iteration settings and having both caused quite a bit of extra complexity in the framework. So we removed it.

I'm not entirely convinced that we need to put it back, as you note that both the minimum time and Iterations() calls can be used.

Can you give specific examples where the stability of a benchmark is predictable a priori and Iterations() doesn't do what you need?

dmah42 avatar Dec 18 '18 10:12 dmah42

I think I might have miscommunicated somewhat - I meant to say that both "minimum time" and "iterations" are insufficient for what I'm trying to do. I'll try to give an example.

I have a number of benchmarks where we want to evaluate them across a range of values. For example, an operation F on N*M matrices, where we want to measure performance across various values of N and M, each ranging over [64, 2048] (say). It should be unsurprising that F is fairly fast when N and M are small, and quite slow when N and M are large.
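
To make that concrete, the registration looks roughly like the sketch below (createMatrix() and F() are hypothetical stand-ins for our real code):

#include <benchmark/benchmark.h>
#include <cstddef>
#include <vector>

// createMatrix() and F() are hypothetical stand-ins for the real code.
static std::vector<double> createMatrix(int n, int m) {
  return std::vector<double>(static_cast<std::size_t>(n) * m, 1.0);
}
static void F(std::vector<double>& mat) {
  for (auto& v : mat) v *= 1.0001;
}

static void BM_F(benchmark::State& state) {
  auto mat = createMatrix(static_cast<int>(state.range(0)),
                          static_cast<int>(state.range(1)));
  for (auto _ : state) F(mat);
}
// Sweep N and M over [64, 2048]; the framework then picks an iteration
// count per (N, M) pair based on how long each configuration takes.
BENCHMARK(BM_F)->Ranges({{64, 2048}, {64, 2048}});
BENCHMARK_MAIN();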

My issue is that googlebench (imho, incorrectly - but I'll explain further) calculates that fewer iterations are required when N and M are large, as the time taken by F seems more stable. This, unfortunately, isn't always the case, as we sometimes work with quite temperamental hardware that can have quite wide performance ranges for F (even when N and M are large).

I don't think the two existing solutions, benchmark_min_time and Iterations(), really address this case:

  • Iterations() sets an absolute number of iterations that all configurations of parameters must run for. This number might be too large for longer-running configurations of F (e.g. when N and M are large), causing our benchmarks to run for a long time, and might also be too small for faster configurations, leaving us with too little data to be statistically significant.
  • benchmark_min_time sets a minimum length of time that all configurations must run for. This would solve the problem for large values of N and M, but when N and M are small, this imposes a somewhat unacceptable overhead, as the faster benchmarks currently finish quite quickly.

As far as I can tell, having a benchmark_min_iterations parameter is the only possible way to get around these limitations:

  • For small values of N and M, the number of iterations is unaffected, as they would already perform more iterations in order to get statistically significant data.
  • For larger values of N and M, we increase the number of iterations to the minimum, without changing the iteration counts of other configurations and without making the faster configurations take just as long to run.

I hope I've explained my use case a bit more clearly? If you have any other thoughts or potential solutions, they would also be very welcome.

Cheers, Adam

AdamBrouwersHarries avatar Dec 18 '18 11:12 AdamBrouwersHarries

My issue is that googlebench (imho, incorrectly - but I'll explain further) calculates that fewer iterations are required when N and M are large, as the time taken by F seems more stable.

What it does is calculate how many iterations are needed so that the total time taken to run them is statistically significant (min_time, 0.5s by default, I think).

This, unfortunately, isn't always the case, as we sometimes work with quite temperamental hardware that can have quite wide performance ranges for F (even when N and M are large).

I'm not sure what temperamental hardware means here.

LebedevRI avatar Dec 18 '18 11:12 LebedevRI

For the F that I'm specifically interested in benchmarking, the "temperamental hardware" is various customer GPUs that have slightly unpredictable throttling and warmup times. Additionally, we do a certain amount of JIT compilation that may cause unexpected changes in performance at unpredictable times. Warming up the benchmark might be a potential solution, but it also has other problems (especially with long-running benchmarks), so I'd much rather ask the benchmarking framework to take more measurements than it thinks it needs.

AdamBrouwersHarries avatar Dec 18 '18 13:12 AdamBrouwersHarries

  1. Have you seen State::KeepRunningBatch()?
  2. Have you seen Benchmark::MinTime()?

LebedevRI avatar Dec 18 '18 13:12 LebedevRI

As I (think) I explained above, Benchmark::MinTime doesn't suit my requirements, as it affects all configurations, not just long running ones. If I understand correctly, State::KeepRunningBatch() essentially implements the same behaviour as Iterations, which (as I said above) doesn't suit my requirements. That said, a way to "request" that the current state keeps running iterations (while in a for (auto _ : state) loop) would also be suitable, but I haven't managed to find a way to do that yet.

AdamBrouwersHarries avatar Dec 18 '18 14:12 AdamBrouwersHarries

One of the likely misunderstandings is your use of "stable". We don't do anything to determine statistical stability when choosing iteration counts. We merely run the number of iterations required for the benchmark to run for a minimum time. This is why the only things that (currently) affect it are changing the minimum time or setting the number of iterations.

If you want to increase the minimum number of iterations, increase the minimum time. It will mean the smaller runs will run for more iterations, but that would be the case if you increased minimum iterations too :)

You can see the check here https://github.com/google/benchmark/blob/master/src/benchmark_runner.cc#L266 and the iteration increase logic here https://github.com/google/benchmark/blob/master/src/benchmark_runner.cc#L243 .
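
Very roughly, the growth loop behind those links does something like the sketch below. This is illustrative only: the constants are approximate and RunBenchmark() is a stand-in for running the user's benchmark, not a library function.

#include <algorithm>
#include <cstdio>

// Stand-in for "run the benchmark for `iters` iterations and return the
// elapsed seconds"; not a real library function.
static double RunBenchmark(long long iters) {
  return static_cast<double>(iters) * 2e-6;  // pretend ~2us per iteration
}

int main() {
  const double min_time = 0.5;             // like --benchmark_min_time=0.5
  const long long kMaxIters = 1000000000;  // safety cap
  long long iters = 1;
  for (;;) {
    const double seconds = RunBenchmark(iters);
    std::printf("ran %lld iterations in %fs\n", iters, seconds);
    // The "check": stop once the run is long enough (or we hit the cap).
    if (seconds >= min_time || iters >= kMaxIters) break;
    // The "iteration increase logic": scale toward the target time with
    // some headroom, but never grow by more than 10x in one step.
    double multiplier = std::min(10.0, 1.4 * min_time / std::max(seconds, 1e-9));
    if (multiplier <= 1.0) multiplier = 2.0;
    iters = static_cast<long long>(static_cast<double>(iters) * multiplier);
  }
}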

As an aside, if you're timing GPUs, you might already be using manual timing. If you are, then we will run until the manual time you report is greater than the minimum time.

dmah42 avatar Dec 18 '18 14:12 dmah42

Good to know about the statistical stability of the benchmarks, and what controls the number of iterations. I wonder if an alternative solution might be to implement different strategies for deciding the number of iterations? E.g. keep running the benchmark until the standard deviation drops to 5% of the mean?

It will mean the smaller runs will run for more iterations, but that would be the case if you increased minimum iterations too :)

That wouldn't be the case - my faster configurations run for (say) 20,000 iterations, while my slower ones run for (say) 5 iterations. Let's say both take about 10 seconds total. If I could set a minimum number of iterations (say, 30), the former set would still run for 20,000 iterations, and the second set would run for 30.

If I can only set the minimum time, however, then I need to set it to the equivalent of 30 iterations for the slower configuration (60s). In this case both the faster and slower configurations will run for that long, meaning that the whole set takes 2 minutes, rather than 1m10s, and the faster benchmarks will run for 120,000 iterations which slows down the whole suite of benchmarks.

I hope this makes my use case a bit clearer?

AdamBrouwersHarries avatar Dec 18 '18 16:12 AdamBrouwersHarries

Good to know about the statistical stability of the benchmarks - and what controls the number of iterations. I wonder if an alternative solution might be to implement alternative strategies for limiting the iterations? e.g. keep running the benchmark until the standard deviation is 5% of the mean?

It's possible, and I thought I'd opened an issue about this at some point in the past, but I can't find it. A similar heuristic would be related to the confidence interval becoming 'stable', i.e. the rate of change of the confidence interval falling below some threshold. They're roughly the same idea though.

The reason I haven't dug in to work on this is that it becomes harder to tune as the timing and iteration counts become even more implicit.

I hope this makes my use case a bit clearer?

Understood.

As a workaround you could split the ranges into multiple registration calls with MinTime called on each individually, i.e.:

static void BM_MatrixThing(benchmark::State& state) {
  auto M = createMatrix(state.range(0), state.range(1));
  for (auto _ : state) M.process();
}

BENCHMARK(BM_MatrixThing)->Ranges({{32, 128}, {32, 128}})->MinTime(5);
BENCHMARK(BM_MatrixThing)->Ranges({{128, 512}, {128, 512}})->MinTime(10);
BENCHMARK(BM_MatrixThing)->Ranges({{512, 2048}, {512, 2048}})->MinTime(50);

Does that help?

dmah42 avatar Dec 18 '18 16:12 dmah42

I wonder if an alternative solution might be to implement alternative strategies for limiting the iterations? e.g. keep running the benchmark until the standard deviation is 5% of the mean?

It's possible, and I thought I'd opened an issue about this at some point in the past but I can't find it.

Separate iterations is a step in that direction :)

As a workaround you could split the ranges into multiple registration calls with MinTime called on each individually.

That is actually what I meant, I should have spelled it out explicitly :/

LebedevRI avatar Dec 18 '18 16:12 LebedevRI

I wonder if an alternative solution might be to implement alternative strategies for limiting the iterations? e.g. keep running the benchmark until the standard deviation is 5% of the mean? It's possible, and I thought i'd opened an issue about this at some point in the past but i can't find it.

Separate iterations is a step in that direction :)

It shouldn't be necessary; there are iterative calculations for mean and standard deviation that aren't terrible to implement.
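
For example, Welford's online update needs only constant work per sample; here's a minimal sketch of the idea (purely illustrative, not part of the library):

#include <cmath>
#include <cstdio>

// Welford's online mean/variance: O(1) work per sample, no need to keep
// the individual timings around.
struct OnlineStats {
  long long n = 0;
  double mean = 0.0;
  double m2 = 0.0;  // running sum of squared deviations from the mean

  void Add(double x) {
    ++n;
    const double delta = x - mean;
    mean += delta / static_cast<double>(n);
    m2 += delta * (x - mean);
  }
  double StdDev() const {
    return n > 1 ? std::sqrt(m2 / static_cast<double>(n - 1)) : 0.0;
  }
};

int main() {
  OnlineStats stats;
  // Pretend these are per-repetition timings; stop once the standard
  // deviation drops below 5% of the mean (after a few samples).
  const double timings[] = {1.00, 1.10, 0.95, 1.02, 1.01, 0.99, 1.03};
  for (double t : timings) {
    stats.Add(t);
    if (stats.n >= 5 && stats.StdDev() < 0.05 * stats.mean) {
      std::printf("stable after %lld samples\n", stats.n);
      break;
    }
  }
}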

dmah42 avatar Dec 18 '18 16:12 dmah42

We used to have a way to control the minimum iterations directly, but it turns out that it is easy to misconfigure. We found that every use-case could be covered by minimum time or direct iteration settings and having both caused quite a bit of extra complexity in the framework.

Why not take benchmark_min_iterations and run the benchmarks for benchmark_min_time as is normally done, and then, if and only if the number of iterations that fit into benchmark_min_time is less than the requested benchmark_min_iterations, compute the average time per iteration, extend benchmark_min_time accordingly, and let the benchmark keep iterating to account for the rest of the requested iterations that couldn't be completed within the original benchmark_min_time?

This way benchmark_min_time still drives the iterations, which would probably be much simpler than trying to handle benchmark_min_iterations separately.
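
Roughly, the idea is something like the sketch below. This is a pure illustration of the proposal, not how the library is structured; RunFor() is a made-up stand-in for the normal min_time-driven run.

#include <algorithm>
#include <cstdio>

// Hypothetical stand-in: run the benchmark until `min_time` seconds have
// elapsed and return how many iterations fit, as the framework normally does.
static long long RunFor(double min_time) {
  const double seconds_per_iter = 0.2;  // pretend one iteration takes 0.2s
  return std::max(1LL, static_cast<long long>(min_time / seconds_per_iter));
}

int main() {
  const double min_time = 0.5;          // existing --benchmark_min_time
  const long long min_iterations = 30;  // the proposed --benchmark_min_iterations

  long long iters = RunFor(min_time);
  if (iters < min_iterations) {
    // Too few iterations fit into min_time: estimate the time per iteration
    // and extend the time budget to cover the rest of the requested count.
    const double seconds_per_iter = min_time / static_cast<double>(iters);
    const double extended_min_time =
        seconds_per_iter * static_cast<double>(min_iterations);
    iters = RunFor(extended_min_time);
  }
  std::printf("ran %lld iterations\n", iters);
}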

aerosayan avatar Aug 22 '19 16:08 aerosayan

Why not take the benchmark_min_iterations and run the benchmarks for time benchmark_min_time like it is normally done, and if and only if the number of iterations required to run for time benchmark_min_time is less than the requested benchmark_min_iterations, do an average to find the time required for each iteration, and modify the value of benchmark_min_time and let the benchmark continue iterating, to account for the rest of the requested iterations that couldn't be completed within the original benchmark_min_time?

That's the extra complexity I was mentioning above. Also, what happens if the user configures both the min iterations and the min time? Which is more important?

This is a decision that was made with quite a bit of discussion and thought, and unless you have a clear use-case that the framework isn't currently supporting (i.e., you can't use min time to get the behaviour you need in your results), I don't really want to reopen the discussion.

dmah42 avatar Aug 22 '19 16:08 dmah42

Additionally, --benchmark_max_iterations=... would be nice to have. In some use cases, each iteration uses some resources (e.g. memory) and we want to bound the total resource usage.

yixing-liu avatar May 14 '20 02:05 yixing-liu

Agree with @yixing-liu, we would like to be able to specify the maximum number of iterations on the command line. The reason is that we don't want to hard-code the number of iterations so that we can run the complete, statistically significant number of iterations by default, but in time-constrained environments (e.g. continuous benchmarking / CI), we need to be able to limit the time taken.

Alternatively, a maximum time option could be useful.

harrism avatar Jul 24 '20 02:07 harrism

If I understand correctly, State::KeepRunningBatch() essentially implements the same behaviour as Iterations, which (as I said above) doesn't suit my requirements. That said, a way to "request" that the current state keeps running iterations (while in a for (auto _ : state) loop) would also be suitable, but I haven't managed to find a way to do that yet.

KeepRunningBatch is still time-adaptive, so it is not the same as setting a fixed Iterations() count on the benchmark. All it does is ensure that the number of iterations run is a multiple of the batch size.

Consider using it this way:

// Use the minimum iteration count you want as the batch size; each pass
// through the outer loop accounts for exactly batch_size iterations.
constexpr int batch_size = 2000;
while (state.KeepRunningBatch(batch_size)) {
  for (int i = 0; i < batch_size; ++i) {
    // code being benchmarked
  }
}

The only semantic difference between this and a "minimum iterations" option is that the benchmark loop will always do a positive integral multiple of batch_size iterations, whereas a hypothetical "minimum iterations" option will do at least that many but possibly some arbitrary positive rational multiple more. I find the difference rarely matters in practice.

The other use case for KeepRunningBatch is when the setup cost is expensive relative to the iteration cost. If I know that a realistic result requires at least 10K iterations, I can use that as a batch size and save the framework the bother of trying with 1, 10, 100 and 1000 iterations first. In these situations, running the benchmark with --v=3 is a good way to see the behavior.
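
For example, something along these lines (BuildHugeIndex() is just a hypothetical expensive fixture to make the point):

#include <benchmark/benchmark.h>
#include <cstdint>
#include <map>

// Hypothetical expensive fixture; the point is only that building it
// dwarfs the cost of a single lookup.
static std::map<int, int> BuildHugeIndex() {
  std::map<int, int> index;
  for (int i = 0; i < 1000000; ++i) index[i] = i;
  return index;
}

static void BM_Lookup(benchmark::State& state) {
  const auto index = BuildHugeIndex();  // paid once per call, not per iteration
  constexpr int64_t kBatch = 10000;     // known-realistic lower bound
  // Jump straight to 10K-iteration batches instead of letting the framework
  // probe with 1, 10, 100, ... iterations first.
  while (state.KeepRunningBatch(kBatch)) {
    for (int64_t i = 0; i < kBatch; ++i) {
      benchmark::DoNotOptimize(index.count(static_cast<int>(i)));
    }
  }
}
BENCHMARK(BM_Lookup);
BENCHMARK_MAIN();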

matta avatar Sep 13 '22 16:09 matta

Agree with @yixing-liu, we would like to be able to specify the maximum number of iterations on the command line. The reason is that we don't want to hard-code the number of iterations so that we can run the complete, statistically significant number of iterations by default, but in time-constrained environments (e.g. continuous benchmarking / CI), we need to be able to limit the time taken.

Alternatively, a maximum time option could be useful.

@harrism would --benchmark_min_time work for the time constrained environments you speak of? I find that --benchmark_min_time=0.01 conveniently shortens the benchmark times to "very short" in practice, and I do that for testing (e.g. running them under address sanitizer).

matta avatar Sep 13 '22 16:09 matta

If it limits the time, why is it called "min" time, and not "max" time?

harrism avatar Sep 13 '22 21:09 harrism

https://github.com/google/benchmark/blob/a476d0fd8e5d71a3b865d1ae9e0bfd7b4b2d5aa3/src/benchmark.cc#L68-L73 However, note that it does not constrain the absolute total time that may be spent benchmarking that particular benchmark instance, which I gather is the actual request?

LebedevRI avatar Sep 13 '22 21:09 LebedevRI

Hi, can anybody help me solve this question? https://github.com/gabime/spdlog/issues/2968 Why do the lines not equal the Iterations?

chaotianjiao avatar Jan 03 '24 07:01 chaotianjiao

Google benchmark first calls the benchmark function with 1 iteration. If that call doesn't take enough time, it calls it with 10 iterations. If that call doesn't take enough time, it calls it with 100 iterations, then 1000, then 10000, etc. There are multiple command line options that control this "warm up" behavior. See https://github.com/google/benchmark/blob/main/docs/user_guide.md#runtime-and-reporting-considerations

When reporting results, it only counts the last call to the benchmark function. This is why any logging you have in the benchmark function will appear to have executed for more iterations than the count displayed in the benchmark results.

You might be able to run your benchmark with the --v flag set to something like --v=2 or --v=3 to understand this better. With a high enough verbose level it should print every call google benchmark makes to a benchmark function.
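
For example, a tiny benchmark like the sketch below (illustrative only) makes the ramp-up visible by logging the per-call iteration count via state.iterations():

#include <benchmark/benchmark.h>
#include <cstdio>

// Logs how many iterations each call to the benchmark function ran, which
// makes the 1 / 10 / 100 / ... ramp-up visible in the output.
static void BM_ShowRampUp(benchmark::State& state) {
  int x = 0;
  for (auto _ : state) {
    benchmark::DoNotOptimize(x += 1);
  }
  std::fprintf(stderr, "benchmark function call ran %lld iterations\n",
               static_cast<long long>(state.iterations()));
}
BENCHMARK(BM_ShowRampUp);
BENCHMARK_MAIN();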

matta avatar Jan 03 '24 21:01 matta

Thank you for your time. I'll try running the benchmark with the --v flag set to something like --v=2 or --v=3.

chaotianjiao avatar Jan 04 '24 03:01 chaotianjiao