
[FR] Report max, min, avg time per iteration

Open ivantishchenko opened this issue 1 year ago • 13 comments

Is your feature request related to a problem? Please describe. Currently there is no way to report the max and min time per iteration. Once the benchmark is run, you get these metrics: "Time", "CPU", "Iterations".

In many scenarios you would love to know the min and max in addition to the average time, to spot outliers. Imagine that you are decoding video frames: you might want to know whether some frames in your video took exceptionally longer than the others.

While it's possible to set the number of repetitions > 1 and implement custom statistics to obtain min and max, there is no similar logic on a per-iteration basis.

Describe the solution you'd like Add a flag to report the metrics in this way: "Average iteration time", "Minimum iteration time", "Maximum iteration time", "Average iteration CPU", "Iterations".

Describe alternatives you've considered I've tried setting repetitions > 1 and implementing custom statistics. However, that did not solve my problem: I would love to have max and min per iteration, not per repetition. I also tried combining manual timing and counters to imitate max and min, but that solution is not elegant.
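For the record, the manual-timing-plus-counters attempt looked roughly like this (a rough sketch rather than my exact code; VideoObject, decodeFrame and the counter names are placeholders):

#include <algorithm>
#include <chrono>
#include <limits>

#include "benchmark/benchmark.h"

static void BM_DecodeFrame(benchmark::State& state) {
  VideoObject v;  // placeholder decoder object
  double min_s = std::numeric_limits<double>::max();
  double max_s = 0.0;
  for (auto _ : state) {
    auto start = std::chrono::steady_clock::now();
    v.decodeFrame();
    auto end = std::chrono::steady_clock::now();
    double elapsed = std::chrono::duration<double>(end - start).count();
    state.SetIterationTime(elapsed);  // feed the manual measurement to the library
    min_s = std::min(min_s, elapsed);
    max_s = std::max(max_s, elapsed);
  }
  // Expose the extremes as user counters next to the regular columns.
  state.counters["min_frame_s"] = min_s;
  state.counters["max_frame_s"] = max_s;
}
BENCHMARK(BM_DecodeFrame)->UseManualTime();

It works, but the min/max end up as ad-hoc counters rather than first-class statistics, which is what I mean by "not elegant".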

Additional context In the end, I ended up using a DIY benchmark and would love to add min/max per iteration to google/benchmark.

ivantishchenko avatar Sep 06 '23 13:09 ivantishchenko

Been there, done that.

The general problem is that we run a bunch of iterations together, and measure how much time they took, we do not measure each iteration separately.

The "workaround" is to force iteration count to be 1, and force repetition count to the number of iterations it would have taken otherwise.

I still do believe this should be more nicely supported by the library, and tried implementing in #709/#710, i don't remember why it didn't proceed (@dmah42 ?).

LebedevRI avatar Sep 06 '23 13:09 LebedevRI

@LebedevRI @dmah42

What I was trying is Iterations(num_frames) and Repetitions(1).

I want something like this:

VideoObject v;

for (auto _ : state) {
  v.decodeFrame();
}

so this will be executed num_frames times, and min and max would be taken over those num_frames iterations. With Iterations(1) and Repetitions(num_frames) it would not be possible to time what I actually want to time. So we really need per-iteration metrics for this use case.
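Spelled out (BM_DecodeFrames is a placeholder name and num_frames a placeholder constant), the registration is roughly:

static void BM_DecodeFrames(benchmark::State& state) {
  VideoObject v;  // placeholder decoder object
  for (auto _ : state) {
    v.decodeFrame();
  }
}
// A single repetition whose iteration count is pinned to the number of frames;
// today this only reports the average time over those num_frames iterations.
BENCHMARK(BM_DecodeFrames)->Iterations(num_frames)->Repetitions(1);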

ivantishchenko avatar Sep 06 '23 13:09 ivantishchenko

per iteration metrics just isn't feasible due to the extremely large number of iterations we'd need to hold in memory over a run.

dmah42 avatar Sep 06 '23 16:09 dmah42

Been there, done that.

The general problem is that we run a bunch of iterations together, and measure how much time they took, we do not measure each iteration separately.

The "workaround" is to force iteration count to be 1, and force repetition count to the number of iterations it would have taken otherwise.

I still do believe this should be more nicely supported by the library, and tried implementing in #709/#710, i don't remember why it didn't proceed (@dmah42 ?).

i don't remember.. my last comment is "all good i think" and then you closed it?

want to reopen it and try to rebase? :P

dmah42 avatar Sep 06 '23 16:09 dmah42

@dmah42

per iteration metrics just isn't feasible due to the extremely large number of iterations we'd need to hold in memory over a run.

I am not sure I understand everything correctly. Please correct me if I am wrong:

Why do we need to hold all iteration timestamps in memory if we want to compute max, min, and avg across all iterations? Could we just take a timestamp after each iteration and check whether it's the maximum one?

From thread_timer.h, where the sum of elapsed times is accumulated:

  // Called by each thread
  void StopTimer() {
    BM_CHECK(running_);
    running_ = false;
    real_time_used_ += ChronoClockNow() - start_real_time_;
    // Floating point error can result in the subtraction producing a negative
    // time. Guard against that.
    cpu_time_used_ +=
        std::max<double>(ReadCpuTimerOfChoice() - start_cpu_time_, 0);
  }

and then in reporter.cc the average is computed over all iterations:

double BenchmarkReporter::Run::GetAdjustedRealTime() const {
  double new_time = real_accumulated_time * GetTimeUnitMultiplier(time_unit);
  if (iterations != 0) new_time /= static_cast<double>(iterations);
  return new_time;
}

May I add max and min to thread_timer.h in addition to the sums?

  // Accumulated time so far (does not contain current slice if running_)
  double real_time_used_ = 0;
  double cpu_time_used_ = 0;
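What I have in mind is roughly the following (a hypothetical sketch, not existing library code; note that StartTimer/StopTimer bracket a whole batch of iterations rather than a single one, so on its own this tracks per-batch rather than per-iteration extremes):

  // Hypothetical additions (sketch only), next to the existing sums:
  // the fastest and slowest timed slice seen so far.
  double real_time_min_ = std::numeric_limits<double>::max();
  double real_time_max_ = 0;

  // In StopTimer(), alongside the existing accumulation:
  //   double slice = ChronoClockNow() - start_real_time_;
  //   real_time_used_ += slice;
  //   real_time_min_ = std::min(real_time_min_, slice);
  //   real_time_max_ = std::max(real_time_max_, slice);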

ivantishchenko avatar Sep 07 '23 09:09 ivantishchenko

right, if we want to selectively pick max, min, etc. (and we already report the average), then we can do that on the fly as you suggest. but if you wanted something that also allows for the 10th percentile or quartiles, then we'd need to keep all iteration results, and that's where we'd need to be cleverer. that's why i said "per iteration metrics" isn't feasible.

tracking min and max certainly is.

dmah42 avatar Sep 07 '23 12:09 dmah42

I'm of two minds on this.

I don't see why we'd need to do anything more than store a vector of measurements instead of just an average of that vector, so that's 4 (8?) bytes per iteration. Of course that won't work well with large iteration counts, but since it won't be the default, only an explicit opt-in, i don't really see any harm in that?

LebedevRI avatar Sep 07 '23 12:09 LebedevRI

how would a user know to opt in? if we only want to add min/max then that's trivial to add... or we could store results in a fixed-size array until it overflows and then fall back to not having the per-iteration counts available... define the capacity as 1024 or something.
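something like this, conceptually (a minimal sketch, not library code; all names made up): keep per-iteration samples until a cap is hit, after which only the cheap running statistics survive.

#include <algorithm>
#include <cstddef>
#include <limits>
#include <vector>

class IterationSamples {
 public:
  explicit IterationSamples(std::size_t capacity) : capacity_(capacity) {
    samples_.reserve(capacity_);
  }

  void Record(double seconds) {
    // min/max/mean stay cheap to maintain regardless of overflow.
    min_ = std::min(min_, seconds);
    max_ = std::max(max_, seconds);
    sum_ += seconds;
    ++count_;
    if (samples_.size() < capacity_)
      samples_.push_back(seconds);
    else
      overflowed_ = true;  // percentiles/quartiles no longer computable
  }

  bool HasFullDistribution() const { return !overflowed_; }
  double Min() const { return min_; }
  double Max() const { return max_; }
  double Mean() const {
    return count_ != 0 ? sum_ / static_cast<double>(count_) : 0.0;
  }

 private:
  std::size_t capacity_;
  std::vector<double> samples_;
  bool overflowed_ = false;
  double min_ = std::numeric_limits<double>::max();
  double max_ = 0;
  double sum_ = 0;
  std::size_t count_ = 0;
};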

dmah42 avatar Sep 07 '23 15:09 dmah42

The same way they do for any other feature, and the same way they would for the "iteration-as-repetition" thing.

LebedevRI avatar Sep 07 '23 15:09 LebedevRI

i mean, how would they know when it's safe to, when it's appropriate to, and why it might fail if they try and they were wrong?

we can track min/max per iteration with little overhead if that's all we need, without having to invent something for per-iteration metrics more broadly.

dmah42 avatar Sep 08 '23 08:09 dmah42

we can track min/max per iteration with little overhead if that's all we need, without having to invent something for per-iteration metrics more broadly.

It's really enough for the use case I have. I think it's a good start. May I tackle this in a separate pull request first?

i mean, how would they know when it's safe to, when it's appropriate to, and why it might fail if they try and they were wrong?

I would tackle it separately, once the previous one is ready. We could define a capacity we know we can handle for sure, and if the number of iterations is larger than that capacity, just warn the user. For example, Grafana shows a "too many data points" warning.


ivantishchenko avatar Sep 08 '23 08:09 ivantishchenko

i mean, how would they know when it's safe to, when it's appropriate to, and why it might fail if they try and they were wrong?

Trial & error + documentation, i suppose. It's not like this is an extremely user-facing library :) Storing up to 2MB of measurements seems uncontentious to me, but larger iteration counts are questionable, yes.
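(For scale, assuming one 8-byte double per iteration, 2MB of storage covers 2 * 1024 * 1024 / 8 ≈ 262k iterations.)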

we can track min/max per iteration with little overhead

Err, hold on, but can we? I am under a very strong impression that the current design is very intentional, and the new behavior would need to be opt-in.

if that's all we need, without having to invent something for per-iteration metrics more broadly.

The thing is, iteration-as-repetition is a workaround for the lack of a separate-iterations feature, and now that there's a warmup feature (and shuffling...), i'm not really sure how they should all best interplay.

Additionally, iteration-as-repetition would result in strictly greater memory pressure, because then we will be storing much more than just a single float/double measurement...

we can track min/max per iteration with little overhead if that's all we need, without having to invent something for per-iteration metrics more broadly.

It's really enough for the use case I have. I think it's a good start. May I tackle this in a separate pull request first?

I'm mainly worried about the fact that once we add something, we're kind-of stuck with it.

LebedevRI avatar Sep 08 '23 16:09 LebedevRI

i mean, how would they know when it's safe to, when it's appropriate to, and why it might fail if they try and they were wrong?

Trial & error + documentation, i suppose. It's not like this is an extremely user-facing library :) Storing up to 2MB of measurements seems uncontentious to me, but larger iteration counts are questionable, yes.

fair

we can track min/max per iteration with little overhead

Err, hold on, but can we? I am under a very strong impression that the current design is very intentional, and the new behavior would need to be opt-in.

ugh no, sorry, i think i confused myself. StopTimer isn't per iteration, it's per repetition.

if that's all we need, without having to invent something for per-iteration metrics more broadly.

The thing is, iteration-as-repetition is a workaround for the lack of a separate-iterations feature, and now that there's a warmup feature (and shuffling...), i'm not really sure how they should all best interplay.

Additionally, iteration-as-repetition would result in strictly greater memory pressure, because then we will be storing much more than just a single float/double measurement...

we can track min/max per iteration with little overhead if that's all we need, without having to invent something for per-iteration metrics more broadly.

It's really enough for the use case I have. I think it's a good start. May I tackle this in a separate pull request first?

I'm mainly worried about the fact that once we add something, we're kind-of stuck with it.

dmah42 avatar Sep 11 '23 08:09 dmah42