
[FR] `PredictNumItersNeeded()` 1.4 correction factor

Open ruben-laso opened this issue 5 months ago • 23 comments

Problem description: The function `PredictNumItersNeeded()` applies a 1.4 correction factor. As a result, the experiment runs about 40% longer than the time specified by `--benchmark_min_time`. Of course, `--benchmark_min_time` denotes the minimum amount of time to run the benchmark, but an overrun of 40% seems excessive. This is particularly relevant on supercomputers, where CPU time is expensive.

  • Why is this estimation done, instead of stopping the iterations when the accumulated "iteration time" exceeds the target time?
  • Is there a reason for selecting 1.4 as a correction factor?
  • In cases where the execution times are not stable, could this prediction be wrong by a large margin?

Suggested solution: I suggest either removing the correction factor or making it configurable (with a default value of 1.0).
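To make the effect of the factor concrete, here is a minimal sketch of a predict-then-run scheme with the factor exposed as a parameter. It is not the library's actual code: `predict_iters` and its parameters are hypothetical names, and the real `PredictNumItersNeeded()` contains additional logic that is omitted here.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper, not part of Google Benchmark's API.
// Given a trial run of `iters` iterations that took `seconds`, estimate how
// many iterations are needed to run for about `min_time * factor` seconds.
std::int64_t predict_iters(std::int64_t iters, double seconds,
                           double min_time, double factor) {
  assert(iters > 0 && min_time > 0.0 && factor > 0.0);
  // Guard against division by zero for extremely short trial runs.
  const double per_iter =
      std::max(seconds, 1e-9) / static_cast<double>(iters);
  const double target = min_time * factor;
  // Always make progress: request at least one more iteration than before.
  const double needed =
      std::max(target / per_iter, static_cast<double>(iters) + 1.0);
  return static_cast<std::int64_t>(std::llround(needed));
}
```

Under these assumptions, a trial of 1000 iterations that took 7 ms with a 1 s minimum time yields roughly 200,000 iterations (about 1.4 s) for a factor of 1.4, and roughly 143,000 iterations (about 1 s) for a factor of 1.0.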

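Example: As shown in the following output (executed with `--benchmark_min_time=1s`), the real execution time, computed as time per iteration multiplied by the iteration count, is ~1.4 s: $7039\,\text{ns} \times 198481 = 1397107759\,\text{ns} \approx 1.40\,\text{s}$, $8775\,\text{ns} \times 160984 = 1412634600\,\text{ns} \approx 1.41\,\text{s}$, ...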

------------------------------------------------------------------------------------------------------------------------
Benchmark                                                              Time             CPU   Iterations UserCounters...
------------------------------------------------------------------------------------------------------------------------
GNU-TBB/std::adjacent_difference/double/1024/manual_time            7039 ns         7022 ns       198481 bytes_per_second=6.50968Gi/s
GNU-TBB/std::adjacent_find/double/1024/manual_time                  8775 ns         8702 ns       160984 bytes_per_second=2.61075Gi/s
GNU-TBB/std::all_of/double/1024/manual_time                         7704 ns         7529 ns       199585 bytes_per_second=2.97387Gi/s
GNU-TBB/std::any_of/double/1024/manual_time                         4707 ns         4625 ns       301878 bytes_per_second=4.86754Gi/s

When executing the same code with `--benchmark_min_time=0.71s` ($1/1.4 \simeq 0.71$), the execution times are much closer to 1 s: $7681\,\text{ns} \times 134530 = 1033324930\,\text{ns} \approx 1.03\,\text{s}$, $9265\,\text{ns} \times 103410 = 958093650\,\text{ns} \approx 0.96\,\text{s}$, ...

------------------------------------------------------------------------------------------------------------------------
Benchmark                                                              Time             CPU   Iterations UserCounters...
------------------------------------------------------------------------------------------------------------------------
GNU-TBB/std::adjacent_difference/double/1024/manual_time            7681 ns         7615 ns       134530 bytes_per_second=5.96565Gi/s
GNU-TBB/std::adjacent_find/double/1024/manual_time                  9265 ns         9175 ns       103410 bytes_per_second=2.47288Gi/s
GNU-TBB/std::all_of/double/1024/manual_time                         7996 ns         7572 ns       110333 bytes_per_second=2.86519Gi/s
GNU-TBB/std::any_of/double/1024/manual_time                         4656 ns         4680 ns       192865 bytes_per_second=4.92025Gi/s
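
The totals quoted above can be reproduced by multiplying the reported per-iteration time by the iteration count. A small sketch that checks the four quoted products at compile time (`total_ns` is a helper introduced only for illustration):

```cpp
#include <cstdint>

// Illustrative helper: total measured time in nanoseconds, computed as
// (time per iteration in ns) * (number of iterations).
constexpr std::int64_t total_ns(std::int64_t time_ns, std::int64_t iterations) {
  return time_ns * iterations;
}

// --benchmark_min_time=1s: totals land near 1.4 s.
static_assert(total_ns(7039, 198481) == 1397107759LL, "adjacent_difference");
static_assert(total_ns(8775, 160984) == 1412634600LL, "adjacent_find");

// --benchmark_min_time=0.71s: totals land near 1 s.
static_assert(total_ns(7681, 134530) == 1033324930LL, "adjacent_difference");
static_assert(total_ns(9265, 103410) == 958093650LL, "adjacent_find");
```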

ruben-laso, Sep 06 '24 11:09