Introduce profiler interface to benchmark framework.
- The profiler is optional. To use it, users of the benchmark library provide an implementation (e.g., defining how counters are collected and reported) and register the profiler once.
- The profiler is initialized once per RunSpecifiedBenchmarks and injected in a way that skips capturing state for Benchmark::Setup or Benchmark::TearDown. The profiler calls Finalize to do post-processing.
This seems to miss some motivational words. Why is that something we want? What problem does it solve?
Hi Roman, the motivating use case is to allow benchmark users to inject custom profiler implementations that measure as accurately as possible (e.g., without measuring SetUp or TearDown cost or any test-harness work), avoiding polluting the profiles.
Then this doesn't do the right thing still - how does this deal with PauseTiming()?
I'm not sure we want to add all this complexity unless it's a direct generalization of the existing timers.
Then this doesn't do the right thing still - how does this deal with PauseTiming()?
{Pause,Resume}Timing and profilers are added to solve relatively orthogonal problems.
- {Pause,Resume}Timing are exposed. Individual benchmarks can call them at reasonable places to control the interval over which time or perf counters are measured and get accurate time/cycle numbers (see the sketch after this list).
- Profilers are injected to skip collecting profiles for the test harness (loops, etc.) and for possibly costly SetUp or TearDown. (SetUp and TearDown are called once per repetition, so they can be non-trivial if a benchmark runs multiple repetitions.) This can improve profile representativeness and, more importantly, avoid misleading profiles.
- To elaborate, depending on the profiles collected, counting instructions between PauseTiming and ResumeTiming could be fine. But profiles from the test harness or from set-up/tear-down may pollute the data if users are interested only in what is being benchmarked.
- For example, the existing memory manager is invoked once per call to DoOneRepetition (i.e., it does not stop measuring when the timer is paused). This is simple and not imprecise if the goal is to measure peak heap usage (a hedged sketch of that hook appears after this reply).
- Similarly, for a lock-contention profiler, {Pause,Resume}Timing is unlikely to contribute significant contention data.
- Another example where the current injection helps: PGO profiles from heavyweight SetUp or TearDown (e.g., bulky data deserialization for reader/writer benchmarking) could misguide the optimization of the function being benchmarked.
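As a reference point for the first bullet, this is a small, self-contained example (the benchmark itself is made up) of the existing {Pause,Resume}Timing controls: the timer is stopped while per-iteration input data is rebuilt, so only the insertions are measured.

```cpp
#include <benchmark/benchmark.h>

#include <set>
#include <vector>

// Illustrative benchmark: pause the timer around input generation so that
// only the std::set insertions contribute to the measured time.
static void BM_SetInsert(benchmark::State& state) {
  for (auto _ : state) {
    state.PauseTiming();  // exclude input generation from the measurement
    std::vector<int> data(state.range(0));
    for (int i = 0; i < state.range(0); ++i) data[i] = i;
    std::set<int> s;
    state.ResumeTiming();  // measure only the insertions below
    for (int v : data) s.insert(v);
  }
}
BENCHMARK(BM_SetInsert)->Arg(1 << 10);
BENCHMARK_MAIN();
```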
Another reason to decouple the timer and the profiler is generality. Time and perf counters are measured per thread, while profilers may need a view across threads (e.g., contention profilers, or profilers that rely on global state). If each thread invoked the profiler in a synchronized way, the synchronization overhead could change the performance data itself.
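For context on the existing injection point mentioned a couple of bullets above, here is a minimal sketch of a custom MemoryManager. The method and field names follow recent benchmark releases, but the exact Stop() signature has changed across versions, so treat the details as approximate; the point is that it is started and stopped once per repetition, outside the timer, which is the model the proposed profiler interface follows.

```cpp
#include <benchmark/benchmark.h>

// Hedged sketch of a custom MemoryManager; check the benchmark.h you build
// against, since older releases used Stop(Result*) instead of Stop(Result&).
class PeakHeapManager : public benchmark::MemoryManager {
 public:
  void Start() override {
    // Reset allocation tracking here (e.g., hook your allocator).
  }
  void Stop(Result& result) override {
    // Report whatever was tracked; the values below are placeholders.
    result.num_allocs = 0;
    result.max_bytes_used = 0;
  }
};

// Registered once, similar to how the proposed profiler would be registered:
//   static PeakHeapManager mm;
//   benchmark::RegisterMemoryManager(&mm);
```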
I'm not sure we want to add all this complexity unless it's a direct generalization of the existing timers.
Regarding complexity, the interface and its injection do not add much maintenance overhead (e.g., they don't clutter the codebase or make future contributions harder) or benchmark execution overhead (the profiler is optional and off when users don't need one).
My point is that it may be fine for your use-case to ignore PauseTiming(), but it clearly won't be true for everyone's use-case.
My point is that it may be fine for your use-case to ignore PauseTiming(), but it clearly won't be true for everyone's use-case.
The current way of injection (based on MemoryManager but not exactly the same) aims to support:
- skipping some of the test harness (before Init or after Finalize)
- skipping the test Setup and TearDown (which are invoked per benchmark repetition; see the sketch below)
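For reference, per-repetition Setup/Teardown hooks look roughly like this, using the Setup/Teardown registration available in recent benchmark releases; the "heavy deserialization" is a made-up stand-in. With repetitions, a profiler that cannot skip these hooks would attribute their work to the benchmarked function.

```cpp
#include <benchmark/benchmark.h>

#include <vector>

// Stand-in for the "bulky data deserialization" mentioned above.
static std::vector<int> g_data;

static void DoSetup(const benchmark::State&) {
  // Runs once per repetition, before the timed iterations.
  g_data.assign(1 << 20, 42);
}

static void DoTeardown(const benchmark::State&) {
  // Runs once per repetition, after the timed iterations.
  g_data.clear();
}

static void BM_Reader(benchmark::State& state) {
  for (auto _ : state) {
    long sum = 0;
    for (int v : g_data) sum += v;
    benchmark::DoNotOptimize(sum);
  }
}
// With Repetitions(5), DoSetup/DoTeardown each run 5 times; a profiler that
// cannot skip them would mix that work into BM_Reader's profile.
BENCHMARK(BM_Reader)->Setup(DoSetup)->Teardown(DoTeardown)->Repetitions(5);
BENCHMARK_MAIN();
```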
The reply above is meant to elaborate on why skipping Setup or TearDown could improve profile quality for profile-collection purposes, and (stepping back from how to handle PauseTiming) the implications for generality (State and timers are per-thread, while a profiler might need to update data across concurrent threads).
Given that calling {Pause,Resume}Timing per iteration should generally be avoided (i.e., it is not common), I wonder if it makes sense to prioritize the use case targeted by this PR. If it helps avoid confusion, I could add comments to the class description (and the PR description) to scope the intended use case.
If a profiler use case that needs to handle per-thread timers comes up in the future, this interface class won't make contributions harder or complicate maintenance of the codebase; support could be added then for per-thread purposes (likely as new classes rather than by extending this one).
i don't understand the use-case.
before working on such a large and invasive change it's generally a good idea to open an issue (feature request) with some examples of what can't be done today but would be possible with this change.
i'm pretty sure if there's a consistent issue with timing expensive setup/teardown this is going to be an issue for all users and something we should resolve within the library without some external hooks.