Introduce profiler interface to benchmark framework.
- The profiler is optional. To use it, users of the benchmark library provide an implementation (e.g., defining how counters are collected and reported) and register the profiler once.
- The profiler is initialized once per RunSpecifiedBenchmarks and injected in a way that skips capturing state for Benchmark::Setup or Benchmark::TearDown. The profiler calls Finalize to do post-processing.
This seems to miss some motivational words. Why is that something we want? What problem does it solve?
Hi Roman, the motivating use case is to allow benchmark users to inject custom profiler implementations that measure as accurately as possible (e.g., without measuring SetUp or TearDown cost or any test-harness work), avoiding polluting the profiles.
Then this doesn't do the right thing still - how does this deal with PauseTiming()?
I'm not sure we want to add all this complexity unless it's a direct generalization of the existing timers.
Then this doesn't do the right thing still - how does this deal with PauseTiming()?
{Pause,Resume}Timing and profilers are added to solve relatively orthogonal problems.
- {Pause,Resume}Timing are exposed. Individual benchmarks can call them at reasonable places to control the interval over which time or perf counters are measured and get accurate time/cycle numbers (see the sketch after this list).
- Profilers are injected to skip collecting profiles for the test harness (loops, etc.) and for possibly costly SetUp or TearDown. (SetUp and TearDown are called once per repetition, so they can be non-trivial if a benchmark runs multiple repetitions.) This can improve profile representativeness and, more importantly, avoid misleading profiles.
- To elaborate, depending on the profiles collected, counting instructions between PauseTiming and ResumeTiming could be fine. But profiles from the test harness or from set-up/tear-down may pollute the data if users are interested only in what is being benchmarked.
- For example, the existing memory manager is invoked once per call to DoOneRepetition (i.e., it does not stop measuring when the timer is paused). This is simple and not imprecise if the goal is to measure peak heap usage (a hedged sketch of that hook appears after this reply).
- Similarly, for a lock-contention profiler, {Pause,Resume}Timing is unlikely to contribute significant contention data.
- Another example where the current injection helps: PGO profiles from heavyweight SetUp or TearDown (e.g., bulky data deserialization for reader/writer benchmarking) could misguide the optimization of the function being benchmarked.
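As a reference point for the first bullet, this is a small, self-contained example (the benchmark itself is made up) of the existing {Pause,Resume}Timing controls: the timer is stopped while per-iteration input data is rebuilt, so only the insertions are measured.

```cpp
#include <benchmark/benchmark.h>

#include <set>
#include <vector>

// Illustrative benchmark: pause the timer around input generation so that
// only the std::set insertions contribute to the measured time.
static void BM_SetInsert(benchmark::State& state) {
  for (auto _ : state) {
    state.PauseTiming();  // exclude input generation from the measurement
    std::vector<int> data(state.range(0));
    for (int i = 0; i < state.range(0); ++i) data[i] = i;
    std::set<int> s;
    state.ResumeTiming();  // measure only the insertions below
    for (int v : data) s.insert(v);
  }
}
BENCHMARK(BM_SetInsert)->Arg(1 << 10);
BENCHMARK_MAIN();
```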
Another reason to decouple the timer and the profiler is generality. Time and perf counters are measured per thread, while profilers may need a view across threads (e.g., contention profilers, or profilers that rely on global state). If each thread invoked the profiler in a synchronized way, the synchronization overhead could change the performance data itself.
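For context on the existing injection point mentioned a couple of bullets above, here is a minimal sketch of a custom MemoryManager. The method and field names follow recent benchmark releases, but the exact Stop() signature has changed across versions, so treat the details as approximate; the point is that it is started and stopped once per repetition, outside the timer, which is the model the proposed profiler interface follows.

```cpp
#include <benchmark/benchmark.h>

// Hedged sketch of a custom MemoryManager; check the benchmark.h you build
// against, since older releases used Stop(Result*) instead of Stop(Result&).
class PeakHeapManager : public benchmark::MemoryManager {
 public:
  void Start() override {
    // Reset allocation tracking here (e.g., hook your allocator).
  }
  void Stop(Result& result) override {
    // Report whatever was tracked; the values below are placeholders.
    result.num_allocs = 0;
    result.max_bytes_used = 0;
  }
};

// Registered once, similar to how the proposed profiler would be registered:
//   static PeakHeapManager mm;
//   benchmark::RegisterMemoryManager(&mm);
```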
I'm not sure we want to add all this complexity unless it's a direct generalization of the existing timers.
Regarding complexity, the interface and its injection do not add much maintenance overhead (e.g., they don't clutter the codebase or make future contributions harder) or benchmark execution overhead (the profiler is optional and off when users don't need one).
My point is that it may be fine for your use-case to ignore PauseTiming(), but it clearly won't be true for everyone's use-case.
My point is that it may be fine for your use-case to ignore PauseTiming(), but it clearly won't be true for everyone's use-case.
The current way of injection (based on MemoryManager but not exactly the same) aims to support:
- skipping some of the test harness (before Init or after Finalize)
- skipping the test Setup and TearDown (which are invoked per benchmark repetition; see the sketch below)
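For reference, per-repetition Setup/Teardown hooks look roughly like this, using the Setup/Teardown registration available in recent benchmark releases; the "heavy deserialization" is a made-up stand-in. With repetitions, a profiler that cannot skip these hooks would attribute their work to the benchmarked function.

```cpp
#include <benchmark/benchmark.h>

#include <vector>

// Stand-in for the "bulky data deserialization" mentioned above.
static std::vector<int> g_data;

static void DoSetup(const benchmark::State&) {
  // Runs once per repetition, before the timed iterations.
  g_data.assign(1 << 20, 42);
}

static void DoTeardown(const benchmark::State&) {
  // Runs once per repetition, after the timed iterations.
  g_data.clear();
}

static void BM_Reader(benchmark::State& state) {
  for (auto _ : state) {
    long sum = 0;
    for (int v : g_data) sum += v;
    benchmark::DoNotOptimize(sum);
  }
}
// With Repetitions(5), DoSetup/DoTeardown each run 5 times; a profiler that
// cannot skip them would mix that work into BM_Reader's profile.
BENCHMARK(BM_Reader)->Setup(DoSetup)->Teardown(DoTeardown)->Repetitions(5);
BENCHMARK_MAIN();
```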
The reply above is meant to elaborate on why skipping Setup or TearDown could improve profile quality for profile-collection purposes, and (stepping back from how to handle PauseTiming) the implications for generality (State and timers are per-thread, while a profiler might need to update data across concurrent threads).
Given that calling {Pause,Resume}Timing per iteration should generally be avoided (i.e., it is not common), I wonder if it makes sense to prioritize the use case targeted by this PR. If it helps avoid confusion, I could add comments to the class description (and the PR description) to scope the intended use case.
If a profiler use case that needs to handle per-thread timers comes up in the future, this interface class won't make contributions harder or complicate maintenance of the codebase; support could be added then for per-thread purposes (likely as new classes rather than by extending this one).
i don't understand the use-case.
before working on such a large and invasive change it's generally a good idea to open an issue (feature request) with some examples of what can't be done today but would be possible with this change.
i'm pretty sure if there's a consistent issue with timing expensive setup/teardown this is going to be an issue for all users and something we should resolve within the library without some external hooks.