Use hardware performance counters instead of Cachegrind
Iai is very exciting! I love the idea of benchmarks that are fast and deterministic. But relying on Cachegrind has some drawbacks:
- Limited OS support
- Requires the user to install valgrind
- Executing binaries under Valgrind is slow
- Valgrind alters the program's normal execution. This reduces the accuracy of the measurements and leads to bugs like #8
Modern CPUs contain hardware performance counters that can be used for nearly zero-cost profiling. Using those instead of Cachegrind would have several benefits:
- No dependency on Valgrind
- Much faster to execute
- The counters can be paused and restarted mid-process. This would allow Iai to skip setup and teardown sections, as requested in #7.
- Wider OS support
- More accurate and detailed reports.
On FreeBSD, pmc(3) provides access to the counters, and there is already a nascent Rust crate for them: pmc-rs. On Linux, I think the perfcnt and perf crates provide the same functionality.
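To illustrate the pause-and-restart idea from the list above, here is a minimal sketch of how a pausable counter could exclude setup and teardown from a measurement. Everything in it is hypothetical: `FakeClock` stands in for a real hardware counter, and `PausableCounter`, `simulated_work`, and `measure_without_setup` are illustrative names, not the API of pmc-rs, perfcnt, perf, or Iai.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for a hardware event counter: a monotonically
/// increasing event count that simulated work advances.
struct FakeClock {
    events: Cell<u64>,
}

impl FakeClock {
    fn new() -> Self {
        FakeClock { events: Cell::new(0) }
    }
}

/// Accumulates clock deltas only while enabled (assumed design, not a real API).
struct PausableCounter<'a> {
    clock: &'a FakeClock,
    started_at: Option<u64>,
    total: u64,
}

impl<'a> PausableCounter<'a> {
    fn new(clock: &'a FakeClock) -> Self {
        PausableCounter { clock, started_at: None, total: 0 }
    }

    /// Start (or resume) counting from the current clock value.
    fn enable(&mut self) {
        if self.started_at.is_none() {
            self.started_at = Some(self.clock.events.get());
        }
    }

    /// Pause counting and fold the elapsed events into the total.
    fn disable(&mut self) {
        if let Some(start) = self.started_at.take() {
            self.total += self.clock.events.get() - start;
        }
    }

    /// Events counted so far, including any interval still in progress.
    fn read(&self) -> u64 {
        match self.started_at {
            Some(start) => self.total + (self.clock.events.get() - start),
            None => self.total,
        }
    }
}

/// Pretend to execute code that costs `events` counter events.
fn simulated_work(clock: &FakeClock, events: u64) {
    clock.events.set(clock.events.get() + events);
}

/// Counts only the middle section; setup and teardown run while paused.
fn measure_without_setup(clock: &FakeClock) -> u64 {
    let mut counter = PausableCounter::new(clock);
    simulated_work(clock, 1_000); // setup: counter is paused
    counter.enable();
    simulated_work(clock, 250); // the code under test
    counter.disable();
    simulated_work(clock, 500); // teardown: counter is paused again
    counter.read()
}
```

With these made-up costs, `measure_without_setup(&FakeClock::new())` returns 250 even though 1,750 events elapsed in total. A real implementation would replace `FakeClock` with reads of an actual counter.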
I think that https://github.com/jbreitbart/criterion-perf-events is an attempt to do that.
cool! Thanks for the tip.
Yes, if that's what you want I would recommend using the criterion-perf-events plugin.
Cachegrind is used specifically for its emulation of the memory hierarchy. Because we can control the parameters of that emulation, Iai can take measurements under Cachegrind that should be far more repeatable and consistent between machines than is possible even with performance counters. Readings from hardware performance counters will naturally differ from one piece of hardware to another.
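To make the repeatability argument concrete, here is a toy sketch of a simulated cache with caller-chosen parameters. It is a deliberately simplified direct-mapped model, not Cachegrind's actual simulation, and all names (`DirectMappedCache`, `count_misses`, `sequential_trace`) are made up for illustration. The point is only that the miss count depends on the address trace and the chosen parameters, never on the host CPU.

```rust
/// Toy direct-mapped cache model (an assumption for illustration only;
/// far simpler than what Cachegrind simulates).
struct DirectMappedCache {
    line_size: usize,
    /// One slot per cache line, holding the memory line currently cached there.
    slots: Vec<Option<usize>>,
    misses: u64,
}

impl DirectMappedCache {
    fn new(num_lines: usize, line_size: usize) -> Self {
        DirectMappedCache {
            line_size,
            slots: vec![None; num_lines],
            misses: 0,
        }
    }

    /// Record one memory access at byte address `addr`.
    fn access(&mut self, addr: usize) {
        let line = addr / self.line_size; // which memory line is touched
        let index = line % self.slots.len(); // which cache slot it maps to
        if self.slots[index] != Some(line) {
            self.misses += 1;
            self.slots[index] = Some(line);
        }
    }
}

/// Replay an address trace through a fresh cache and return the miss count.
fn count_misses(addresses: &[usize], num_lines: usize, line_size: usize) -> u64 {
    let mut cache = DirectMappedCache::new(num_lines, line_size);
    for &addr in addresses {
        cache.access(addr);
    }
    cache.misses
}

/// Byte addresses of `len` sequential accesses spaced `stride` bytes apart.
fn sequential_trace(len: usize, stride: usize) -> Vec<usize> {
    (0..len).map(|i| i * stride).collect()
}
```

Replaying `sequential_trace(1024, 8)` through `count_misses` with 64 lines of 64 bytes yields 128 misses on every run and every machine. Changing the line size to 32 bytes yields 256, which shows that the result follows from the chosen parameters alone.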
In addition, under virtualization it's common for access to the performance counters of the underlying hardware to be disabled, so it's not as if that approach is without drawback either. I know this is the case, because the VM I do my work in at my day job has its performance counters disabled for mysterious IT-department reasons.