Running several counters concurrently
perf can only monitor a specific OS process running on a specific CPU core (or on all cores). It's unaware of Haskell's RTS and its OS threads.
I expect that running several counters concurrently may give strange, confusing results. Running a test with a counter at the same time as other (non-cpu-instruction-counter) tests will also be confusing.
Now, some test runners (e.g. tasty) execute tests in parallel by default. This may be a great source of confusion for an unaware user.
I have several ideas of varying complexity that could help here, but ultimately we have to experiment and investigate this.
- Add a visible notice to the README telling users not to run counters concurrently
- Make a global lock that is taken by startInstructionCounter
  - if the lock is taken, the next startInstructionCounter can fail with a meaningful error message;
  - or it could just wait until the lock is released, which will sequentialize cpu-counter tests
  - BUT: all of this seems hacky and won't help if you concurrently run non-cpu-instruction-counter tests
- We can investigate how to actually make it work concurrently:
  - startInstructionCounter can return a Handle that allows working with this specific counter, tracking information related to it
  - we can use forkOn to run on a specific capability, which usually corresponds to a core. It's implementation-dependent, but we only work on Linux, so it's probably fine
    - but there's probably a more reliable way to fork onto a specific core; I don't know
  - I scanned through the perf_event_open man page and noticed interesting flags like PERF_SAMPLE_ID, PERF_FORMAT_ID and PERF_SAMPLE_GROUP. I didn't look any closer yet, but maybe they can be used to reliably track several counters. This stackoverflow question may be related, but I didn't read it closely.
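The two global-lock variants above could look roughly like this. It is a minimal sketch, assuming a process-wide MVar created with the usual unsafePerformIO/NOINLINE idiom; counterLock, withCounterLock and withCounterLockOrFail are hypothetical names, not part of cpu-instruction-counter, and the action passed in stands for whatever start/stop counting calls the library exposes.

```haskell
import Control.Concurrent.MVar (MVar, newMVar, putMVar, tryTakeMVar, withMVar)
import Control.Exception (finally, mask)
import System.IO.Unsafe (unsafePerformIO)

-- Hypothetical process-wide lock guarding instruction counting.
-- NOINLINE makes sure only one MVar is ever created.
counterLock :: MVar ()
counterLock = unsafePerformIO (newMVar ())
{-# NOINLINE counterLock #-}

-- Variant 1: wait until no other counter is running. This sequentializes
-- all counting sections within this process.
withCounterLock :: IO a -> IO a
withCounterLock action = withMVar counterLock (const action)

-- Variant 2: fail fast with a meaningful error if another counter is active.
withCounterLockOrFail :: IO a -> IO (Either String a)
withCounterLockOrFail action = mask $ \restore -> do
  acquired <- tryTakeMVar counterLock
  case acquired of
    Nothing -> pure (Left "another instruction counter is already running")
    Just () -> (Right <$> restore action) `finally` putMVar counterLock ()

main :: IO ()
main = do
  r1 <- withCounterLockOrFail (pure (1 :: Int))
  print r1
  -- A second attempt while the lock is held is rejected instead of blocking.
  r2 <- withCounterLock (withCounterLockOrFail (pure (2 :: Int)))
  print r2
```

Either way, the lock only coordinates callers inside one process; it cannot see other processes or unrelated tests that happen to run at the same time.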
I can only be sure about the first option (warn users in the README). In any case, cpu-instruction-counter only works on Linux and uses FFI, so the best practice should be for all instruction-counting tests/benchmarks to live in a separate executable that's compiled with +RTS -N1, which eliminates the problem.
@zudov Great points, thanks!
Add a visible notice to README telling users not to run counters concurrently
Good idea, I just did it with https://github.com/nh2/haskell-cpu-instruction-counter/commit/077539f25684ff9bf583204ccae5a1d77d617d1b.
if the lock is taken, next startInstructionCounter can fail with meaningful error message
That sounds like a good idea until we have cleared up how exactly parallel usage behaves.
We probably want to do that locking against what's returned by perfEventOpenHwInstructions though. It is the one that chooses (in its C implementation) to record events for all threads. It would be legitimate to obtain an event FD that doesn't do that (e.g. one that only listens to events on a particular thread), and then call startInstructionCounter in parallel on two such event counters.
So I think in general it's best to expose both an API that lets you do everything conveniently from Haskell, and one that's safe to use against common errors (such as accidentally doing parallel perf invocations).
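One way to read the two-layer idea is that only counters opened for all threads need to be serialized, while per-thread counters may run in parallel. The sketch below is hypothetical throughout: Scope stands in for however the event FD was opened, withCounterRaw is a placeholder for opening, using and closing a real perf FD, and withCounterSafe is an illustrative name, not an existing function of the library.

```haskell
import Control.Concurrent.MVar (MVar, newMVar, putMVar, tryTakeMVar)
import Control.Exception (finally, mask)
import System.IO.Unsafe (unsafePerformIO)

-- Hypothetical: whether an event FD counts all threads of the process
-- or only a single thread.
data Scope = AllThreads | SingleThread
  deriving (Eq, Show)

-- Held while an all-threads counter is open in this process.
allThreadsLock :: MVar ()
allThreadsLock = unsafePerformIO (newMVar ())
{-# NOINLINE allThreadsLock #-}

-- "Raw" layer: no safety checks. A real binding would open the perf FD,
-- run the body and close the FD; this placeholder only passes the scope on.
withCounterRaw :: Scope -> (Scope -> IO a) -> IO a
withCounterRaw scope body = body scope

-- "Safe" layer: refuse a second all-threads counter while one is open,
-- but never block single-thread counters.
withCounterSafe :: Scope -> (Scope -> IO a) -> IO (Either String a)
withCounterSafe SingleThread body = Right <$> withCounterRaw SingleThread body
withCounterSafe AllThreads body = mask $ \restore -> do
  acquired <- tryTakeMVar allThreadsLock
  case acquired of
    Nothing -> pure (Left "an all-threads counter is already open")
    Just () ->
      (Right <$> restore (withCounterRaw AllThreads body))
        `finally` putMVar allThreadsLock ()

main :: IO ()
main = do
  -- A single-thread counter inside an all-threads one is allowed...
  r1 <- withCounterSafe AllThreads $ \_ -> withCounterSafe SingleThread (pure . show)
  print r1
  -- ...but overlapping all-threads counters are rejected.
  r2 <- withCounterSafe AllThreads $ \_ -> withCounterSafe AllThreads (pure . show)
  print r2
```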
startInstructionCounter can return a Handle that will allow to work with this specific counter, tracking information related to it
That one I don't quite understand. perfEventOpenHwInstructions is what returns such a handle.
we can use forkOn to run on specific capability which usually corresponds to a core. It's implementation dependent, but we only work on Linux so it's probably fine
This may not be sufficient in general. What happens if the forkOn'd f itself calls forkOn with another CPU (or just forkIO)?
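The nested-fork concern can be checked directly: threadCapability from Control.Concurrent reports the capability a thread is on and whether it is locked there, and a thread is locked only if it was itself created with forkOn. In the sketch below (whereAmI and pinnedParentAndChild are names of my own), the forkOn'd parent should report itself as pinned, while a child it starts with forkIO is not, so the scheduler may move the child elsewhere. Note also that a capability is not an OS core; as far as I know, with -threaded the OS can still migrate the underlying OS threads between cores unless something like +RTS -qa is used.

```haskell
import Control.Concurrent
  (forkIO, forkOn, myThreadId, newEmptyMVar, putMVar, takeMVar, threadCapability)

-- (capability number, locked to that capability?) for the calling thread.
whereAmI :: IO (Int, Bool)
whereAmI = myThreadId >>= threadCapability

-- Fork a pinned thread on capability 0; it forks an ordinary child.
-- Returns whether the parent and the child are pinned.
pinnedParentAndChild :: IO (Bool, Bool)
pinnedParentAndChild = do
  parentVar <- newEmptyMVar
  childVar <- newEmptyMVar
  _ <- forkOn 0 $ do
    (_, parentPinned) <- whereAmI
    putMVar parentVar parentPinned
    -- forkIO does not inherit the parent's pinning.
    _ <- forkIO $ do
      (_, childPinned) <- whereAmI
      putMVar childVar childPinned
    pure ()
  (,) <$> takeMVar parentVar <*> takeMVar childVar

main :: IO ()
main = pinnedParentAndChild >>= print
```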
the best practice should be that all instruction counting tests/benchmarks live in separate executable, that's compiled with +RTS -N1 which eliminates the problem.
That's not accurate:
Even with -N1 you may have 30 threads running. With -threaded, each safe FFI call may spawn a new pthread, no matter what you give for -N (see docs).
Only the non-threaded RTS provides the guarantee you're speaking of.
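If instruction-counting tests do live in their own executable, it could refuse to run when linked against the threaded RTS. A minimal sketch: rtsSupportsBoundThreads from Control.Concurrent is True exactly when the threaded RTS is in use; nonThreadedRtsCheck is a hypothetical helper name.

```haskell
import Control.Concurrent (rtsSupportsBoundThreads)

-- Hypothetical startup check for an instruction-counting test executable.
-- rtsSupportsBoundThreads is True only under the threaded RTS (-threaded).
nonThreadedRtsCheck :: Either String ()
nonThreadedRtsCheck
  | rtsSupportsBoundThreads =
      Left "built with -threaded: safe FFI calls may spawn extra OS threads; rebuild without -threaded"
  | otherwise = Right ()

main :: IO ()
main = either (ioError . userError) (const (putStrLn "non-threaded RTS, ok to count")) nonThreadedRtsCheck
```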
There is another related topic that needs to be cleared up: https://github.com/nh2/haskell-cpu-instruction-counter/issues/7
In weigh I handle this by launching the process n times per test, because I want a fresh cold process for each run. A suite-like interface to this library could do the same thing.
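A suite-like interface could follow the same fresh-process idea by having the test executable re-run itself once per measurement. This is a sketch of that general approach, not weigh's actual code: the --single-run argument and measureOnce are hypothetical, and measureOnce is a placeholder for the real instruction-counting test. It uses readProcess from the process package.

```haskell
import Control.Monad (replicateM)
import System.Environment (getArgs, getExecutablePath)
import System.Process (readProcess)

-- Hypothetical marker argument: "run exactly one measurement and exit".
childFlag :: String
childFlag = "--single-run"

-- Placeholder for the work whose instructions would be counted.
measureOnce :: IO Int
measureOnce = pure (sum [1 .. 1000 :: Int])

main :: IO ()
main = do
  args <- getArgs
  case args of
    -- Child: a fresh, cold process doing a single measurement.
    [flag] | flag == childFlag -> measureOnce >>= print
    -- Parent: launch one child process per run, so measurements never
    -- share a process (or its RTS threads).
    _ -> do
      exe <- getExecutablePath
      results <- replicateM 3 (read <$> readProcess exe [childFlag] "")
      print (results :: [Int])
```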