fix: limit the rate of interruption checks
This is a suggestion for improving on (if not fixing) #940.
It uses a combination of limiting the number of times the function runs (currently every 4th call) and limiting the number of checks to a millisecond rate. Both are needed, as retrieving the time can be expensive.
I tested this quite a bit on macOS, and I think the current parameters are fine. The timing check is not cheap (even though using the fastest available timing method I could find), but limiting it to every 4th call helps.
I included code for Linux and FreeBSD but I could not test either. Testing is important: we need to make sure that the timing calls are not too expensive. Can you help with this for Linux @Antonov548 @krlmlr ? Here's what needs to be done:
- Check performance with interruption checks disabled, using command line R. Just put a
return IGRAPH_SUCCESSat the beginning of the interrupt handler. - Check performance without the changes in this PR, using command line R.
- Check performance with this PR, and see where performance is relative to the previous two benchmarks. It may be necessary to adjust the number of skips (currently 4) by platform. Ideally this would be 1, but if timing checks are too slow, we need to put higher numbers, like I did on macOS. I would not go higher than 10-20 under any circumstances.
Note that CLOCK_MONOTONIC_COARSE is platform-specific, and the Linux version is unrelated to the FreeBSD one. They just happen to have the same name.
If we go ahead with this approach, we need to add a Windows version as well (perhaps @Antonov548 can try?)
With this PR, on macOS, I get:
RStudio:
> g<-make_ring(100000000)
> system.time(is_connected(g))
user system elapsed
1.566 0.158 1.724
Command line:
user system elapsed
1.569 0.033 1.602
R GUI:
user system elapsed
1.574 0.102 1.674
This is basically the same performance as with interruption checks disabled.
Without this PR, I get:
RStudio:
user system elapsed
24.681 0.355 25.030
Command line:
user system elapsed
2.970 0.048 3.018
R GUI:
user system elapsed
3.741 0.158 3.899
Using this PR, but not limiting timing checks to every 4th call gives me this timing:
user system elapsed
1.939 0.025 1.963
This shows that in general timing checks are non-negligible.
One problem with this timing approach is that it assumes that the interruption handler would be called at least once per second, but that may not always be the case.
If we do go with this approach (which is something to be decided—I'm not 100% comfortable with this), the best function to use on Windows is likely QueryPerformanceCounter
I wonder why RStudio is so slow compared to the CLI or the R GUI. Maybe there's some inter-thread synchronization involved? (R code is probably running on its own thread but it needs to call back to the GUI thread to check whether there was an interruption, and that call may be blocking until the next iteration of the event loop of the GUI thread).
Anyhow, I'm okay with this PR and I don't think we could do any better than this given the constraints involved.
I'm inclined to propose throttling interruption checks to 10 msec or even slower - no one would notice the difference, but for instance the resolution of the coarse monotonic clock on Linux (Ubuntu 20.04, kernel version 5.4) is 4 msec according to a response on SO, and if the kernel happens to round upwards to the nearest 4 msec, then the rate limiting won't have any effect at all. With 10 msec we are more likely to land in a regime that is coarser than the granularity of the coarse monotonic clock.
Also, maybe we should move this to the C core so that all interfaces benefit from it?
and if the kernel happens to round upwards to the nearest 4 msec, then the rate limiting won't have any effect at all
I don't think that's true (I actually saw that response on SO). The code measures from the last interruption check until now, not from the last call to the interruption handler to now. Right now the rate is set to 1 ms. That means that the check will happen at most once per millisecond, but if the clock granularity is 4 ms, then it will happen only every 4 ms and that's perfectly fine. My point is that we don't need to set a rate that is lower than the clock resolution. Nor can we, since different platforms might have very different resolutions.
We can't possibly aim to check resolutions, and adapt. The code has to be such that it works regardless resolutions. Also, good to be aware: what clock_getres() returns is not the rate at which the timer updates. I checked this on macOS, and CLOCK_MONOTONIC_RAW_APPROX has a "resolution" of 42 ns according to getres(), but it updates much less frequently (presumably on each context switch).
What I am concerned about is:
- If the interruption handler gets called less often than once per second (which can happen, e.g. with GLPK) then this code won't work properly anymore.
- All this timing voodoo is extremely platform-specific and even hardware-specific (I know the behaviour is different on Apple Silicon vs Intel on macOS).
- Calling timing functions an already be slow, and I worry that on some platforms it can be very slow. Even with the fastest option available on macOS, it has a noticeable performance impact. This is why I had to limit the rate of checks both by counting calls and timing. Calling the timer each time was too slow.
Also, maybe we should move this to the C core so that all interfaces benefit from it?
I really don't want to do that. The interruption handler in Python and Mathematica is faster than the call to the fastest timer on macOS! This code would hurt those interfacers, not benefit them. Plus consider how non-portable all this is, and how little we are able to test such timing issues on exotic hardware. The C core is very portable, and I don't want to sacrifice that.
So, all in all, I am not quite happy with this, but it's the best I could come up with given R's very slow interruption mechanism.
If we want to take this upstream to fix RStudio, what's a good way to replicate it?
I still propose thinning out calls to the interruption handler on a case-by-case basis. Nothing's faster than a function not being called at all. Can we somehow do this as well for the problem here? The "thinning out" by a factor of four, part of this PR, is it good for all cases? We started with 10 IIRC.
Alternatively: Could we somehow make the "thinning out" adaptive? Start with checking every x iterations, check timing, and adapt to check less (or more) frequently?
Current Aviator status
Aviator will automatically update this comment as the status of the PR changes. Comment
/aviator refreshto force Aviator to re-examine your PR (or learn about other/aviatorcommands).
This pull request is currently open (not queued).
How to merge
To merge this PR, comment /aviator merge or add the mergequeue label.
The best solution, with zero cost (in the ideal case) for the interrupt checks, would be to run the igraph computation in a separate thread. The main thread would only be responsible for interruption checks and for memory allocation, and sleep the rest of the time.
Let's not do timers but keep this open for now as a reference. I might be able to raise this with Posit based on the original example in https://github.com/igraph/rigraph/pull/942/#issuecomment-1793583045.
Picking this up.
In duckdb, interruption now works by temporarily installing a signal handler. Is this something that would work for igraph? Are we signal-safe?
Do you mean things like SIGABRT? Is that even available on Windows?
I am not very familiar with this topic. Can you give me references to help me research/understand what "signal-safe" means?
Any input here @ntamas ?
With signal-safe, I mean that the library is left in a state that is suitable for further invocations. It works on Windows too.
In duckdb, interruption now works by temporarily installing a signal handler.
Host languages usually have their own signal handlers for SIGINT - for instance, in Python, SIGINT generates a KeyboardInterrupt. And it causes countless problems in threaded contexts where it cannot be predicted which thread the SIGINT will be delivered to. Anyhow, my assumption is that signal handling is the responsibility of the host language and igraph as a library should simply ask the host language whether an interruption request has been delivered. Messing with SIGINt behind the back of the host language is probably just calling for trouble.
How would this work in R's case? Whenever we enter an igraph function from R, we overwrite R's signal handler with ours, and then swap it back whenever we return to R (either from the function itself or because we are calling a callback that is implemented in R)?
I agree that messing with the signal handler is a special kind of fun. It works for duckdb: we install our own SIGINT handler, this handler calls a function in duckdb that tells it to abort. I now see how my question about signal safety is ill-phrased -- we don't need it, we're catching SIGINT and just setting a flag.
Not sure we need to reset the signal handler for callbacks. We'd just wait for the callback to finish and catch the interrupt back in igraph land.
We could add this to the IGRAPH_R_CHECK macro.
The upside is also that no active interrupt check is needed.