rigraph icon indicating copy to clipboard operation
rigraph copied to clipboard

Interruption checks are very expensive, especially with RStudio

Open szhorvat opened this issue 2 years ago • 9 comments

Describe the bug

Interruption checks are very expensive, especially with RStudio.

Note that "interruption checks" don't just check for interruption. They also hand back control to the GUI event loop. E.g. on Windows the R GUI would lock up during computations without interruption checks.

To reproduce

g <- make_ring(100000000)
system.time(is_connected(g))

Do not use a timing mechanism that runs the command multiple times, as is_connected is cached on the 0.10 branch.

With interruption checks disabled completely, this runs in 1.419 seconds.

Otherwise I get 2.982 s on the command line and 24.634 s in RStudio. With the R GUI, I get 3.649 s. All of these are on macOS / arm64.

These measurements are with the 0.10 branch, but main is similar.

Version information

Issue present both on the main and igraph-0.10 branches.

szhorvat avatar Nov 02 '23 09:11 szhorvat

@krlmlr Is this something we should talk to RStudio about, in addition to trying to mitigate it on our side?

Possible mitigations:

  • Look into whether the interrupt handler can be made faster while still preventing it from longjumping. I expect this is not possible.

  • Talk to RStudio and figure out what they do to make R_CheckUserInterrupt() so slow. Likely other packages are affected too.

  • Change the interrupt handler to only check for interruption on every nth call, as below:

    Add this to the beginning of R_igraph_interrupt_handler():

      static int iter = 0;
      if (++iter < 10) continue;
      iter = 0;
    

    The issue is that there are very slow functions that only call the interrupt handler rarely, notably GLPK.

  • Change the interrupt handler to only check for interruption once at least time Δt has elapsed since the last check. It requires using fast platform-specific timing functions.

Overall we are limited by R's poor design. They do not provide a fast way to check for interruption without risking a longjump. Could we take this up with R core?

szhorvat avatar Nov 02 '23 11:11 szhorvat

I wonder if this can be solved in the C core instead, by calling the interrupt handler less frequently for make_ring() and other affected functions. I suspect this will affect other front-ends too. If this fix works, I'm fine with it. We might be able to save a cycle or two by counting downward from 10 to zero.

krlmlr avatar Nov 02 '23 15:11 krlmlr

Raising this upstream feels like an uphill battle. Might be worthwhile if it affected multiple other packages.

krlmlr avatar Nov 02 '23 15:11 krlmlr

I wonder if this can be solved in the C core instead

That's not realistic. Every function would need to be considered separately, and updated in such a way that checks are frequent enough yet not too frequent. When there was a doubt about the frequency of particular interruption checks in the past, I did do benchmarking to tune it properly, and there were no performance issues with Mathematica's and Python's interrupt checkers.

It's possible that the issue with R is coming from two sources:

  • We had to change the interruption check to prevent it from longjumping. They may have worsened performance. It's a limitation of R that it simply does not provide convenient ways to check for interruption. IMO it wouldn't be a bad idea to ask them to provide proper API for this. The worst that could happen is that they say no or ignore the request.
  • Something is going on with RStudio that makes interruption checks many times slower (must be at least an order of magnitude).

I just looked specifically at the sources of igraph_is_connected(), and the loop that interruptions are checked for is not tight. Look at this:

https://github.com/igraph/igraph/blob/master/src/connectivity/components.c#L523

For each interruption check, there's an igraph_neighbors() call, which is not cheap, and a whole other inner loop. If the interruption check is so slow that it dominates the timings even here, there's not much we can do. The interrupt check is obviously much too slow.

szhorvat avatar Nov 02 '23 17:11 szhorvat

I did a small benchmark using clock_gettime() and CLOCK_MONOTONIC_RAW on macOS. Using command line R or the R GUI, a single interruption check takes about 0.05 to 0.1 ms. With RStudio, it typically takes 0.8-1.5 ms, and regularly peaks into the 10 ms range. That's just not right.

szhorvat avatar Nov 02 '23 18:11 szhorvat

We have a workaround here. What's the proposed course of action? How much can the C core do to reduce the number of calls?

krlmlr avatar Feb 20 '24 14:02 krlmlr

How much can the C core do to reduce the number of calls?

I don't think this is feasible in the short term. The only good solution I can see is to consider functions case by case and try to make improvements. That's a slow, gradual process. As we discussed, I tried to use timing-based solutions, but I couldn't make it work well: timing functions are non-standard and their performance varies widely. There is nothing we can consistently rely on, trust that it works on most systems, and trust that it won't introduce a much worse performance issue on some exotic system.

I still think it's worth complaining to RStudio given that this is mostly an issue only with that system. It does not affect the Python or Mathematica interfaces, as those have much more performant interrupt checkers. It does affect plain R a little bit, but interruption checks with RStudio are over an order of magnitude slower than with plain R.


tl;dr Improvements are slow but ongoing in the C core. Keep the workaround in R for now. Bring this up with RStudio at some point.

szhorvat avatar Feb 20 '24 14:02 szhorvat

To bring this up with Posit/RStudio, a reproducible example would be nice.

krlmlr avatar Feb 20 '24 15:02 krlmlr

Can we, as a rule, offload all computation to another thread?

krlmlr avatar Mar 02 '24 15:03 krlmlr