kotlinx.coroutines
Graceful degradation of spinlock-like implementations during internal errors or system malfunction
https://github.com/Kotlin/kotlinx.coroutines/issues/3613 uncovered a whole new class of bugs for coroutines: infinite CPU-intensive spin loops during "unexpected" exceptions at the core of the implementation.
While this particular bug was addressed mechanically (#3634), the possibility of such bugs is still there:
- `StackOverflowError` in system-level methods (this particular issue was addressed by https://openjdk.org/jeps/270 in Java for Java's primitives)
- `OutOfMemoryError` from an arbitrary place of code that attempted an innocuous allocation
- An arbitrary programmatic bug in our own implementation
- Any other "implicit" exception (whether it's an NPE during a non-trivial data race, a `LinkageError` due to a misaligned dependency, or thread death)
Being an application-level framework, it is close to impossible to ensure that coroutines continue to operate bug-free and preserve all the internal invariants in the face of implicit exceptions being thrown from an arbitrary line of code, so the best we can do is to make the best effort (pun intended).
What we should do is ensure that prior to system collapse, it stays responsive (i.e. available for introspection with tools like `jps`) and graceful (i.e. it eventually deadlocks instead of intensively burning CPU ~~and the user's pocket~~).
In order to do that, all our spin-lock based solutions (which, contrary to many kernel-level APIs, spin in scenarios "it shouldn't take long" rather than "this one is totally safe, recoverable and interruption-safe") should degrade gracefully into sleep/yield/onSpinWait behaviour first and, as a last resort, to the full-blown thread parking later.
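Such a degradation ladder could be sketched roughly as follows; the function name, the thresholds, and the 1 ms park timeout are all hypothetical illustrations, not the actual kotlinx.coroutines code:

```kotlin
import java.util.concurrent.atomic.AtomicBoolean
import java.util.concurrent.locks.LockSupport

// Hypothetical sketch: spin briefly, then back off with yield,
// and finally fall back to a timed park so that a thread stuck on a
// broken invariant burns no CPU yet remains visible to tools like jps.
fun awaitFlag(flag: AtomicBoolean, spinLimit: Int = 100, yieldLimit: Int = 1_000) {
    var attempts = 0
    while (!flag.get()) {
        when {
            attempts < spinLimit -> Thread.onSpinWait()       // cheap busy-wait hint
            attempts < yieldLimit -> Thread.yield()           // give up the time slice
            else -> LockSupport.parkNanos(1_000_000)          // 1 ms timed park: graceful "deadlock"
        }
        attempts++
    }
}
```

In the common case the flag is already set (or becomes set within a few iterations), so only the cheap `onSpinWait` branch is ever taken and the fast path stays untouched.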
For now, we are aware of three such places:
- Waiting for the reusability token in `DispatchedContinuation.awaitReusability` that matches racy scenarios such as "suspend (T1), resume (T2), getResult() (T1)"
- Waiting for the ownership token of an owner-supplied operation in `Mutex`
- Waiting for a logical `expandBuffer` operation in `BufferedChannel`
The key here is to ensure that the solution is robust enough (i.e. that when timings are really unlucky, the solution actually works and proceeds) and doesn't obstruct the fast path (i.e. "happy path" performance is not affected).
> degrade gracefully into sleep/yield/onSpinWait behaviour first and, as a last resort, to the full-blown thread parking later
I would much rather use only `onSpinWait`, and past a certain spin count use `park`. Two threads calling `yield` in a loop can saturate the CPU without actually allowing whatever thread is supposed to be releasing them to run. Sleep is usually implemented by the same code as `park`, but is less efficient because it won't get woken up eagerly.
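The suggested policy (a bounded `onSpinWait` phase, then `park` until an explicit `unpark`) could look roughly like the following sketch; the class and all names are hypothetical:

```kotlin
import java.util.concurrent.atomic.AtomicReference
import java.util.concurrent.locks.LockSupport

// Hypothetical sketch: spin with onSpinWait for a bounded number of
// iterations, then publish ourselves as a waiter and park until the
// releasing thread unparks us (eagerly, unlike a sleep-based backoff).
class SpinThenParkLatch {
    @Volatile private var released = false
    private val waiter = AtomicReference<Thread?>(null)

    fun await(spinLimit: Int = 256) {
        repeat(spinLimit) {
            if (released) return
            Thread.onSpinWait()
        }
        // Publish the waiter *before* re-checking, so release() cannot miss us.
        waiter.set(Thread.currentThread())
        while (!released) LockSupport.park()  // loop guards against spurious wakeups
        waiter.set(null)
    }

    fun release() {
        released = true
        waiter.get()?.let(LockSupport::unpark)  // wake the parked waiter, if any
    }
}
```

Note the ordering: the waiter registers itself first and only then re-checks `released`, while `release()` sets the flag first and unparks second; either interleaving wakes the waiter, and a spurious `unpark` before `park` is harmless because the permit is consumed by the next `park` call.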
> the solution actually works and proceeds) and doesn't obstruct the fast-path (i.e. "happy path" performance is not affected)
This will generally not be possible to do; supporting a non-spin waiting system requires use of a compare-and-set or get-and-set call which is more expensive than the current blind write. I would much rather have spin-free code than save a tiny number of cycles in this bit of infrastructure. https://abseil.io/docs/cpp/atomic_danger
Good point, `yield` indeed may increase CPU consumption in an unpredictable manner.
> This will generally not be possible to do; supporting a non-spin waiting system requires use of a compare-and-set or get-and-set call which is more expensive than the current blind write
Right. Though the CAS on its own is unlikely to seriously contribute to overall system performance (AFAIR uncontended CASes are pretty close to regular barriered writes on modern architectures). What I'd like to achieve is to ensure that the rendezvous logic (specifically, `park`/`unpark` pairs) does not interfere with regular code paths, so the overall amortized operation throughput/latency is mostly unaffected.
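One way to keep the rendezvous off the fast path is sketched below with hypothetical names: the common case performs a single atomic operation on the slot, and `park`/`unpark` only come into play when the race is actually observed.

```kotlin
import java.util.concurrent.atomic.AtomicReference
import java.util.concurrent.locks.LockSupport

// Hypothetical sketch of a one-shot handoff slot. The state is either
// EMPTY, the delivered value, or the parked waiting Thread. The happy
// path (value arrives before anyone waits) never touches park/unpark.
private val EMPTY = Any()

class OneShotSlot {
    private val state = AtomicReference<Any>(EMPTY)

    fun offer(value: Any) {
        val prev = state.getAndSet(value)
        if (prev is Thread) LockSupport.unpark(prev)  // slow path only: a waiter raced in
    }

    fun take(): Any {
        val cur = state.get()
        if (cur !== EMPTY) return cur                 // fast path: value already delivered
        state.compareAndSet(EMPTY, Thread.currentThread())
        while (true) {
            val v = state.get()
            if (v !== EMPTY && v !is Thread) return v
            LockSupport.park()                        // loop tolerates spurious wakeups
        }
    }
}
```

The CAS only happens on the `take` side when the value has not arrived yet, so the amortized cost of the usual offer-then-take ordering stays a single get-and-set plus a read.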
Following up on this - check out slide 56 of this presentation which shows AMD's recommendations, summarized as:
- Don't spin, use mutexes
- If you are going to spin anyway:
  - Use the pause instruction (this is `Thread.onSpinWait()`)
  - `alignas(64)` the lock variable (you don't have much control over this)
  - Test and test-and-set (this means to do a relaxed read, and then attempt a non-relaxed CAS based on the result)
  - The OS may be unaware that threads are spinning; scheduling efficiency and battery life may be lost
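A JVM approximation of the test and test-and-set advice might look like the hypothetical sketch below. Note that `AtomicBoolean.get()` is a volatile read rather than a truly relaxed one, which is about as close as plain `java.util.concurrent.atomic` gets without `VarHandle` access modes:

```kotlin
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical test-and-test-and-set (TTAS) spinlock sketch: read first
// (cheap, cache-friendly), attempt the expensive CAS only when the lock
// looks free, and hint the CPU with onSpinWait while spinning.
class TtasSpinLock {
    private val locked = AtomicBoolean(false)

    fun lock() {
        while (true) {
            // "Test": a plain read keeps the cache line shared instead of
            // bouncing it between cores with read-modify-write traffic.
            while (locked.get()) Thread.onSpinWait()
            // "Test-and-set": CAS only after observing the lock as free.
            if (locked.compareAndSet(false, true)) return
        }
    }

    fun unlock() = locked.set(false)
}
```

As the thread above notes, this remains an unbounded spin and is shown only to illustrate the slide's read-before-CAS shape, not as a recommended primitive.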
I really think you should not have any unbounded spin sections. The tools that could be used for scheduler-cooperative spinlocking are not available in the JVM, and certainly not on all platforms.