kotlinx.coroutines
kotlinx.coroutines copied to clipboard
Atomics used in coroutines are ~2x slower on write than AtomicReference on Android
The coroutines are using atomic reference implementation backed by Atomic*FieldUpdater which is 2x slower for compareAndSet and set when compared to AtomicReference on Android devices.
Running the benchmark on Pixel 4a I see the following results:
29.0 ns atomicReference_getAndSwap
71.3 ns atomicRef_getAndSwap
50.7 ns atomicReference_compareAndSet
135 ns atomicRef_compareAndSet
4.3 ns atomicReference_get
4.2 ns atomicRef_get
18.0 ns atomicReference_lazySet
79.4 ns atomicRef_lazySet
In the benchmark above, AtomicReference is 2x to 4x faster on write than atomicfu. Looking at method traces from Compose benchmarks, I see that certain Job and CoroutineContext operations run multiple atomic operations depending on Job graph complexity.
You can see the trace from one of the abovementioned benchmarks here (it should be focused on coroutine work). It does not represent exact timing due to overhead of tracing every method, but does represent the amount of work being done. Most of it is dominated by Atomic*FieldUpdater operations and class instance checks that Atomic*FieldUpdater does.
As an experiment, I forked atomicfu to be backed by AtomicReference and got ~5% improvement in the same Compose benchmark, which is pretty significant given amount work Compose executes outside of coroutine context.
I filed a separate issue for Android internally to potentially fix this on the runtime side, but any fix will affect only relatively small subset of modern devices that support runtime updates.
It looks like atomicfu supports generating VarHandle based references (which are used for AtomicReference implementation in Android). This implementation should be significantly faster than current Atomic*FieldUpdater one based on the benchmarks above.
Would it be possible to enable it by default for the jvm Android target (either by configuring kotlinx.coroutines build or by updating the default in atomicfu)?
VarHandle was introduced in Java 9, but this library supports Java 8. Also, there's no "JVM Android target", just the JVM target, which Android also uses.
Now that I think about it, maybe we could make a multi-release jar file that contained both the Java 8 implementation and the Java 9 one.
My suggestion was to maybe introduce a separate Android target, but I guess it is not going to work for consumers that ship JVM-only as well. I wonder how well Android supports VarHandle going back a few versions now though, assuming limited support even for Java 8 at ~API 21, so maybe my assumption was incorrect.
VarHandle is only supported on API 33 which is basically 0% of devices. Beyond that, it actually requires a new bytecode which will be rejected by older runtimes. This means you cannot even conditionally reference them as you would a regular new API (such as java.time, for example). So it will actually require a minimum SDK of 33 (or maybe 29 actually at which point you then could conditionally reference it). Android lacks a version of MR jars which would allow selecting an entirely new dex file based on runtime support, sadly.
It also notably ignores all MR jar classes, as the Java version such as 9, 10, and so on do not map directly to API levels (e.g., Java 8 APIs were added in 24, 26, 28, and some are still missing today).
It's a mess.
Right, disregard my VarHandle suggestion then. I am still concerned that AtomicReference performance is way better though, so probably worth investigating how it works on previous versions of Android.
As I suspected, older versions of Android use Unsafe to perform these operations.
What is possible to do is to release a separate variant of the kotlinx-coroutines-core where transformations are not applied (it's a bit trickier probably because non-transformed code depends on atomicfu library), but we definitely do not want to make it the default version -- as it will be slower than the baseline on the rest of the targets
I don't think the version without transformation is going to help here, as atomicfu implements references the same way by default.