kotlinx.coroutines icon indicating copy to clipboard operation
kotlinx.coroutines copied to clipboard

Atomics used in coroutines are ~2x slower on write than AtomicReference on Android

Open ShikaSD opened this issue 2 years ago • 9 comments
trafficstars

The coroutines are using atomic reference implementation backed by Atomic*FieldUpdater which is 2x slower for compareAndSet and set when compared to AtomicReference on Android devices.

Running the benchmark on Pixel 4a I see the following results:

 29.0 ns  atomicReference_getAndSwap
 71.3 ns  atomicRef_getAndSwap
 50.7 ns  atomicReference_compareAndSet
135   ns  atomicRef_compareAndSet
  4.3 ns  atomicReference_get
  4.2 ns  atomicRef_get
 18.0 ns  atomicReference_lazySet
 79.4 ns  atomicRef_lazySet

In the benchmark above, AtomicReference is 2x to 4x faster on write than atomicfu. Looking at method traces from Compose benchmarks, I see that certain Job and CoroutineContext operations run multiple atomic operations depending on Job graph complexity.

You can see the trace from one of the abovementioned benchmarks here (it should be focused on coroutine work). It does not represent exact timing due to overhead of tracing every method, but does represent the amount of work being done. Most of it is dominated by Atomic*FieldUpdater operations and class instance checks that Atomic*FieldUpdater does.

As an experiment, I forked atomicfu to be backed by AtomicReference and got ~5% improvement in the same Compose benchmark, which is pretty significant given amount work Compose executes outside of coroutine context.

I filed a separate issue for Android internally to potentially fix this on the runtime side, but any fix will affect only relatively small subset of modern devices that support runtime updates.

ShikaSD avatar Nov 22 '23 20:11 ShikaSD

It looks like atomicfu supports generating VarHandle based references (which are used for AtomicReference implementation in Android). This implementation should be significantly faster than current Atomic*FieldUpdater one based on the benchmarks above.

Would it be possible to enable it by default for the jvm Android target (either by configuring kotlinx.coroutines build or by updating the default in atomicfu)?

ShikaSD avatar Jan 10 '24 17:01 ShikaSD

VarHandle was introduced in Java 9, but this library supports Java 8. Also, there's no "JVM Android target", just the JVM target, which Android also uses.

dkhalanskyjb avatar Jan 11 '24 10:01 dkhalanskyjb

Now that I think about it, maybe we could make a multi-release jar file that contained both the Java 8 implementation and the Java 9 one.

dkhalanskyjb avatar Jan 11 '24 10:01 dkhalanskyjb

My suggestion was to maybe introduce a separate Android target, but I guess it is not going to work for consumers that ship JVM-only as well. I wonder how well Android supports VarHandle going back a few versions now though, assuming limited support even for Java 8 at ~API 21, so maybe my assumption was incorrect.

ShikaSD avatar Jan 11 '24 21:01 ShikaSD

VarHandle is only supported on API 33 which is basically 0% of devices. Beyond that, it actually requires a new bytecode which will be rejected by older runtimes. This means you cannot even conditionally reference them as you would a regular new API (such as java.time, for example). So it will actually require a minimum SDK of 33 (or maybe 29 actually at which point you then could conditionally reference it). Android lacks a version of MR jars which would allow selecting an entirely new dex file based on runtime support, sadly.

It also notably ignores all MR jar classes, as the Java version such as 9, 10, and so on do not map directly to API levels (e.g., Java 8 APIs were added in 24, 26, 28, and some are still missing today).

It's a mess.

JakeWharton avatar Jan 11 '24 21:01 JakeWharton

Right, disregard my VarHandle suggestion then. I am still concerned that AtomicReference performance is way better though, so probably worth investigating how it works on previous versions of Android.

ShikaSD avatar Jan 11 '24 23:01 ShikaSD

As I suspected, older versions of Android use Unsafe to perform these operations.

ShikaSD avatar Jan 12 '24 12:01 ShikaSD

What is possible to do is to release a separate variant of the kotlinx-coroutines-core where transformations are not applied (it's a bit trickier probably because non-transformed code depends on atomicfu library), but we definitely do not want to make it the default version -- as it will be slower than the baseline on the rest of the targets

qwwdfsad avatar Jan 18 '24 16:01 qwwdfsad

I don't think the version without transformation is going to help here, as atomicfu implements references the same way by default.

ShikaSD avatar Jan 18 '24 19:01 ShikaSD