
Synchronized block causes performance drop

Open e5l opened this issue 5 months ago • 9 comments

Describe the bug: The synchronized block in the char array pool appears to be a bottleneck when many small objects are serialized to JSON concurrently:

https://github.com/Kotlin/kotlinx.serialization/blob/4667a1891a925dc9e3e10490c274a[…]n/jvmMain/src/kotlinx/serialization/json/internal/ArrayPools.kt

We observe this in the TechEmpower FrameworkBenchmarks (https://github.com/TechEmpower/FrameworkBenchmarks) for Ktor, which uses kotlinx.serialization.
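For readers without the linked file at hand, here is a minimal sketch of the pattern being described: a pool whose `take` and `release` both go through one shared monitor, so every serializing thread contends on the same lock even for tiny payloads. This is a hypothetical illustration, not the actual ArrayPools.kt code; the class name, the buffer size, and the cap (standing in for MAX_CHARS_IN_POOL) are assumptions.

```kotlin
// Hypothetical illustration of a synchronized char-array pool.
// NOT the kotlinx.serialization source; names and sizes are assumed.
class SynchronizedCharArrayPool(
    private val arraySize: Int = 128,           // assumed buffer size
    private val maxCharsInPool: Int = 8 * 1024  // assumed cap, analogous to MAX_CHARS_IN_POOL
) {
    private val arrays = ArrayDeque<CharArray>()
    private var charsTotal = 0

    fun take(): CharArray {
        // Every caller acquires the same monitor, however small the object being serialized.
        val pooled = synchronized(this) {
            arrays.removeLastOrNull()?.also { charsTotal -= it.size }
        }
        return pooled ?: CharArray(arraySize)
    }

    fun release(array: CharArray) {
        synchronized(this) {
            // Drop the array if keeping it would exceed the cap.
            if (charsTotal + array.size > maxCharsInPool) return
            charsTotal += array.size
            arrays.addLast(array)
        }
    }

    fun pooledChars(): Int = synchronized(this) { charsTotal }
}
```

With small objects, the time spent inside `take`/`release` can rival the time spent actually writing JSON, which matches the profile discussed below.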

e5l · Jun 05 '25 11:06

Here's a per-thread profile of the TechEmpower Ktor benchmark: prof.zip

fzhinkin · Jun 09 '25 15:06

Looking at the existing kotlinx-serialization benchmarks, the overhead seems negligible when more or less complex objects are serialized. For simple objects, however, such as those in PrimitiveValuesBenchmark or the responses in TechEmpower's benchmark (https://github.com/TechEmpower/FrameworkBenchmarks/blob/master/frameworks/Kotlin/ktor/ktor/src/main/kotlin/org/jetbrains/ktor/benchmarks/Models.kt), working with the pool becomes the serializer's main activity.

fzhinkin · Jun 09 '25 16:06

Maybe it would be better to make the CharArray pool thread-local? Or would that cost too much in terms of memory footprint?

sandwwraith · Jun 10 '25 13:06

It might be too much, although a thread-local cache for the last accessed array might improve the situation.
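A minimal sketch of the "thread-local cache for the last accessed array" idea, assuming a JVM `ThreadLocal` with a single slot per thread placed in front of a shared, locked pool. The class name, the `sharedAccesses` counter, and the buffer size are hypothetical, added only to show when the lock is bypassed; the size cap on the shared pool is omitted for brevity.

```kotlin
// Hypothetical sketch: one cached array per thread in front of a shared,
// synchronized pool. Names and sizes are illustrative assumptions.
class ThreadLocalCachedPool(private val arraySize: Int = 128) {
    // Fallback shared pool guarded by a single lock (size cap omitted for brevity).
    private val shared = ArrayDeque<CharArray>()
    private val sharedLock = Any()

    // At most one cached array per thread, so the extra memory is bounded
    // by (number of threads) * arraySize.
    private val lastArray = ThreadLocal<CharArray?>()

    // Counts trips to the locked pool; for illustration only.
    var sharedAccesses = 0
        private set

    fun take(): CharArray {
        val cached = lastArray.get()
        if (cached != null) {
            lastArray.set(null) // fast path: no lock taken
            return cached
        }
        return synchronized(sharedLock) {
            sharedAccesses++
            shared.removeLastOrNull()
        } ?: CharArray(arraySize)
    }

    fun release(array: CharArray) {
        if (lastArray.get() == null) {
            lastArray.set(array) // keep it for this thread's next take()
            return
        }
        synchronized(sharedLock) {
            sharedAccesses++
            shared.addLast(array)
        }
    }
}
```

In the common take/release/take/release pattern of serializing one small object at a time, every call after the first stays on the lock-free fast path, while the per-thread memory cost is a single array.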

fzhinkin · Jun 10 '25 14:06

> It might be too much, although incorporating a thread local cache for a last accessed array might improve the situation.

A complicating factor is that there is no cross-platform thread-local support (yet), although both the Native and JVM/Android targets have it, and browser targets don't have shared memory.

pdvrieze · Jun 10 '25 18:06

Lucky for us (🤔), the reported problem is JVM-specific (although it might make sense to support pooling on other targets as well).

fzhinkin · Jun 10 '25 19:06

I'd say this particular issue could be addressed with a ConcurrentLinkedQueue plus an atomic counter for the total size of the pooled char arrays (at the cost, of course, of no longer enforcing MAX_CHARS_IN_POOL strictly).
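A minimal sketch of that suggestion, assuming `java.util.concurrent.ConcurrentLinkedQueue` for the arrays and an `AtomicInteger` for the running total. The class name and sizes are hypothetical. Because the capacity check and the increment are two separate steps, concurrent releases can push the pool slightly past the cap; this is the loss of strictness mentioned above. Whether this actually beats the synchronized version has to be measured, since the next comment recalls it performing worse.

```kotlin
import java.util.concurrent.ConcurrentLinkedQueue
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical lock-free pool: a ConcurrentLinkedQueue holds the arrays and an
// AtomicInteger tracks the total pooled chars. The cap is only approximate.
class LockFreeCharArrayPool(
    private val arraySize: Int = 128,           // assumed buffer size
    private val maxCharsInPool: Int = 8 * 1024  // assumed cap, analogous to MAX_CHARS_IN_POOL
) {
    private val arrays = ConcurrentLinkedQueue<CharArray>()
    private val charsTotal = AtomicInteger(0)

    fun take(): CharArray {
        val pooled = arrays.poll() ?: return CharArray(arraySize)
        charsTotal.addAndGet(-pooled.size)
        return pooled
    }

    fun release(array: CharArray) {
        // Check-then-act is not atomic: several threads can pass this check at
        // once, so the pool may temporarily exceed maxCharsInPool.
        if (charsTotal.get() + array.size > maxCharsInPool) return
        // Count first, then publish, so a concurrent take() never drives the total negative.
        charsTotal.addAndGet(array.size)
        arrays.offer(array)
    }

    fun pooledChars(): Int = charsTotal.get()
}
```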

qwwdfsad · Jun 16 '25 15:06

> with ConcurrentLinkedQueue

IIRC, it made things even worse.

fzhinkin · Jun 16 '25 16:06

A thread-local approach could exploit the fact that serialization/deserialization itself is single-threaded. Per-thread pools can then work if the format uses the thread-local to get safe access to the pool once, when it initially creates the encoders/decoders, and then gives those encoders/decoders non-synchronized access to it.
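A minimal sketch of that design, assuming each encode call stays on one thread from start to finish. All names here (`UnsynchronizedCharArrayPool`, `ToyJsonStringEncoder`, `ToyFormat`) are hypothetical and unrelated to the real kotlinx.serialization API. The format consults the `ThreadLocal` once per encoder creation; the encoder then uses a plain, unsynchronized pool. The toy encoder writes a JSON string literal without escaping, only to give the buffer something to do.

```kotlin
// Hypothetical sketch of per-thread pools handed to encoders at creation time.

// Not thread-safe by design: only ever used by the thread that owns it.
class UnsynchronizedCharArrayPool(private val arraySize: Int = 128) {
    private val arrays = ArrayDeque<CharArray>()
    fun take(): CharArray = arrays.removeLastOrNull() ?: CharArray(arraySize)
    fun release(array: CharArray) {
        if (arrays.size < 4) arrays.addLast(array) // small per-thread cap (assumed)
    }
    val size: Int get() = arrays.size
}

class ToyJsonStringEncoder(private val pool: UnsynchronizedCharArrayPool) {
    // Writes value as a quoted JSON string; escaping is omitted for brevity.
    fun encode(value: String): String {
        val pooled = pool.take() // no locking: the pool belongs to this thread
        val needed = value.length + 2
        val buffer = if (pooled.size >= needed) pooled else CharArray(needed)
        try {
            buffer[0] = '"'
            for (i in value.indices) buffer[i + 1] = value[i]
            buffer[needed - 1] = '"'
            return String(buffer, 0, needed)
        } finally {
            pool.release(pooled) // return the pooled array, not an oversized temporary
        }
    }
}

object ToyFormat {
    // The ThreadLocal is consulted only when an encoder is created.
    private val pools = ThreadLocal.withInitial { UnsynchronizedCharArrayPool() }

    fun encodeToString(value: String): String = ToyJsonStringEncoder(pools.get()).encode(value)

    fun currentThreadPool(): UnsynchronizedCharArrayPool = pools.get()
}
```

The trade-off is the one raised earlier in the thread: memory grows with the number of threads, and the scheme breaks if an encoder can migrate to another thread mid-call.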

pdvrieze · Jun 17 '25 08:06