kotlinx.serialization
Synchronized block causes performance drop
Describe the bug
The synchronized block in the char array pool appears to be a bottleneck when many small objects are serialized to JSON concurrently.
We observe this in the TechEmpower FrameworkBenchmarks (https://github.com/TechEmpower/FrameworkBenchmarks) for Ktor using kotlinx.serialization.
Here's a per-thread profile of the TechEmpower Ktor benchmark: prof.zip
Looking at the existing kotlinx.serialization benchmarks, the overhead seems negligible when more or less complex objects are serialized. But for simple objects, like those from PrimitiveValuesBenchmark or the responses from the TechEmpower benchmark (https://github.com/TechEmpower/FrameworkBenchmarks/blob/master/frameworks/Kotlin/ktor/ktor/src/main/kotlin/org/jetbrains/ktor/benchmarks/Models.kt), working with the pool becomes the serializer's main activity.
Maybe it would be better if the CharArray pool were thread-local? Or would that be too much from a memory footprint standpoint?
It might be too much, although adding a thread-local cache for the last accessed array might improve the situation.
A complicating factor is that there is no cross-platform thread-local support (yet), although both native and JVM/Android targets have it, and browser targets don't have shared memory.
Lucky us (🤔), the reported problem is specific to JVM only (but it might make sense to support pooling on other targets as well).
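To make the "thread-local cache for the last accessed array" idea concrete, here is a minimal JVM-only sketch. It is not the library's code: `SharedCharArrayPool`, `CachedCharArrayPool`, the buffer size and the pool cap are all hypothetical names and values chosen for illustration. The shared pool stands in for a synchronized pool, and the thread-local slot lets the common take/release pair on one thread skip the lock.

```kotlin
// Hypothetical sketch; names and sizes are assumptions, not kotlinx.serialization internals.
private const val MAX_CHARS_IN_POOL = 1024 * 1024 // assumed cap on pooled chars
private const val ARRAY_SIZE = 16 * 1024          // assumed buffer size

// Stand-in for a shared pool guarded by a synchronized block.
object SharedCharArrayPool {
    private val arrays = ArrayDeque<CharArray>()
    private var charsTotal = 0

    fun take(): CharArray {
        val pooled = synchronized(this) {
            arrays.removeLastOrNull()?.also { charsTotal -= it.size }
        }
        return pooled ?: CharArray(ARRAY_SIZE)
    }

    fun release(array: CharArray) {
        synchronized(this) {
            if (charsTotal + array.size < MAX_CHARS_IN_POOL) {
                charsTotal += array.size
                arrays.addLast(array)
            }
        }
    }
}

// One-slot thread-local cache in front of the shared pool.
object CachedCharArrayPool {
    private val lastArray = ThreadLocal<CharArray?>()

    fun take(): CharArray {
        val cached = lastArray.get()
        if (cached != null) {
            lastArray.remove() // hand the array out; the slot is empty until release
            return cached
        }
        return SharedCharArrayPool.take() // slow path: goes through the lock
    }

    fun release(array: CharArray) {
        if (lastArray.get() == null) {
            lastArray.set(array) // fast path: no lock taken
        } else {
            SharedCharArrayPool.release(array)
        }
    }
}
```

The memory cost is at most one extra array per thread, which is the trade-off raised above; whether it actually helps would need to be measured against the benchmark.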
I'd say that this particular issue can be addressed with a ConcurrentLinkedQueue plus an atomic counter for the pooled char array sizes (of course, with the loss of strictness around MAX_CHARS_IN_POOL).
"with ConcurrentLinkedQueue"
IIRC, it was making things even worse.
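For reference, the ConcurrentLinkedQueue plus atomic counter proposal could look roughly like the sketch below. `LockFreeCharArrayPool`, `MAX_CHARS` and `BUFFER_SIZE` are hypothetical names and values, not taken from the library. Note the reply above reporting that this approach performed worse when tried, so this only illustrates the idea and the relaxed cap.

```kotlin
import java.util.concurrent.ConcurrentLinkedQueue
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical sketch; the cap and buffer size are assumed values.
private const val MAX_CHARS = 1024 * 1024
private const val BUFFER_SIZE = 16 * 1024

object LockFreeCharArrayPool {
    private val arrays = ConcurrentLinkedQueue<CharArray>()
    private val charsTotal = AtomicInteger(0)

    fun take(): CharArray {
        // poll() returns null when the queue is empty; allocate in that case.
        val pooled = arrays.poll() ?: return CharArray(BUFFER_SIZE)
        charsTotal.addAndGet(-pooled.size)
        return pooled
    }

    fun release(array: CharArray) {
        // The check and the update are separate steps, so concurrent releases
        // can overshoot MAX_CHARS. This is the "loss of strictness".
        if (charsTotal.get() + array.size >= MAX_CHARS) return
        charsTotal.addAndGet(array.size)
        arrays.offer(array)
    }
}
```

Every offer on a ConcurrentLinkedQueue allocates a node and all threads still contend on the same head, tail and counter, which may be part of why it did not help.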
A thread-local approach could exploit the fact that serialization/deserialization itself is single-threaded. Per-thread pools can then work without synchronization: the format looks up the pool through the thread-local when it creates the encoder or decoder, and hands it over for non-synchronized access from then on.
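A rough JVM-only sketch of that per-thread pool idea follows. All names here (`UnsyncCharArrayPool`, `SketchEncoder`, `encodeToString`) and the size limits are hypothetical and only mimic the shape of a format entry point. The sketch assumes an encoder is created, used and finished on a single thread.

```kotlin
// Hypothetical sketch; limits are assumed values.
private const val PER_THREAD_BUFFER_SIZE = 16 * 1024
private const val MAX_ARRAYS_PER_THREAD = 4

// No locks: an instance is only ever touched by the thread that owns it.
class UnsyncCharArrayPool {
    private val arrays = ArrayDeque<CharArray>()

    fun take(): CharArray = arrays.removeLastOrNull() ?: CharArray(PER_THREAD_BUFFER_SIZE)

    fun release(array: CharArray) {
        if (arrays.size < MAX_ARRAYS_PER_THREAD) arrays.addLast(array)
    }
}

private val threadPool: ThreadLocal<UnsyncCharArrayPool> =
    ThreadLocal.withInitial { UnsyncCharArrayPool() }

// Stand-in for an encoder: it receives the pool once and uses it directly.
class SketchEncoder(private val pool: UnsyncCharArrayPool) {
    private var buffer = pool.take()
    private var length = 0

    fun write(text: String) {
        if (length + text.length > buffer.size) {
            // Grow by copying; the grown array is what gets released later.
            buffer = buffer.copyOf(maxOf(buffer.size * 2, length + text.length))
        }
        text.toCharArray(buffer, length)
        length += text.length
    }

    fun finish(): String {
        val result = buffer.concatToString(0, length)
        pool.release(buffer)
        return result
    }
}

// Stand-in for the format entry point: the only place the thread-local is read.
fun encodeToString(block: (SketchEncoder) -> Unit): String {
    val encoder = SketchEncoder(threadPool.get())
    block(encoder)
    return encoder.finish()
}
```

Usage would be along the lines of `encodeToString { it.write("{\"id\":1}") }`. The thread-local lookup happens once per call, and every later pool access is a plain method call.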