kotlinx-io
Consider alternative Sink.writeString implementations on JVM
On JVM, instead of reading each character separately and then encoding it to UTF-8 and writing to a buffer, it might be faster to:
- extract chars to a CharArray and then iterate over it;
- simply use toByteArray.
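To make the three candidates concrete, here is a minimal sketch using only the JVM standard library. It is not the kotlinx-io implementation: `ByteArrayOutputStream` stands in for a buffer, and `writeCodePoint`, `encodeUtf8`, `writePerChar`, `writeViaCharArray` and `writeViaByteArray` are hypothetical names chosen for illustration. Lone surrogates are written as `?`, the same replacement `String.toByteArray` uses.

```kotlin
import java.io.ByteArrayOutputStream

// Hypothetical helper: writes a single code point as UTF-8.
fun writeCodePoint(out: ByteArrayOutputStream, cp: Int) {
    when {
        cp < 0x80 -> out.write(cp)
        cp < 0x800 -> {
            out.write(0xC0 or (cp shr 6))
            out.write(0x80 or (cp and 0x3F))
        }
        cp < 0x10000 -> {
            out.write(0xE0 or (cp shr 12))
            out.write(0x80 or ((cp shr 6) and 0x3F))
            out.write(0x80 or (cp and 0x3F))
        }
        else -> {
            out.write(0xF0 or (cp shr 18))
            out.write(0x80 or ((cp shr 12) and 0x3F))
            out.write(0x80 or ((cp shr 6) and 0x3F))
            out.write(0x80 or (cp and 0x3F))
        }
    }
}

// Shared encoding loop; `charAt` abstracts over where the chars come from.
inline fun encodeUtf8(length: Int, out: ByteArrayOutputStream, charAt: (Int) -> Char) {
    var i = 0
    while (i < length) {
        val c = charAt(i++)
        when {
            !c.isSurrogate() -> writeCodePoint(out, c.code)
            c.isHighSurrogate() && i < length && charAt(i).isLowSurrogate() ->
                writeCodePoint(out, Character.toCodePoint(c, charAt(i++)))
            else -> out.write('?'.code) // lone surrogate
        }
    }
}

// Baseline: read each character from the String, then encode it.
fun writePerChar(s: String, out: ByteArrayOutputStream) {
    encodeUtf8(s.length, out) { s[it] }
}

// Alternative 1: extract the chars to a CharArray first, then iterate over it.
fun writeViaCharArray(s: String, out: ByteArrayOutputStream) {
    val chars = s.toCharArray()
    encodeUtf8(chars.size, out) { chars[it] }
}

// Alternative 2: let the JDK encode the whole string, then copy the bytes.
fun writeViaByteArray(s: String, out: ByteArrayOutputStream) {
    val bytes = s.toByteArray() // UTF-8 by default
    out.write(bytes, 0, bytes.size)
}
```

All three produce the same bytes; they differ only in how the characters are accessed and in how many temporary arrays are allocated, which is what a benchmark would need to measure.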
For other libraries, namely kotlinx.serialization, some of these approaches performed better. While quick ad-hoc experiments didn't show any benefit for kotlinx-io, it does make sense to investigate it thoroughly.
A combination of String::toByteArray and UnsafeBufferOperations::moveToTail shows better performance for strings whose chars can all be encoded using byte sequences of the same length. However, the current implementation significantly outperforms the String::toByteArray-based approach on strings whose characters require byte sequences of varying lengths.
And, of course, String::toByteArray results in a higher allocation rate.
In serialization, we leverage the intrinsified String::getChars (pros: vectorized, much faster unpacking of compact strings, no range checks) and also rely on the fact that our CharArrays are pooled, so nothing is allocated.
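A minimal sketch of that pattern, assuming a per-thread reusable buffer as the "pool" (this is not kotlinx.serialization's actual pooling code; `chunkPool` and `forEachCharChunk` are hypothetical names). On the JVM, the Kotlin `String.toCharArray(destination, destinationOffset, startIndex, endIndex)` overload delegates to `String.getChars`, so the string is copied chunk by chunk into the reused array without allocating a new one.

```kotlin
// Hypothetical "pool": one reusable chunk buffer per thread.
val chunkPool: ThreadLocal<CharArray> = ThreadLocal.withInitial { CharArray(1024) }

// Copies `s` into `buffer` chunk by chunk and hands each filled chunk to `consume`
// together with the number of valid chars in it.
fun forEachCharChunk(
    s: String,
    buffer: CharArray = chunkPool.get(),
    consume: (CharArray, Int) -> Unit
) {
    require(buffer.size >= 2) { "buffer must be able to hold a surrogate pair" }
    var start = 0
    while (start < s.length) {
        var end = minOf(start + buffer.size, s.length)
        // Do not split a surrogate pair across two chunks.
        if (end < s.length && s[end - 1].isHighSurrogate()) end--
        // Bulk copy; on the JVM this overload calls String.getChars.
        s.toCharArray(buffer, 0, start, end)
        consume(buffer, end - start)
        start = end
    }
}
```

A UTF-8 encoder would then run over `buffer[0 until n]` inside `consume`. The buffer must not be retained after the callback returns, since its contents are overwritten by the next chunk.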
For kotlinx-io, such an approach does not seem to provide any significant performance improvement on average:
- code: https://github.com/Kotlin/kotlinx-io/blob/435acfb038ba6803692783b28e86b4148e0d5019/core/jvm/src/SinksJvm.kt#L147
- JMH results: https://jmh.morethan.io/?source=https://gist.githubusercontent.com/fzhinkin/a11a2ce595cadb8fba700cdbe18a6f4f/raw/fbb87909636731439aac80948fa023bcc10d4269/toCharArray-based-writeString.json
In some scenarios, performance is better, in others it's worse.