kotlinx-io

Consider alternative Sink.writeString implementations on JVM

Open fzhinkin opened this issue 1 year ago • 3 comments

On JVM, instead of reading each character separately and then encoding it to UTF-8 and writing to a buffer, it might be faster to:

  • extract chars to a CharArray and then iterate over it;
  • simply use toByteArray.
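The two alternatives above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the kotlinx-io implementation: `encodeViaCharArray` and `encodeViaToByteArray` are hypothetical names, and a `java.io.ByteArrayOutputStream` stands in for the sink's buffer.

```kotlin
import java.io.ByteArrayOutputStream

// Hypothetical helper: copy the chars out once, then encode them to UTF-8 by hand.
fun encodeViaCharArray(s: String): ByteArray {
    val out = ByteArrayOutputStream() // stand-in for the real buffer
    val chars = s.toCharArray()
    var i = 0
    while (i < chars.size) {
        val c = chars[i].code
        when {
            c < 0x80 -> { out.write(c); i++ } // 1 byte (ASCII)
            c < 0x800 -> { // 2 bytes
                out.write(0xC0 or (c shr 6))
                out.write(0x80 or (c and 0x3F))
                i++
            }
            c < 0xD800 || c > 0xDFFF -> { // 3 bytes, not a surrogate
                out.write(0xE0 or (c shr 12))
                out.write(0x80 or ((c shr 6) and 0x3F))
                out.write(0x80 or (c and 0x3F))
                i++
            }
            else -> {
                val low = if (i + 1 < chars.size) chars[i + 1].code else 0
                if (c > 0xDBFF || low !in 0xDC00..0xDFFF) {
                    out.write('?'.code) // unpaired surrogate: emit a replacement
                    i++
                } else { // valid surrogate pair: 4 bytes
                    val cp = 0x10000 + ((c - 0xD800) shl 10) + (low - 0xDC00)
                    out.write(0xF0 or (cp shr 18))
                    out.write(0x80 or ((cp shr 12) and 0x3F))
                    out.write(0x80 or ((cp shr 6) and 0x3F))
                    out.write(0x80 or (cp and 0x3F))
                    i += 2
                }
            }
        }
    }
    return out.toByteArray()
}

// Hypothetical helper: let the JDK encode the whole string in one call.
fun encodeViaToByteArray(s: String): ByteArray = s.toByteArray(Charsets.UTF_8)
```

Both helpers should produce identical bytes for well-formed input; the open question in this issue is only which one is faster and how much each allocates.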

For other libraries, namely kotlinx.serialization, some of these approaches performed better. While quick ad-hoc experiments didn't show any benefit for kotlinx-io, it does make sense to investigate this thoroughly.

fzhinkin avatar May 07 '24 11:05 fzhinkin

The combination of String::toByteArray and UnsafeBufferOperations::moveToTail shows better performance for strings whose chars can all be encoded using byte sequences of the same length. However, the current implementation significantly outperforms the String::toByteArray-based approach on strings whose characters require byte sequences of varying lengths. And, of course, String::toByteArray results in a higher allocation rate.

fzhinkin avatar Aug 27 '24 21:08 fzhinkin

In serialization, we leverage the intrinsified String::getChars (pros: vectorized, much faster unpacking of compact strings, no range checks) and also rely on the fact that our CharArrays are pooled, so there are no allocations.
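As a rough sketch of that idea (not the kotlinx.serialization code), a string can be copied chunk by chunk into a reused CharArray. On the JVM, the Kotlin standard library's `String.toCharArray(destination, destinationOffset, startIndex, endIndex)` delegates to `String::getChars`. The `scratch` thread-local and the `forEachChunk` helper are hypothetical names, and the thread-local is only a simplified stand-in for a real pool.

```kotlin
// Hypothetical stand-in for a CharArray pool: one reusable buffer per thread.
private val scratch = ThreadLocal.withInitial { CharArray(1024) }

// Hypothetical helper: copies the string into the reused buffer chunk by chunk
// and hands every filled chunk (buffer plus valid length) to the callback.
fun forEachChunk(s: String, block: (CharArray, Int) -> Unit) {
    val buf = scratch.get()
    var start = 0
    while (start < s.length) {
        val end = minOf(start + buf.size, s.length)
        // Bulk copy; on the JVM this calls String::getChars under the hood.
        s.toCharArray(buf, 0, start, end)
        block(buf, end - start)
        start = end
    }
}
```

Note that a chunk boundary may fall between the two halves of a surrogate pair, so an encoder built on top of this would need to carry that state from one chunk to the next.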

qwwdfsad avatar Aug 28 '24 09:08 qwwdfsad

For kotlinx-io, it seems that such an approach does not provide any significant performance improvement on average. Implementation: https://github.com/Kotlin/kotlinx-io/blob/435acfb038ba6803692783b28e86b4148e0d5019/core/jvm/src/SinksJvm.kt#L147 Benchmark results: https://jmh.morethan.io/?source=https://gist.githubusercontent.com/fzhinkin/a11a2ce595cadb8fba700cdbe18a6f4f/raw/fbb87909636731439aac80948fa023bcc10d4269/toCharArray-based-writeString.json

In some scenarios, performance is better, in others it's worse.

fzhinkin avatar Aug 28 '24 18:08 fzhinkin