kotlinx-io
Inconsistent JS/JVM behavior with byte order mark (BOM)
I'm not sure whether this problem is still relevant, since the function in question appears to have been commented out of the current tree:
https://github.com/Kotlin/kotlinx-io/blame/master/core/commonMain/src/kotlinx/io/text/CharsetEncoder.kt
In case it is still an issue under the covers: with version 0.1.16 of the library, calling String(<bytes>, charset = Charsets.UTF_8) on bytes that begin with a byte order mark behaves differently depending on whether I'm targeting the JVM or JS.
On the JVM, the BOM (0xEF, 0xBB, 0xBF) is converted to U+FEFF as the first character of the resulting string.
On JS, the BOM appears to be stripped out.
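For anyone hitting the same difference, one possible workaround in common code is to normalize the decoded string by dropping a single leading U+FEFF, so both targets end up with the same result. This is only a sketch: `stripLeadingBom` is a hypothetical helper, not part of kotlinx-io, and it decodes with the standard library's `decodeToString()` rather than the kotlinx-io `String(...)` function from this report.

```kotlin
// Hypothetical helper (not part of kotlinx-io): remove one leading U+FEFF if present.
fun String.stripLeadingBom(): String = removePrefix("\uFEFF")

fun main() {
    // UTF-8 BOM (0xEF, 0xBB, 0xBF) followed by the ASCII bytes for "hi".
    val bytes = byteArrayOf(0xEF.toByte(), 0xBB.toByte(), 0xBF.toByte(), 0x68, 0x69)

    // Assumption: stdlib UTF-8 decoding stands in for the kotlinx-io call here.
    val decoded = bytes.decodeToString()

    // After normalization the result no longer depends on whether the decoder kept the BOM.
    val normalized = decoded.stripLeadingBom()
    println(normalized == "hi")
}
```

The helper only touches the first character, so a U+FEFF that appears later in the text (where it acts as a zero-width no-break space) is left alone.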