Inconsistent JS/JVM behavior with byte order mark (BOM)

Open jpd236 opened this issue 5 years ago • 0 comments

I'm not sure whether this problem is still relevant, since the function in question appears to have been commented out in the current tree:

https://github.com/Kotlin/kotlinx-io/blame/master/core/commonMain/src/kotlinx/io/text/CharsetEncoder.kt

In case it is still an issue under the covers: with version 0.1.16 of the library, I'm seeing inconsistent behavior when calling String(<bytes>, charset = Charsets.UTF_8) on a byte array that begins with a byte order mark, depending on whether I'm targeting the JVM or JS.

On the JVM, the BOM (0xEF, 0xBB, 0xBF) is decoded to U+FEFF as the first character of the resulting string.

On JS, the BOM appears to be stripped out, so the resulting string starts with the first character after it.
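
A minimal sketch of the JVM side of this, for anyone trying to reproduce it. It uses the Kotlin standard library's JVM-only String(ByteArray, Charset) rather than the kotlinx-io function from the report, on the assumption that both end up in the JDK decoder. stripLeadingBom is a hypothetical helper, not part of any library, shown only as one possible way to get the same result on both targets.

```kotlin
// Hypothetical helper (not part of kotlinx-io or the stdlib):
// removes a single leading U+FEFF if the string starts with one.
fun stripLeadingBom(text: String): String = text.removePrefix("\uFEFF")

fun main() {
    // UTF-8 BOM (0xEF, 0xBB, 0xBF) followed by the ASCII bytes for "hi".
    val bytes = byteArrayOf(0xEF.toByte(), 0xBB.toByte(), 0xBF.toByte(), 0x68, 0x69)

    // JVM-only stdlib call; assumed here to behave like the kotlinx-io
    // function described in the report when targeting the JVM.
    val decoded = String(bytes, Charsets.UTF_8)

    println(decoded.length)            // 3 on the JVM: U+FEFF, 'h', 'i'
    println(decoded[0] == '\uFEFF')    // true on the JVM
    println(stripLeadingBom(decoded))  // hi
}
```

Calling stripLeadingBom on the decoded string is a no-op when the BOM was already stripped, so the same call should be safe on the JS target too.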

May 23 '20 00:05 jpd236