aes-js Incorrect UTF-8 decoding

Open swansontec opened this issue 5 years ago • 2 comments

The utf8.fromBytes routine does not handle 4-byte character sequences.

Demo

$ echo -n 𠜎 | hexdump
0000000 f0 a0 9c 8e

The character 𠜎 has a 4-byte encoding, so let's try putting that into fromBytes:

const aesjs = require('aes-js')

const bytes = [0xf0, 0xa0, 0x9c, 0x8e]
const string = aesjs.utils.utf8.fromBytes(bytes)
console.log(string)

Nothing prints. Decoding the same bytes with Node's Buffer works as expected:

console.log(Buffer.from(bytes).toString()) // Prints 𠜎
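
For reference, here is a minimal sketch of the bit arithmetic a decoder needs for the 4-byte case. The helper name decodeFourByteSequence is hypothetical and is not part of aes-js; it only checks the byte patterns and does not reject overlong or out-of-range encodings, so treat it as an illustration rather than a full validator.

```javascript
// Hypothetical helper, not part of aes-js: decode one 4-byte UTF-8 sequence.
function decodeFourByteSequence(bytes) {
  const [b0, b1, b2, b3] = bytes;
  // The lead byte must look like 11110xxx and each continuation byte like 10xxxxxx.
  if ((b0 & 0xf8) !== 0xf0 || [b1, b2, b3].some((b) => (b & 0xc0) !== 0x80)) {
    throw new RangeError('not a 4-byte UTF-8 sequence');
  }
  // 3 payload bits from the lead byte, then 6 bits from each continuation byte.
  const codePoint =
    ((b0 & 0x07) << 18) | ((b1 & 0x3f) << 12) | ((b2 & 0x3f) << 6) | (b3 & 0x3f);
  // Code points above U+FFFF need a surrogate pair in a JavaScript string;
  // String.fromCodePoint produces that pair.
  return { codePoint, text: String.fromCodePoint(codePoint) };
}

const result = decodeFourByteSequence([0xf0, 0xa0, 0x9c, 0x8e]);
console.log(result.codePoint.toString(16)); // 2070e
console.log(result.text === Buffer.from([0xf0, 0xa0, 0x9c, 0x8e]).toString()); // true
```

The decoded string has length 2 because the character lies outside the Basic Multilingual Plane, which is the case a decoder limited to 1- to 3-byte sequences misses.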

Jan 03 '20 18:01 swansontec

Yes, I believe you are correct. I will be removing the UTF8 utilities in the next version of this library (soon, I hope) and recommending one of my other libraries for UTF8 coding, @ethersproject/strings (you can use the toUtf8Bytes and toUtf8String functions), which is far more robust.

I’ll pin this issue too, once I get to the coffee shop to work for the day. :)

Thanks!

Jan 03 '20 18:01 ricmoo

I was able to fix this by using the standard TextEncoder / TextDecoder APIs ( https://developer.mozilla.org/en-US/docs/Web/API/TextEncoder ):

var textBytes = new TextEncoder().encode(text);
var decryptedText = new TextDecoder().decode(decryptedBytes);
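
As a quick check, this approach can be tried on the 4-byte character from the original report. The sketch below assumes a runtime where TextEncoder and TextDecoder are available as globals (modern browsers and recent Node.js); the encryption step is left out, so the bytes are decoded directly.

```javascript
// TextEncoder and TextDecoder are assumed to be globals here
// (modern browsers and recent Node.js versions).
const text = '𠜎';
const textBytes = new TextEncoder().encode(text);
console.log(Array.from(textBytes, (b) => b.toString(16)).join(' ')); // f0 a0 9c 8e

// With { fatal: true }, decode() throws a TypeError on malformed input
// instead of silently substituting U+FFFD.
const decoder = new TextDecoder('utf-8', { fatal: true });
const roundTripped = decoder.decode(textBytes);
console.log(roundTripped === text); // true
```

The fatal option is optional; it is used here so that corrupted plaintext surfaces as an error rather than as replacement characters.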

There is also a polyfill in case you may want to add support for IE (linked from Mozilla): https://github.com/inexorabletash/text-encoding

I would like to ask whether you, as the developer of aes-js, consider this kind of use safe with your library. Thanks.

Jan 07 '20 20:01 roiconde
