goog.crypt.base64.decodeString fails on UTF-8 encoded string

Open · nanaze opened this issue 10 years ago · 2 comments

This issue was imported from Closure Library's previous home at http://closure-library.googlecode.com

The link to the original issue is: https://code.google.com/p/closure-library/issues/detail?id=527

nanaze commented Apr 15 '14 03:04

Text from the original bug:

What steps will reproduce the problem?

  1. Use goog.crypt.base64.encodeString on a UTF8 encoded string or goog.crypt.base64.decodeString on a Base64-encoded UTF8 string.

What is the expected output? What do you see instead?

The encoding/decoding is not done correctly. It looks like the Base64 encode/decode functions are not using the UTF-8 byte array methods (see example below).

What version of the product are you using? On what operating system?

Please provide any additional information below.

Here's an example:

goog.require('goog.crypt');
goog.require('goog.crypt.base64');

var utf8string = 'kūhl';

// Encode/decode the JS string directly (web-safe alphabet).
var base64 = goog.crypt.base64.encodeString(utf8string, true);
console.log('base64: ' + base64);
var decoded = goog.crypt.base64.decodeString(base64, true);
console.log('decoded: ' + decoded);

// Convert to UTF-8 bytes first, then encode; decode back through UTF-8.
var base64Correct = goog.crypt.base64.encodeByteArray(
    goog.crypt.stringToUtf8ByteArray(utf8string), true);
console.log('base64Correct: ' + base64Correct);
var correct = goog.crypt.utf8ByteArrayToString(
    goog.crypt.base64.decodeStringToByteArray(base64Correct, true));
console.log('correct: ' + correct);

joth76 commented Nov 10 '15 01:11

The problem here is that the example code is NOT actually passing a UTF-8 string; it's passing a native JS string, which is a sequence of UTF-16 (UCS-2) code units. (That the JS source code may be encoded as UTF-8 is not important here: that is not how utf8string is seen by the VM or the library code.)
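To make the distinction concrete, here is a small standalone sketch (plain JavaScript rather than Closure; the helper names utf16CodeUnits and utf8Bytes are illustrative, and TextEncoder is the standard Web/Node.js API) that prints the example string's UTF-16 code units next to its UTF-8 bytes. The code unit 363 (U+016B, 'ū') does not fit in a byte, while UTF-8 represents the same character as the two bytes 197 and 171.

```javascript
// What a JS string actually holds: UTF-16 code units (0..0xFFFF each).
function utf16CodeUnits(str) {
  const units = [];
  for (let i = 0; i < str.length; i++) {
    units.push(str.charCodeAt(i));
  }
  return units;
}

// What a UTF-8 payload should contain: bytes (0..255 each).
// TextEncoder always encodes to UTF-8 and returns a Uint8Array.
function utf8Bytes(str) {
  return Array.from(new TextEncoder().encode(str));
}

const s = 'k\u016Bhl'; // 'kūhl' with a precomposed U+016B
console.log(utf16CodeUnits(s)); // [ 107, 363, 104, 108 ]
console.log(utf8Bytes(s));      // [ 107, 197, 171, 104, 108 ]
```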

So the second version of the code (base64Correct = goog.crypt.base64.encodeByteArray(goog.crypt.stringToUtf8ByteArray(utf8string)... etc.) is indeed the correct way to get the desired result: first convert the string from the internal JS representation to UTF-8 bytes, then Base64-encode those bytes.
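Outside Closure, the same two-step approach can be sketched with the standard TextEncoder/TextDecoder and btoa()/atob() (available in browsers and in Node.js 16+). This is an illustrative stand-in for stringToUtf8ByteArray plus encodeByteArray, not the library's implementation; the function names are hypothetical and it uses the standard (not web-safe) alphabet.

```javascript
// Encode: string -> UTF-8 bytes -> "binary string" (one char per byte) -> btoa.
function utf8ToBase64(str) {
  const bytes = new TextEncoder().encode(str);
  let binary = '';
  for (const b of bytes) {
    binary += String.fromCharCode(b); // every char is 0..255, so btoa accepts it
  }
  return btoa(binary);
}

// Decode: atob -> "binary string" -> UTF-8 bytes -> string.
function base64ToUtf8(b64) {
  const binary = atob(b64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return new TextDecoder('utf-8').decode(bytes);
}

const encoded = utf8ToBase64('k\u016Bhl'); // 'kūhl'
console.log(encoded);               // a8WraGw=
console.log(base64ToUtf8(encoded)); // kūhl
```

Because btoa() only ever sees characters in the 0..255 range here, it never throws, and the round trip restores the original string.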

Note that passing opt_webSafe=true to goog.crypt.base64.encodeString() / decodeString() partly masks the issue: passing false would probably send them down the native btoa()/atob() path (where supported), and that path correctly throws an encoding exception:

var utf8string = 'kūhl';
var base64 = goog.crypt.base64.encodeString(utf8string, false);

Uncaught DOMException: Failed to execute 'btoa' on 'Window': The string to be encoded contains characters outside of the Latin1 range.
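The native btoa() check can be reproduced without Closure. This is a minimal sketch assuming a runtime with a global btoa() (browsers, or Node.js 16 and later); tryBtoa is an illustrative helper name, not a library function.

```javascript
// btoa() only accepts "binary strings" whose characters are all in 0..255.
// 'ū' is U+016B (363), so passing the raw JS string is rejected.
function tryBtoa(str) {
  try {
    return { ok: true, value: btoa(str) };
  } catch (e) {
    // Browsers and Node.js both throw a DOMException named InvalidCharacterError.
    return { ok: false, error: e.name };
  }
}

console.log(tryBtoa('k\u016Bhl')); // { ok: false, error: 'InvalidCharacterError' }
console.log(tryBtoa('kuhl'));      // { ok: true, value: 'a3VobA==' }
```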

It's unfortunate that the websafe path (and encodeByteArray in general) does not validate that input bytes are in the valid 0-255 range, but changing it to throw that exception now could be a backward-compatibility risk (albeit a break only for code that is probably already producing incorrect output).
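A range check of the kind described above might look like the following. This is a hypothetical sketch, not goog.crypt's API: encodeByteArrayChecked and its RangeError are illustrative names, and it uses the standard Base64 alphabet. It shows the UTF-16 code unit 363 from the example being rejected instead of silently encoded, while genuine UTF-8 bytes still encode normally.

```javascript
const B64_ALPHABET =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Hypothetical validating encoder: reject anything that is not an integer byte.
function encodeByteArrayChecked(bytes) {
  bytes.forEach((b, i) => {
    if (!Number.isInteger(b) || b < 0 || b > 255) {
      throw new RangeError(`element ${i} (${b}) is not a byte in 0..255`);
    }
  });
  let out = '';
  // Pack each group of up to 3 bytes into 24 bits and emit four 6-bit digits,
  // padding with '=' when the final group is short.
  for (let i = 0; i < bytes.length; i += 3) {
    const b0 = bytes[i];
    const b1 = i + 1 < bytes.length ? bytes[i + 1] : 0;
    const b2 = i + 2 < bytes.length ? bytes[i + 2] : 0;
    const triple = (b0 << 16) | (b1 << 8) | b2;
    out += B64_ALPHABET[(triple >> 18) & 63];
    out += B64_ALPHABET[(triple >> 12) & 63];
    out += i + 1 < bytes.length ? B64_ALPHABET[(triple >> 6) & 63] : '=';
    out += i + 2 < bytes.length ? B64_ALPHABET[triple & 63] : '=';
  }
  return out;
}

console.log(encodeByteArrayChecked([107, 197, 171, 104, 108])); // a8WraGw=
try {
  encodeByteArrayChecked([107, 363, 104, 108]); // UTF-16 code units, not bytes
} catch (e) {
  console.log(e.message); // element 1 (363) is not a byte in 0..255
}
```

Whether to throw, as here, or merely warn is exactly the compatibility trade-off raised above.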

joth76 commented Nov 10 '15 02:11