mobile
mobile copied to clipboard
[PM-9550] Fixes a mojibake bug in web authenticators in some multibyte locales
Fixes #3345
Make the implementation of the character conversion consistent with their consumers in the web app.
Type of change
- [x] Bug fix
- [ ] New feature development
- [ ] Tech debt (refactoring, code cleanup, dependency upgrades, etc)
- [ ] Build/deploy pipeline (DevOps)
- [ ] Other
Objective
This PR fixes a mojibake issue in the FIDO2 Web Authn.
- c.f. https://en.wikipedia.org/wiki/Mojibake
You can see the detailed description of the issue and and screenshots in the issue #3345. In short, the current implementation displays a wrong, unreadable text "WebAuthn ã�®èª�証"" where it should display "Web Authn の認証" when the user locale is Japanese.
You can also find a minimal reproducible example of the issue at https://github.com/bitwarden/mobile/issues/3345#issuecomment-2212340286.
I have actually tried reproducing the issue only in Japanese locale. In addition, theoretically it can happen in any locales whose localized text of Fido2AuthenticateWebAuthn message contains multibyte characters. You should be also able to reproduce the issue in an arbitrary locale by inserting a Unicode emoji into the text.
Code changes
This PR removes the extra character conversion steps from Bit.App.Utilities.AppHelpers::EncodeDataParameter.
This change makes the data encoding in the method consistent with the decoding at apps/web/src/connectors/captcha.ts#L50 and apps/web/src/connectors/webauthn.ts#L88.
Analysis of the issue in the original implementation
The issue in the following code in the original implementation is that it applies unnecessary encoding to multibyte characters before the essential part, and there is no corresponding decoding in the end-consumer of the return value.
public static string EncodeDataParameter(object obj)
{
string EncodeMultibyte(Match match)
{
return Convert.ToChar(Convert.ToUInt32($"0x{match.Groups[1].Value}", 16)).ToString();
}
var escaped = Uri.EscapeDataString(JsonConvert.SerializeObject(obj));
var multiByteEscaped = Regex.Replace(escaped, "%([0-9A-F]{2})", EncodeMultibyte);
return WebUtility.UrlEncode(Convert.ToBase64String(Encoding.UTF8.GetBytes(multiByteEscaped)));
}
Design of the current implementation
The intention of this original implementation is to convert the given set of query parameters for web authenticators (web apps) into a url-safe string. The generated strings are passed into the query parameters of the web authenticators (https://vault.bitwarden.com/webauthn-mobile-connector.html) or (https://vault.bitwarden.com/captcha-mobile-connector.html).
The callers of EncodeDataParameter, Bit.App.Pages.CaptchaProtectedViewModel.HandleCaptchaAsync and Bit.App.Pages.TwoFactorPageViewModel.Fido2AuthenticateAsync place some localized UI texts into the parameter obj. Therefore, the implementation of EncodeDataParameter needs to safely convert multibyte characters in the message into url-safe forms, and the conversion must be consistent with the reverse conversion in the consumers.
For example, obj has a field btnText whose value is "WebAuthn の認証" when TwoFactorPageViewModel.Fido2AuthenticateAsync calls the method in Japanese locale.
The value of btnText corresponds to the Fido2AuthenticateWebAuthn entry in the resource file.
What is actually happening behind the scene
Let's take a closer look at the first multibyte character in the btnText value, for example. The multibyte character is "の" (U+306E).
- c.f. https://en.wikipedia.org/wiki/No_(kana)
- Encoding into JSON preserves the character because ECMA-404 does not require character escapes in strings values except for
"(double quotation) or\(backslash). Uri.EscapeDataStringinternally callsSystem.UriHelper.EscapeStringToBuilderand it encodes the character into its UTF-8 byte sequence representation, and then individual bytes in the sequence are encoded into the form of "%xx".- The character U+306E is represented as 3 bytes of
0xE3, 0x81, 0xAEin UTF-8 - The byte sequence is then encoded into "%E3%81%AE"
- The character U+306E is represented as 3 bytes of
- The invocation of
Regex.Replacereplaces the individual occurrences of "%xx" form by callingEncodeMultibyteConvert.ToUInt32inEncodeMultibyteconverts the regex captures of "E3", "81", and "AE" into0xE3u,0x81u, and0xAEu, respectivelyConvert.ToCharconverts individual integer values into their corresponding unicode code points.0xE3u,0x81u, and0xAEuare converted into the following characters, respectively:ã(U+00E3, LATIN SMALL LETTER A WITH TILDE)- a non-printable control character (U+0081)
®(U+00AE, REGISTERED SIGN)
Encoding.UTF8.GetBytesreturns the UTF-8 byte sequence representation of the given string.- The characters
ã,, and ®are converted into their corresponding byte sequences as the followings:ã(U+00E3) --> 0xC3, 0xA3- U+0081 --> 0xC2, 0x81
®(U+00AE) --> 0xC2", 0xAE
- The characters
Convert.ToBase64Stringconverts the 6 bytes into "w6PCgcKu".
The consumer-side implementation in apps/web/src/connectors/captcha.ts#L50 or apps/web/src/connectors/webauthn.ts#L88 reverses just the step (5), (4), and (1). Therefore, we get ã® (U+00E3, U+0031, U+00AE) instead of "の" (U+306E).
Justification of the proposed fix
Just WebUtility.UrlEncode would be sufficient to keep the return value url-safe. It internally converts the given string into its UTF-8 byte sequence representation, and then convert the individual bytes into url-safe characters.
In practice, Base64 encoding is also necessary for consistency with the decoding in the end-consumer of the data.
Anyway, the earlier steps of (2), and (3) are not necessary in terms of the url-safety. Both of WebUtility.UrlEncode and Convert.ToBase64String(Encoding.UTF8.GetBytes(str)) are safe to multibyte characters. Actually, just (1), (4), (5) are what the consumer-side implementation reverses.
The steps (2) and (3) actually just make the encoded value incompatible to the implementation of the decoding. This is the reason why this PR removes the extra steps (2) and (3).
Comparison with ASCII characters.
You might wonder why this issue did not happen in earlier characters in the btnText string or any other messages in other many locales, e.g. in US English.
It is because the byte sequence representations in UTF-8 are equal to the Unicode codepoint values themselves for ASCII characters. Therefore, the combination of the extra steps (2) and (3) is effectively no-op for the characters.
For example, the character 'W' (U+0057) is represented as a 1-length sequence of bytes 0x57. The step (2) converts it into a string "%57". The EncodeMultibyte in the step (3) converts the regex capture of "57" into an integer 0x57u, and then Convert.ToChar converts it into a char value 'W' (U+0057).
The issue happens only when the strings in the target JSON object contains characters whose UTF-8 byte sequence is different from the Unicode codepoint itself.
Before you submit
- Please check for formatting errors (
dotnet format --verify-no-changes) (required) - Please add unit tests where it makes sense to do so (encouraged but not required)
- If this change requires a documentation update - notify the documentation team
- If this change has particular deployment requirements - notify the DevOps team
Thank you for your contribution! We've added this to our internal Community PR board for review. ID: PM-9550
This repo has been archived. If you would like to contribute code to the new native mobile apps, visit the iOS / Android repo.
Thank you for your contribution! We've added this to our internal tracking system for review. ID: PM-21270 Link: https://bitwarden.atlassian.net/browse/PM-21270
Details on our contribution process can be found here: https://contributing.bitwarden.com/contributing/pull-requests/community-pr-process.