
Internationalization: Use Unicode normalization when uniqueness/exact match matters

Open: ties opened this issue 6 years ago • 0 comments

Context

I was working on a WebAuthn authentication provider and noticed that the standard authentication provider does not apply Unicode normalization when comparing values that must match exactly or be unique (usernames/passwords).

Normalization is a NIST recommendation. It prevents errors when the same Unicode string can be represented by more than one code-point sequence.
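To make the problem concrete, here is a minimal sketch using Python's standard `unicodedata.normalize`. The letter "Å" can arrive as the single code point U+00C5 or as "A" followed by the combining ring above (U+030A). Both forms render identically, but they compare unequal until they are normalized. The sample word is chosen purely for illustration.

```python
import unicodedata

precomposed = "\u00C5ngstr\u00F6m"   # Å and ö as single code points
decomposed = "A\u030Angstro\u0308m"  # A + combining ring above, o + combining diaeresis

# A raw comparison sees two different code-point sequences
print(precomposed == decomposed)  # False

# After NFKC normalization both collapse to the same sequence
print(unicodedata.normalize("NFKC", precomposed) == unicodedata.normalize("NFKC", decomposed))  # True
```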

I implemented Unicode normalization in my authentication provider, but I feel it is an anti-pattern to duplicate the password-hashing code. I have tests for my implementation.

Proposal

Apply NFKC normalization to strings that need to be checked for uniqueness or used in hashes. At a minimum this includes user-entered names, usernames, and passwords.

This follows section 5.1.1.2 of NIST SP 800-63B.
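For passwords, the proposal could look roughly like the following. This is a minimal sketch, not the provider's actual code: `normalize_secret`, `hash_password`, `verify_password`, and the iteration count and salt size are hypothetical, and the standard library's PBKDF2-HMAC-SHA256 stands in for whatever hash the provider really uses. The point is that normalization happens once, before the bytes reach the hash, so hashing and verification always see the same representation.

```python
import hashlib
import hmac
import os
import unicodedata
from typing import Optional, Tuple

# Hypothetical work factor and salt size for this sketch; tune to your own policy.
ITERATIONS = 600_000
SALT_BYTES = 16


def normalize_secret(value: str) -> bytes:
    """NFKC-normalize a user-supplied string, then encode it as UTF-8 for hashing."""
    return unicodedata.normalize("NFKC", value).encode("utf-8")


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest): PBKDF2-HMAC-SHA256 over the normalized password."""
    if salt is None:
        salt = os.urandom(SALT_BYTES)
    digest = hashlib.pbkdf2_hmac("sha256", normalize_secret(password), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-hash the normalized candidate with the stored salt; compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)


# Decomposed input, as one client might submit it ...
salt, stored = hash_password("A\u0308ffin")
# ... still verifies when another client sends the precomposed form plus the "ffi" ligature.
print(verify_password("\u00C4\uFB03n", salt, stored))  # True
```

Keeping normalization inside a single helper that every hashing path calls is one way to avoid the duplicated password-hashing logic mentioned above.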

Consequences

In rare cases, two existing usernames that differ as raw code-point sequences but are equivalent after normalization may conflict. In rare cases, existing passwords containing such ambiguous characters may no longer match, because their stored hashes were computed over whatever Unicode encoding the browser happened to submit at the time.
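The username conflicts can be found before normalization is switched on. A minimal sketch, assuming the existing usernames can be loaded into a Python list; `find_normalization_conflicts` is a hypothetical helper, not part of any existing provider:

```python
import unicodedata
from collections import defaultdict


def find_normalization_conflicts(usernames):
    """Group existing usernames whose NFKC forms are identical.

    Returns only groups with more than one member, i.e. accounts that would
    collide once uniqueness is enforced on the normalized form.
    """
    groups = defaultdict(list)
    for name in usernames:
        groups[unicodedata.normalize("NFKC", name)].append(name)
    return {key: names for key, names in groups.items() if len(names) > 1}


# Three spellings of the same visible name, plus one unrelated account
existing = ["\u00C4ffin", "A\u0308ffin", "\u00C4\uFB03n", "bob"]
conflicts = find_normalization_conflicts(existing)
print(len(conflicts["\u00C4ffin"]))  # 3
```

Password mismatches cannot be detected this way, since only hashes are stored; those surface only when the affected users next log in.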

Example from unicode.org:

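import unicodedata
def unicode_normalize(s):  # applies NFKC normalization
    return unicodedata.normalize("NFKC", s)
# NFKC recomposes A + U+0308 into U+00C4 and expands the U+FB03 "ffi" ligature
assert "\u00C4ffin" == unicode_normalize("Äffin")
assert "\u00C4ffin" == unicode_normalize("Ä\uFB03n")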

ties, Oct 14 '19 08:10