UTS46 IdnaTestV2.txt: add 5 normalization corrections

Open markusicu opened this issue 1 year ago • 0 comments

Add test cases for the five characters whose Decomposition_Mapping values were corrected in Unicode 4.0 (Corrigendum #4):

  • https://www.unicode.org/versions/corrigendum4.html
  • https://www.unicode.org/Public/UCD/latest/ucd/NormalizationCorrections.txt
  • “Normalization Changes (CJK Compatibility Characters)” in https://www.unicode.org/reports/tr46/#TableDerivationStep3

Include strings with both the actual characters and their Punycode forms. For example, test with both

  • \U0002F9BF.com
  • xn--8c3n.com

As @hsivonen found, for these five characters it makes a difference whether the UTS46 implementation leaves them in the input until normalization (as the spec says), or whether a combined disallowed+mapping+normalization step treats them like any other disallowed character (as ICU does).

The characters themselves should be normalized to valid ones, whereas the same characters encoded in Punycode are disallowed.
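As a rough sketch of how the input strings for these cases could be derived, the following Python uses only the standard library (`unicodedata` and the built-in `punycode` codec). The helper name `idna_test_inputs` and the `.com` suffix are illustrative assumptions. It produces only the source strings, not complete IdnaTestV2.txt lines, and it does not run UTS46 processing.

```python
import unicodedata

# The five code points listed for Corrigendum #4 in NormalizationCorrections.txt.
CORRECTED = [0x2F868, 0x2F874, 0x2F91F, 0x2F95F, 0x2F9BF]


def idna_test_inputs(cp, tld="com"):
    """Hypothetical helper: return (escaped literal, Punycode form,
    Punycode form of the NFC-normalized character) for one code point."""
    ch = chr(cp)
    # Escaped form of the actual character, as written in this issue.
    literal = f"\\U{cp:08X}.{tld}"
    # Punycode form of the same, unnormalized character.
    puny = "xn--" + ch.encode("punycode").decode("ascii") + "." + tld
    # Punycode form of what the character normalizes to under current data.
    nfc = unicodedata.normalize("NFC", ch)
    puny_nfc = "xn--" + nfc.encode("punycode").decode("ascii") + "." + tld
    return literal, puny, puny_nfc


if __name__ == "__main__":
    for cp in CORRECTED:
        print(*idna_test_inputs(cp), sep="\t")
```

For U+2F9BF this yields `\U0002F9BF.com`, `xn--8c3n.com`, and `xn--gro.com`, where the last is the Punycode form of the normalized character U+45D7.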

See https://util.unicode.org/UnicodeJsps/idna.jsp?a=%5CU0002F9BF.com%0D%0Axn--8c3n.com%0D%0Axn--gro.com

@macchiati @eggrobin

markusicu, Feb 06 '24 23:02