unicodetools
UTS46 IdnaTestV2.txt: add 5 normalization corrections
Add test cases for the five characters whose Decomposition_Mapping values were corrected in Unicode 4.0:
- https://www.unicode.org/versions/corrigendum4.html
- https://www.unicode.org/Public/UCD/latest/ucd/NormalizationCorrections.txt
- “Normalization Changes (CJK Compatibility Characters)” in https://www.unicode.org/reports/tr46/#TableDerivationStep3
Include strings with both the actual characters and their Punycode forms. For example, test with both
- \U0002F9BF.com
- xn--8c3n.com
As @hsivonen found, for these five characters the result depends on whether a UTS46 implementation leaves them in the input until normalization (as the spec says) or whether its combined disallowed/mapping/normalization step treats them like any other disallowed character (as ICU does).
Per the spec, the raw characters should be normalized to valid ones, whereas inside a Punycode label they are disallowed.
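The ordering difference can be illustrated with a toy single-label model of UTS46 processing. This is a sketch, not ICU's or unicodetools' implementation. The status table is a hypothetical one-entry stand-in: following the description above, it assumes U+2F9BF is disallowed and treats everything else, including its NFC form U+45D7, as valid. Mapping, Bidi, and CONTEXTJ/CONTEXTO checks are omitted.

```python
import unicodedata

# Hypothetical toy status table: only U+2F9BF is marked disallowed;
# every other code point (including U+45D7) is treated as valid here.
TOY_STATUS = {0x2F9BF: "disallowed"}


def status(ch: str) -> str:
    return TOY_STATUS.get(ord(ch), "valid")


def process_label(label: str, reject_disallowed_early: bool) -> tuple[str, list[str]]:
    """Simplified single-label model of UTS46 Processing steps 1, 2 and 4."""
    errors: list[str] = []
    # Step 1 (Map): the spec leaves disallowed code points in the string.
    # An ICU-like implementation records the error at this point instead.
    if reject_disallowed_early:
        errors += [f"U+{ord(c):04X} disallowed" for c in label if status(c) == "disallowed"]
    # Step 2 (Normalize): NFC turns U+2F9BF into U+45D7.
    label = unicodedata.normalize("NFC", label)
    # Step 4 (Convert/Validate): decode Punycode labels, which must already be NFC.
    if label.startswith("xn--"):
        label = label[4:].encode("ascii").decode("punycode")
        if unicodedata.normalize("NFC", label) != label:
            errors.append("label not in NFC")
    # Validity criteria: no disallowed code points may remain.
    errors += [f"U+{ord(c):04X} disallowed" for c in label if status(c) == "disallowed"]
    return label, errors
```

In spec order, `process_label("\U0002F9BF", False)` returns `('\u45d7', [])`. The ICU-like variant, `process_label("\U0002F9BF", True)`, also yields U+45D7 but records `U+2F9BF disallowed`. `process_label("xn--8c3n", ...)` fails in both modes because the decoded label is not in NFC.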
See https://util.unicode.org/UnicodeJsps/idna.jsp?a=%5CU0002F9BF.com%0D%0Axn--8c3n.com%0D%0Axn--gro.com
@macchiati @eggrobin