unicodetools
allkeys_CLDR.txt contains variable collation elements with an ignorable primary weight
From CLDR-17161 reported by Henry Stratmann
The version of allkeys.txt produced by the Unicode Tools Java code (rather than the C sifter code) contains some secondary CEs that are marked with a star as “variable”. This is visible in CLDR 44 allkeys_CLDR.txt for the non-ASCII apostrophe and quotation mark characters, which now have merely secondary distinctions from the ASCII ones. Look for [*0000.:
0027 ; [*022E.0020.0002] # APOSTROPHE
FF07 ; [*022E.0020.0003] # FULLWIDTH APOSTROPHE
2018 ; [*022E.0020.0004][*0000.011C.0004] # LEFT SINGLE QUOTATION MARK
2019 ; [*022E.0020.0004][*0000.011D.0004] # RIGHT SINGLE QUOTATION MARK
201A ; [*022E.0020.0004][*0000.011E.0004] # SINGLE LOW-9 QUOTATION MARK
201B ; [*022E.0020.0004][*0000.011F.0004] # SINGLE HIGH-REVERSED-9 QUOTATION MARK
05F3 ; [*022E.0020.0004][*0000.0124.0004] # HEBREW PUNCTUATION GERESH
2039 ; [*022F.0020.0002] # SINGLE LEFT-POINTING ANGLE QUOTATION MARK
(same for the double quote variants)
Variable primaries (with a star) should not be zero: https://www.unicode.org/reports/tr10/#File_Format
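
A quick way to find every affected entry is to scan the file for starred collation elements whose primary weight is zero. The following is only a rough sketch and not part of the Unicode Tools code: the function name, the regular expression, and the inline sample data are assumptions made for illustration, based on the line format shown above.

```python
import re

# One collation element, e.g. [*022E.0020.0002] or [.0000.011C.0004].
# Group 1 is the variable marker ('*' = variable, '.' = non-variable),
# group 2 is the primary weight in hex.
CE_PATTERN = re.compile(
    r"\[([.*])([0-9A-Fa-f]{4,6})\.([0-9A-Fa-f]{4})\.([0-9A-Fa-f]{4})\]"
)


def find_variable_ignorable_primaries(lines):
    """Return (code_points, ce_text) pairs for starred CEs with a zero primary."""
    hits = []
    for line in lines:
        # Drop the trailing "# NAME" comment; skip lines without a mapping.
        data = line.split("#", 1)[0]
        if ";" not in data:
            continue
        code_points, elements = data.split(";", 1)
        for match in CE_PATTERN.finditer(elements):
            marker, primary = match.group(1), match.group(2)
            if marker == "*" and int(primary, 16) == 0:
                hits.append((code_points.strip(), match.group(0)))
    return hits


# Two lines copied from the report above, used as sample input.
sample = [
    "0027 ; [*022E.0020.0002] # APOSTROPHE",
    "2018 ; [*022E.0020.0004][*0000.011C.0004] # LEFT SINGLE QUOTATION MARK",
]

for code_points, ce in find_variable_ignorable_primaries(sample):
    print(code_points, ce)
```

To check a real file, pass the open file object instead of the sample list, for example `find_variable_ignorable_primaries(open("allkeys_CLDR.txt", encoding="utf-8"))`.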