Markus Scherer

Results: 46 issues by Markus Scherer

When we were indirectly using CLDR UnicodeProperty, I made some improvements, intended to be performance optimizations for getting a UnicodeSet for the property=yes mappings:

- https://github.com/unicode-org/cldr/pull/208/files#diff-55a84a7b970cb77290ac491e3ded85516f1d017374084b06b773d9a113adb7f6

Recover those into the...

Turning a UTC action item into a Unicode Tools issue. From the UTC-149 minutes: B.12.3.1 Working Draft of Proposed Update UAX #9, Unicode Bidirectional Algorithm [Iancu, L2/16-366] ... Further discussion...

Make the pseudorandom generation of IdnaTestV2.txt test cases "delicate", to make comparison of the test data file between versions less onerous. Was UTC action item 0-A343, originally intended for "After...

Add test cases for the five characters whose Decomposition_Mapping values were corrected in Unicode 4.0:

- https://www.unicode.org/versions/corrigendum4.html
- https://www.unicode.org/Public/UCD/latest/ucd/NormalizationCorrections.txt
- “Normalization Changes (CJK Compatibility Characters)” in https://www.unicode.org/reports/tr46/#TableDerivationStep3

Include strings with both...

Background: ucd-dev email thread “[Unicode Tools vs. UnicodeProperty.java](https://groups.google.com/g/ucd-dev/c/H60xj1kTfCU)” We have two versions of org/unicode/cldr/util/props/UnicodeProperty.java, one in CLDR and one in the Unicode Tools. Until 1.5 years ago they were identical,...

- [ ] generate IDNA files for 16.0 alpha
- [ ] work with Ken to generate idna2008derived file for 16.0 alpha
- [ ] add IDNA files to pub/copy-alpha-to-draft.sh...

For details see CLDR-17224. The CLDR/ICU design meeting group on 2023-dec-04 agreed that the CLDR+ICU root collation order in its radical-stroke version should sort Han characters as UAX #38 does: ...

uca

We see some errors or typos in character proposals listing ccc values, sometimes proposing ccc=230 for marks-below. An invariant test could help catch those, if the character names follow the...

invariant
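
The kind of check described in the ccc issue above could look roughly like the following. This is only a sketch, not the Unicode Tools invariant syntax: the issue text is cut off before it states which naming convention it relies on, so the rule used here (a name ending in " BELOW") is an assumption, and the helper names `suspicious_ccc` and `check_assigned` are hypothetical. It relies on the fact that ccc=220 means Below and ccc=230 means Above.

```python
import unicodedata

CCC_ABOVE = 230  # Canonical_Combining_Class value "Above"


def suspicious_ccc(name, ccc):
    """Return True if a mark named "... BELOW" is given ccc=230 (Above).

    Assumption: the name suffix " BELOW" identifies marks-below; the
    convention actually intended by the issue is not stated in the excerpt.
    """
    return ccc == CCC_ABOVE and name.endswith(" BELOW")


def check_assigned(code_points):
    """Apply the same check to already-encoded characters.

    Uses the Unicode data bundled with the running Python interpreter.
    """
    flagged = []
    for cp in code_points:
        ch = chr(cp)
        name = unicodedata.name(ch, "")  # "" for unnamed code points
        if suspicious_ccc(name, unicodedata.combining(ch)):
            flagged.append((cp, name))
    return flagged
```

For a proposal, one would call `suspicious_ccc` on each proposed name and ccc pair; `check_assigned` only shows that the same rule can be run over existing characters.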

Regarding [UTC-177-C??] Consensus: Add a new normative, binary character property, Modifier_Combining_Mark (short name: MCM) to the UCD, for 16.0, based on L2/23-210. This new property is _intended to be_ [immutable](https://www.unicode.org/reports/tr44/#Property_Invariants)....

invariant

For the Age property, `[:Age=6.0:]` includes all characters that were encoded in 6.0 (or by 6.0), that is, characters whose Age property is *at most* 6.0. This is documented at...
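
To illustrate the "at most" reading of `[:Age=6.0:]` described above, here is a minimal sketch. It does not use ICU or the Unicode Tools; the table `SAMPLE_AGE` is a tiny hand-picked sample and the function names `age_at_most` and `age_exactly` are hypothetical, chosen only to contrast the two possible interpretations.

```python
# Hand-picked sample: code point -> Age as a (major, minor) tuple.
SAMPLE_AGE = {
    0x0041: (1, 1),   # LATIN CAPITAL LETTER A
    0x20AC: (2, 1),   # EURO SIGN
    0x20B9: (6, 0),   # INDIAN RUPEE SIGN
    0x1F600: (6, 1),  # GRINNING FACE
}


def age_at_most(ages, version):
    """Code points whose Age is <= version (the [:Age=6.0:] behavior described above)."""
    return {cp for cp, age in ages.items() if age <= version}


def age_exactly(ages, version):
    """Code points whose Age equals version (first encoded in exactly that version)."""
    return {cp for cp, age in ages.items() if age == version}
```

With this sample, `age_at_most(SAMPLE_AGE, (6, 0))` contains U+0041, U+20AC, and U+20B9, while `age_exactly(SAMPLE_AGE, (6, 0))` contains only U+20B9.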