Robin Leroy
> It looks like you need to either add these piecemeal to the `$NonOtherLetterIdeographs`, or else define that set as `[\p{Ideographic} - \p{gc=Lo}]` rather than testing that it's equal to...
> Please add the DoNotEmit data too.

Done. If we are going to get many more of these kinds of proposals, we need to find a way to incorporate validation...
> More generally, are the diffs in lines related to QU intentional?

No. Blame the author of #456. It looks like SegmenterDefault is correct, and SegmenterCldr is wrong. This should...
> will UTC and CLDR line break rules be the same starting with Unicode 16?

They probably will be. But even if they were not,

> Do we need a...
> We do have that test file in ICU:
> https://github.com/unicode-org/icu/blob/main/icu4c/source/test/testdata/LineBreakTest.txt

That is the file from the UCD, not a CLDR version. This one, generated by yours truly in the...
This seems reasonable in principle; but I will note that we do not publish the unicodetools version of the segmenter rules as part of the UCD, so a no-op there...
Re the invariant test failure: `$caseOverlap` is a manually maintained set of 267 modifier letters and counting; it should probably be expressed in terms of properties instead.
See also https://github.com/unicode-org/unicodetools/issues/484.
> Hmmm. The description does include a link to a UTC decision, namely https://www.unicode.org/L2/L2023/23157.htm#176-C35, but the Pipeline / UTC decision check fails.

Yeah, this regex is too strict; it expects...
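
For illustration only, here is a rough sketch of what a more lenient check could look like. This is not the regex used by the pipeline; the names `UTC_DECISION_LINK` and `find_utc_decisions` are hypothetical, and the pattern merely assumes that a decision link looks like the one quoted above (an L2 document URL with a `<meeting>-<letter><number>` fragment) and may appear anywhere in the description.

```python
import re

# Hypothetical pattern, modelled only on the link quoted above:
# an L2 document URL whose fragment is <meeting>-<letters><number>, e.g. 176-C35.
UTC_DECISION_LINK = re.compile(
    r"https?://www\.unicode\.org/L2/L\d{4}/\d+\.htm"
    r"#(?P<meeting>\d+)-(?P<kind>[A-Z]+)(?P<number>\d+)"
)


def find_utc_decisions(description):
    """Return the identifier of every decision linked anywhere in the text."""
    return [
        f"{m['meeting']}-{m['kind']}{m['number']}"
        for m in UTC_DECISION_LINK.finditer(description)
    ]
```

Searching the whole description with `finditer`, rather than matching against a fixed position or surrounding phrase, is what would make such a check tolerant of free-form wording around the link.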
Yes, I think this is a fairly clear-cut case of « diacritics are diacritics ». (For, in contrast, a case where the Diacritic property is not clear-cut, see the very...