Robin Leroy

189 comments by Robin Leroy

> It looks like you need to either add these piecemeal to the `$NonOtherLetterIdeographs`, or else define that set as `[\p{Ideographic} - \p{gc=Lo}]` rather than testing that it's equal to...

> Please add the DoNotEmit data too.

Done. If we are going to get much more of these kinds of proposals, we need to find a way to incorporate validation...

> More generally, are the diffs in lines related to QU intentional?

No. Blame the author of #456. It looks like SegmenterDefault is correct, and SegmenterCldr is wrong. This should...

> will UTC and CLDR line break rules be the same starting with Unicode 16?

They probably will be. But even if they were not,

> Do we need a...

> We do have that test file in ICU:
> https://github.com/unicode-org/icu/blob/main/icu4c/source/test/testdata/LineBreakTest.txt

That is the file from the UCD, not a CLDR version. This one, generated by yours truly in the...

This seems reasonable in principle; but I will note that we do not publish the unicodetools version of the segmenter rules as part of the UCD, so a no-op there...

Re the invariant test failure: `$caseOverlap` is a manually maintained set of 267 modifier letters and counting; it should probably be expressed in terms of properties instead.
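To illustrate what "expressed in terms of properties" could look like, here is a minimal Python sketch that derives a set from character properties instead of listing code points by hand. The actual definition of `$caseOverlap` is not shown in the comment above, so the predicate used here (General_Category=Lm combined with the Lowercase property) and the function name are assumptions made purely for illustration; the result also depends on the Unicode version bundled with the Python interpreter.

```python
import unicodedata


def modifier_letters_with_lowercase():
    """Return code points that are gc=Lm and Lowercase.

    Hypothetical stand-in for a property-based definition; the real
    $caseOverlap set may be defined differently.
    """
    return {
        cp
        for cp in range(0x110000)
        # For a single character, CPython's str.islower() reflects the
        # derived Lowercase property of its bundled Unicode data.
        if unicodedata.category(chr(cp)) == "Lm" and chr(cp).islower()
    }


derived = modifier_letters_with_lowercase()
print(f"{len(derived)} code points derived from properties")
```

A set defined this way picks up newly encoded characters automatically when the property data is updated, which is the maintenance burden a hand-written list does not avoid.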

See also https://github.com/unicode-org/unicodetools/issues/484.

> Hmmm. The description does include a link to a UTC decision, namely https://www.unicode.org/L2/L2023/23157.htm#176-C35, but the Pipeline / UTC decision check fails.

Yeah, this regex is too strict, it expects...
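For illustration only, a lenient check for such links could be sketched as follows. This is not the regex used by the actual pipeline check, which is not shown in the comment; the pattern, the names `UTC_DECISION_LINK` and `find_utc_decisions`, and the assumption that fragments take the form meeting number, then `A` or `C`, then an item number, are all hypothetical and based only on the one link quoted above.

```python
import re

# Hypothetical pattern: matches links shaped like the one quoted above,
# wherever they appear in the description text.
UTC_DECISION_LINK = re.compile(
    r"https?://(?:www\.)?unicode\.org/L2/L(?P<year>\d{4})/(?P<doc>\d{5})\.htm"
    r"#(?P<meeting>\d+)-(?P<kind>[AC])(?P<number>\d+)"
)


def find_utc_decisions(description: str) -> list[str]:
    """Return every decision-style link found in a PR description."""
    return [m.group(0) for m in UTC_DECISION_LINK.finditer(description)]


text = (
    "The description does include a link to a UTC decision, namely "
    "https://www.unicode.org/L2/L2023/23157.htm#176-C35, but the check fails."
)
print(find_utc_decisions(text))
```

Searching anywhere in the text, rather than requiring the link at a fixed position or with fixed surrounding punctuation, is the design choice that makes this sketch tolerant of free-form descriptions.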

Yes, I think this is a fairly clear-cut case of « diacritics are diacritics ». (For, in contrast, a case where the Diacritic property is not clear-cut, see the very...