Better testing for U+0CC2 and U+0DCF collation edge cases

Open hsivonen opened this issue 8 months ago • 0 comments

Two characters in the root collation, U+0CC2 and U+0DCF, have the special property that they occur in the middle of a contraction without also occurring at the start of one. Therefore, checking whether U+0CC2 or U+0DCF may start a contraction is not sufficient to determine whether it can contract with the next character.

This special case is worth testing explicitly, since it may cause a bug when a collator skips over the identical prefix of the strings being compared.

For U+0CC2, I suggest manually injecting the following into the collation test suite: 0CC8 0CC6 0CC2 0CD6 is less than 0CC8 0CC6 0CC2 0CD5

Here the initial 0CC8 is an arbitrary filler character, included so that the interesting case does not occur right at the start of the input.

A similar case can probably be constructed for U+0DCF.
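
To illustrate the identical-prefix concern, here is a minimal Python sketch, not taken from any real collator. The function names and the `STARTS_CONTRACTION` set are hypothetical; in particular, treating U+0CC6 as the character that starts the relevant contraction is an assumption made for illustration. The sketch computes how much of the shared prefix can be skipped, backing up over trailing characters that might contract with what follows.

```python
from typing import List, Set

# Assumption for illustration only: U+0CC6 starts the contraction
# that has U+0CC2 in the middle. Real data would come from the root collation.
STARTS_CONTRACTION: Set[int] = {0x0CC6}

# The two characters named in this issue: they occur in the middle of a
# contraction without occurring at the start of one.
MIDDLE_ONLY: Set[int] = {0x0CC2, 0x0DCF}


def parse(hex_seq: str) -> List[int]:
    """Parse space-separated hex code points, e.g. '0CC8 0CC6'."""
    return [int(cp, 16) for cp in hex_seq.split()]


def common_prefix_len(a: List[int], b: List[int]) -> int:
    """Length of the identical prefix of a and b."""
    n = min(len(a), len(b))
    i = 0
    while i < n and a[i] == b[i]:
        i += 1
    return i


def safe_prefix_len(a: List[int], b: List[int], unsafe: Set[int]) -> int:
    """Identical prefix length, backed up while the last skipped
    character is one that may contract with the following character."""
    i = common_prefix_len(a, b)
    while i > 0 and a[i - 1] in unsafe:
        i -= 1
    return i


left = parse("0CC8 0CC6 0CC2 0CD6")
right = parse("0CC8 0CC6 0CC2 0CD5")

# Checking only "may start a contraction": U+0CC2 is not backed over.
naive = safe_prefix_len(left, right, STARTS_CONTRACTION)

# Also treating the middle-only characters as unsafe.
fixed = safe_prefix_len(left, right, STARTS_CONTRACTION | MIDDLE_ONLY)

print(naive, fixed)
```

With the naive check, comparison would resume after U+0CC2, so the sequence containing it is never seen as a whole. With the middle-only characters included, comparison resumes right after the filler character.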

hsivonen, Apr 22 '25 07:04