re2j
Reduce the incidence of infinite loops while case folding
Commit dc7d6e5d41225dc0825ea6fe4c6055ff854abe13 unfortunately increases the incidence of infinite loops during case folding when two conditions hold: re2j is running on a JVM newer than the version used to generate the bundled UnicodeTables.java, and the input contains a rune that would require special case folding rules to form a closed fold loop. \u1C80 (Cyrillic Small Letter Rounded Ve) is an example of such a rune.
Work around the issue by inverting the order of the parameters passed to equalsIgnoreCase() so that the rune from the pattern being matched, rather than the rune from the input content, undergoes case folding. This does not fully eliminate the possibility of an infinite loop in this scenario, since the pattern may itself contain one of the problematic runes, but it effectively restores the situation as it was before dc7d6e5d41225dc0825ea6fe4c6055ff854abe13, because the previous logic also performed case folding on the rune from the pattern and not on the rune from the content.
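To illustrate the failure mode, here is a minimal, self-contained sketch. It is not re2j's actual code: the class `FoldLoopSketch`, the hard-coded `simpleFold` table, and the `maxSteps` guard are all hypothetical and exist only so the example terminates. The table assumes a situation like the one described above, where U+1C80 folds into the В/в orbit but nothing in that orbit folds back to it.

```java
final class FoldLoopSketch {
    static final int BE_UPPER = 0x0411;   // Б
    static final int BE_LOWER = 0x0431;   // б
    static final int VE_UPPER = 0x0412;   // В
    static final int VE_LOWER = 0x0432;   // в
    static final int VE_ROUNDED = 0x1C80; // ᲀ, assumed missing from the stale tables

    // Hypothetical stand-in for a single fold step over stale tables.
    static int simpleFold(int r) {
        switch (r) {
            case BE_UPPER: return BE_LOWER;
            case BE_LOWER: return BE_UPPER;
            case VE_UPPER: return VE_LOWER;
            case VE_LOWER: return VE_UPPER;
            // Enters the В/в orbit, which never leads back to U+1C80.
            case VE_ROUNDED: return VE_UPPER;
            default: return r;
        }
    }

    // Walks the fold orbit of `folded` looking for `other`.
    // Without the maxSteps guard, an orbit that is not closed loops forever.
    static boolean equalsIgnoreCase(int folded, int other, int maxSteps) {
        if (folded == other) {
            return true;
        }
        int steps = 0;
        for (int r = simpleFold(folded); r != folded; r = simpleFold(r)) {
            if (r == other) {
                return true;
            }
            if (++steps > maxSteps) {
                throw new IllegalStateException(
                    "fold orbit never returned to U+" + Integer.toHexString(folded).toUpperCase());
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int patternRune = BE_LOWER;
        int inputRune = VE_ROUNDED;
        // Folding the pattern rune: its orbit is closed, so the walk ends.
        System.out.println(equalsIgnoreCase(patternRune, inputRune, 100));
        // Folding the input rune: the walk never returns to U+1C80.
        try {
            equalsIgnoreCase(inputRune, patternRune, 100);
        } catch (IllegalStateException e) {
            System.out.println("loop detected: " + e.getMessage());
        }
    }
}
```

The argument order decides which rune's orbit is walked. With the pattern rune first, a problematic rune in the input is only ever compared against, never folded, which is why swapping the parameters helps unless the pattern itself contains such a rune.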
Codecov Report
Merging #160 (cd47147) into master (9b3f052) will not change coverage. The diff coverage is 100.00%.
Coverage Diff

| | master | #160 | +/- |
|---|---|---|---|
| Coverage | 89.07% | 89.07% | |
| Files | 19 | 19 | |
| Lines | 3038 | 3038 | |
| Branches | 619 | 619 | |
| Hits | 2706 | 2706 | |
| Misses | 189 | 189 | |
| Partials | 143 | 143 | |

| Files Changed | Coverage Δ | |
|---|---|---|
| java/com/google/re2j/Unicode.java | 68.75% <ø> (ø) | |
| java/com/google/re2j/Inst.java | 80.85% <100.00%> (ø) | |
I looked at https://github.com/google/re2j/pull/104 and it seems like the proper solution will be to generate this data on startup as noted in the comments there.
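One possible reading of "generate this data on startup" is sketched below. This is only an assumption about what such an approach could look like, not what #104 or re2j actually does: the class `StartupFoldSketch` and its methods are hypothetical. It partitions code points by a key derived from the running JVM's own `Character.toUpperCase(int)` and `Character.toLowerCase(int)`, then links each group into a cycle, so every orbit is closed by construction regardless of the JVM's Unicode version. Note that this grouping is not identical to Unicode simple case folding.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class StartupFoldSketch {
    // Hypothetical canonical key under the running JVM's simple case mappings.
    static int foldKey(int cp) {
        return Character.toLowerCase(Character.toUpperCase(cp));
    }

    // Builds a "next rune in orbit" map from the JVM's own case data.
    static Map<Integer, Integer> buildNextInOrbit() {
        Map<Integer, List<Integer>> groups = new HashMap<>();
        // Pass 1: record every key that some other code point maps onto.
        for (int cp = 0; cp <= Character.MAX_CODE_POINT; cp++) {
            int key = foldKey(cp);
            if (key != cp) {
                groups.putIfAbsent(key, new ArrayList<>());
            }
        }
        // Pass 2: place each code point in the group of its own key, in
        // ascending order. Each code point lands in at most one group.
        for (int cp = 0; cp <= Character.MAX_CODE_POINT; cp++) {
            List<Integer> group = groups.get(foldKey(cp));
            if (group != null) {
                group.add(cp);
            }
        }
        // Link every group into a cycle so that walking it always returns
        // to the starting code point.
        Map<Integer, Integer> next = new HashMap<>();
        for (List<Integer> group : groups.values()) {
            for (int i = 0; i < group.size(); i++) {
                next.put(group.get(i), group.get((i + 1) % group.size()));
            }
        }
        return next;
    }

    // Code points outside every group fold to themselves.
    static int simpleFold(Map<Integer, Integer> next, int cp) {
        return next.getOrDefault(cp, cp);
    }
}
```

The trade-off in this sketch is a full scan of the code point range at startup in exchange for tables that cannot go stale relative to the JVM.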
Thank you for the fix. I'm working on reducing or removing the need for separate Unicode tables in RE2J. In the meantime, I'll cut a release with this fix.
Thanks!