re2j Reduce the incidence of infinite loops while case folding

Open mszabo-wikia opened this issue 3 years ago • 2 comments

Commit dc7d6e5d41225dc0825ea6fe4c6055ff854abe13 unfortunately increases the incidence of infinite loops during case folding when two conditions hold: re2j is running on a JVM newer than the version used to generate the bundled UnicodeTables.java, and the input contains a rune that would require special case folding rules to form a closed fold loop. \u1C80 (Cyrillic Small Letter Rounded Ve) is an example of such a rune.

Work around the issue by inverting the order of the parameters passed to equalsIgnoreCase(), so that the rune from the pattern being matched, rather than the rune from the input content, undergoes case folding. This does not fully eliminate the possibility of an infinite loop in this scenario, since the pattern may itself contain one of the problematic runes. It does, however, restore the behavior from before dc7d6e5d41225dc0825ea6fe4c6055ff854abe13, since the previous logic also performed case folding on the rune from the pattern and not on the content.
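
To illustrate the failure mode, here is a minimal, self-contained sketch. It is not re2j's code: `FoldOrbitSketch`, `FOLD`, `fold`, `inOrbit`, and `orbitCloses` are hypothetical names, and the three table entries are a toy stand-in for a stale fold mapping in which \u1C80 folds into the \u0412/\u0432 pair but nothing folds back to it. The step bound is likewise an assumption added so the sketch terminates; an unbounded walk that starts at \u1C80 and waits to come back to its starting rune would never finish.

```java
import java.util.HashMap;
import java.util.Map;

class FoldOrbitSketch {
    // Hypothetical fold table (NOT re2j's data): U+0412 and U+0432 form a
    // closed two-element loop, while U+1C80 folds into that loop and is
    // never folded back to.
    static final Map<Integer, Integer> FOLD = new HashMap<>();

    static {
        FOLD.put(0x0412, 0x0432);
        FOLD.put(0x0432, 0x0412);
        FOLD.put(0x1C80, 0x0412);
    }

    // Next rune in the fold orbit; runes without an entry fold to themselves.
    static int fold(int r) {
        return FOLD.getOrDefault(r, r);
    }

    // Walks the orbit starting at `start` and reports whether `target` is
    // reached before the walk returns to `start`. The maxSteps bound is an
    // assumption of this sketch; it keeps an open orbit from looping forever.
    static boolean inOrbit(int start, int target, int maxSteps) {
        if (start == target) {
            return true;
        }
        int r = fold(start);
        for (int i = 0; i < maxSteps && r != start; i++) {
            if (r == target) {
                return true;
            }
            r = fold(r);
        }
        return false;
    }

    // Reports whether repeated folding returns to `start` within maxSteps.
    static boolean orbitCloses(int start, int maxSteps) {
        int r = fold(start);
        for (int i = 0; i < maxSteps; i++) {
            if (r == start) {
                return true;
            }
            r = fold(r);
        }
        return false;
    }

    public static void main(String[] args) {
        // Folding starts from the first argument, mirroring the idea of
        // choosing which rune (pattern or input) undergoes case folding.
        System.out.println("orbit of U+0412 closes: " + orbitCloses(0x0412, 8));
        System.out.println("orbit of U+1C80 closes: " + orbitCloses(0x1C80, 8));
        System.out.println("fold U+0432, look for U+1C80: " + inOrbit(0x0432, 0x1C80, 8));
        System.out.println("fold U+1C80, look for 'x': " + inOrbit(0x1C80, 'x', 8));
    }
}
```

In this toy model, starting the walk from \u0432 (as a pattern rune would be) returns to its starting point and stops, whereas starting from \u1C80 (as an input rune would be) only stops because of the added bound. That is the asymmetry the parameter swap relies on.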

Jul 19 '22 02:07 mszabo-wikia

Codecov Report

Merging #160 (cd47147) into master (9b3f052) will not change coverage. The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master     #160   +/-   ##
=======================================
  Coverage   89.07%   89.07%           
=======================================
  Files          19       19           
  Lines        3038     3038           
  Branches      619      619           
=======================================
  Hits         2706     2706           
  Misses        189      189           
  Partials      143      143

Files Changed	Coverage Δ
java/com/google/re2j/Unicode.java	`68.75% <ø> (ø)`
java/com/google/re2j/Inst.java	`80.85% <100.00%> (ø)`

Jul 19 '22 02:07 codecov-commenter

I looked at https://github.com/google/re2j/pull/104 and it seems like the proper solution will be to generate this data on startup as noted in the comments there.

Jul 19 '22 02:07 mszabo-wikia

Thank you for the fix. I'm working on reducing or removing the need for separate Unicode tables in RE2J. In the meantime, I'll cut a release with this fix.

Aug 29 '23 21:08 sjamesr

Thanks!

Aug 29 '23 22:08 mszabo-wikia
