Valid UTF-8 input can cause infinite loop in JONI

Open haozhun opened this issue 10 years ago • 5 comments

In #7, @electrum identified a location that can cause an infinite loop in JONI. It was marked as won't fix because the input can be sanitized beforehand and JONI assumes that the input is always valid.

When the pattern is "\uD8000", it can be pre-sanitized, as you suggested in #7. What if the pattern is "\\uD800"? How can the user sanitize it?
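To make the distinction concrete, here is a small sketch in plain Java (no JONI involved) that contrasts a raw lone surrogate with the six-character escape text. The class name `EscapeVsRaw` and the helper `utf8` are made up for illustration. It uses `"\uD800"` for the raw case and relies only on `String.getBytes`, which substitutes `?` for an unpaired surrogate.

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: contrasts a raw surrogate with its escaped spelling.
final class EscapeVsRaw {
    // One UTF-16 code unit: an unpaired high surrogate.
    static final String RAW = "\uD800";
    // Six ASCII characters: a backslash followed by u, D, 8, 0, 0.
    static final String ESCAPED = "\\uD800";

    static byte[] utf8(String s) {
        // An unpaired surrogate cannot be encoded; Java substitutes '?'.
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println("raw chars:     " + RAW.length());
        System.out.println("escaped chars: " + ESCAPED.length());
        System.out.println("raw bytes:     " + utf8(RAW).length);
        System.out.println("escaped bytes: " + utf8(ESCAPED).length);
    }
}
```

The escaped form is plain ASCII, so a byte-level UTF-8 validity check on the pattern passes; the surrogate only comes into existence once the regex parser expands the escape.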

If JONI is willing to add a check, it would be the same fix as for #7: checking whether the return value of enc.length is negative in OptExactInfo.concatStr.
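The kind of guard being proposed can be sketched as follows. This is not JONI's code, and the body of `OptExactInfo.concatStr` is not shown in this thread. `LengthGuard`, `charLength`, and `countChars` are hypothetical names, and `charLength` is a simplified stand-in for an encoding's length function that reports a negative value for a malformed sequence. The point is only that a loop advancing by that length has to reject non-positive values, or it may never reach the end of the input.

```java
// Hypothetical sketch of a length guard; not taken from JONI or jcodings.
final class LengthGuard {
    // Returns the byte length of the character at p, or -1 if malformed.
    // Simplified: it does not reject every overlong or out-of-range form.
    static int charLength(byte[] bytes, int p, int end) {
        int b = bytes[p] & 0xFF;
        int len;
        if (b < 0x80) len = 1;
        else if (b >= 0xC2 && b <= 0xDF) len = 2;
        else if (b >= 0xE0 && b <= 0xEF) len = 3;
        else if (b >= 0xF0 && b <= 0xF4) len = 4;
        else return -1;
        if (p + len > end) return -1;
        for (int k = 1; k < len; k++) {
            // Every trailing byte must look like 10xxxxxx.
            if ((bytes[p + k] & 0xC0) != 0x80) return -1;
        }
        // ED A0..BF xx would encode a surrogate (U+D800..U+DFFF): reject.
        if (b == 0xED && (bytes[p + 1] & 0xFF) >= 0xA0) return -1;
        return len;
    }

    // Walks the input one character at a time, failing fast on bad input.
    static int countChars(byte[] bytes) {
        int p = 0;
        int count = 0;
        while (p < bytes.length) {
            int len = charLength(bytes, p, bytes.length);
            if (len <= 0) {
                // Without this check, p would stop advancing (or move backwards).
                throw new IllegalArgumentException("malformed sequence at byte " + p);
            }
            p += len;
            count++;
        }
        return count;
    }
}
```

Whether to throw, skip a byte, or treat the byte as a single character is a design choice for the library; throwing is used here only because it makes the failure visible.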

haozhun avatar Mar 18 '15 22:03 haozhun

In addition, \uD800\uDC00, which is a legal sequence, will also result in an infinite loop, because JONI considers every \uXXXX to be a code point.
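For reference, this is how the pair is interpreted under Java's UTF-16 rules: the two escapes together denote the single code point U+10000. The sketch below uses only `java.lang.Character`. The class `SurrogatePairs` and the method `toCodePoints` are illustrative names, and nothing here claims to be how JONI's parser is or should be written.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: merges adjacent high/low surrogates into one code point.
final class SurrogatePairs {
    static List<Integer> toCodePoints(char[] units) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < units.length; i++) {
            char c = units[i];
            boolean paired = Character.isHighSurrogate(c)
                    && i + 1 < units.length
                    && Character.isLowSurrogate(units[i + 1]);
            if (paired) {
                // A high surrogate followed by a low one forms a single code point.
                out.add(Character.toCodePoint(c, units[i + 1]));
                i++; // consume the low surrogate as well
            } else {
                // Anything else, including an unpaired surrogate, is kept as-is.
                out.add((int) c);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (int cp : toCodePoints(new char[] { '\uD800', '\uDC00' })) {
            System.out.println("U+" + Integer.toHexString(cp).toUpperCase());
        }
    }
}
```

Treating each `\uXXXX` in isolation, as described above, yields two surrogate values instead of one valid code point, and neither can be encoded as UTF-8 on its own.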

haozhun avatar Mar 26 '15 00:03 haozhun

@haozhun - can you show some jruby or java code that illustrates the endless loop?

guyboertje avatar Apr 26 '16 14:04 guyboertje

Note that in the past year we did add the ability to interrupt joni when it's stuck looping on bad input (or just large input/slow regex).

@haozhun Can you propose a patch? @lopex would probably be the best one to review such a change.

headius avatar May 02 '16 17:05 headius

Java code that illustrates the infinite loop. This can be mitigated by using NonStrict... instead, as illustrated in the commented-out code.

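    import java.nio.charset.StandardCharsets;
    import org.jcodings.specific.NonStrictUTF8Encoding; // only needed for the commented-out variant
    import org.jcodings.specific.UTF8Encoding;
    import org.joni.Matcher;
    import org.joni.Option;
    import org.joni.Regex;
    import org.joni.Syntax;
    public class Repro { // class name is arbitrary
        public static void main(String[] args)
        {
            byte[] pattern = "A\\uD800".getBytes(StandardCharsets.UTF_8); // escape text, all ASCII bytes
            byte[] str = ("AB").getBytes(StandardCharsets.UTF_8);
            Regex regex = new Regex(pattern, 0, pattern.length, Option.NEGATE_SINGLELINE, UTF8Encoding.INSTANCE, Syntax.Java);
            // Regex regex = new Regex(pattern, 0, pattern.length, Option.NEGATE_SINGLELINE, NonStrictUTF8Encoding.INSTANCE, Syntax.Java);
            Matcher matcher = regex.matcher(str);
            int result = matcher.search(0, str.length, Option.DEFAULT);
            System.out.println(result);
        }
    }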

Patch: https://github.com/jruby/joni/pull/21

haozhun avatar May 03 '16 20:05 haozhun

Ahh I see, this does not apply to JRuby (checked 1.7.24) because there is a range check.

raises RegexpError: invalid Unicode range: /A\uD800/

guyboertje avatar May 04 '16 09:05 guyboertje