joni icon indicating copy to clipboard operation
joni copied to clipboard

how to interrupt hanging thread?

Open KnIfER opened this issue 5 years ago • 7 comments

call thread.interrupt() and nothing happened. so how to stop the hanging thread?

Charset _charset = Charset.forName("GB18030");
/* text containing irregular binary data will make thread hang */
Thread thread = new Thread(new Runnable() {
	@Override
	public void run() {
		try {
			String key = "a"; // any character
			byte[] pattern = key.getBytes(_charset);

			Regex regex = new Regex(pattern, 0, pattern.length, Option.IGNORECASE, GB18030Encoding.INSTANCE);

			byte[] source = new byte[]{0x2f, 0x2f, (byte) 0xaf}; // text content.
			/* Encoded by GB18030, It reads "//�" where � means that "0xaf" is wrong or unsupported? */
			System.out.println(new String(source, _charset));

			Matcher matcher = regex.matcher(source);
			// search Interruptible ?
			int idx=matcher.searchInterruptible(0, source.length, Option.DEFAULT);
			System.out.println(idx+"");
		} catch (InterruptedException e) {
			System.out.println("InterruptedException");
			e.printStackTrace();
		}
	}
});

thread.start();

new Thread(new Runnable() {
	@Override
	public void run() {
		try {
			Thread.sleep(500);
		} catch (InterruptedException e) {
			e.printStackTrace();
		}
		System.out.println("interrupt !!! ");
		thread.interrupt();  // called but not working. 
	}
}).start();

v2.1.30

KnIfER avatar Jan 03 '20 13:01 KnIfER

Hmmm I would have expected that to work. Trying locally.

headius avatar Feb 10 '20 19:02 headius

Ok, so a bit of background here. The original intent of the interruptibility was to ensure long-running regular expressions could be stopped. By long running here we mean things like regular expressions with complex alternations... not regular expressions that get stuck because of bad character encoding.

This case, and likely many others, are not interruptible because they get stuck in a tight inner loop trying endlessly to advance a byte offset for a broken multibyte characters. In this case, the code get stuck here:

https://github.com/jruby/joni/blob/495de017b9a034f9f48dab891db06f1afb3f4b10/src/org/joni/Search.java#L204-L207

Adding an interrupt check here would add a lot of overhead to this search logic, so I'm reluctant to do that. We have discussed making joni more robust in the presence of bad character data, however, since there's several places like this that can get stuck.

So unfortunately right now there's no workaround to make interrupting work for your case. It should really be an error. We'll discuss a bit and see if we can figure out a good path forward.

headius avatar Feb 10 '20 22:02 headius

what about this ?

    int max = enc.maxLength();
    if(max==1) {
        while (s < end) {
            if (lowerCaseMatch(target, targetP, targetEnd, text, s, textEnd, enc, buf, regex.caseFoldFlag)) return s;
            s += enc.length(text, s, textEnd);
        }
    } else {
        int count = s;
        while (s < textRange) {
            if (lowerCaseMatch(target, targetP, targetEnd, text, s, textEnd, enc, buf, regex.caseFoldFlag)) return s;
            int val = enc.length(text, s, textEnd);
            if (val > 0) {
                s += val;
                count += val;
            } else {
                if (max + s + val < count) {
                    s += 1;
                    count = s;
                } else { //I believe it shouldn't lag too much behind ?
                    s += val;
                    count += 1;
                }
            }
        }
    }

besides SLOW_IC_FORWARD, I've also modified MAP_FORWARD,they are similiar. I wonder is it bad though.

KnIfER avatar Feb 11 '20 12:02 KnIfER

This looks like a decent change! We were discussing on the JRuby matrix how to fix this, and I believe a similar fix was suggested. @lopex What do you think?

headius avatar Feb 11 '20 13:02 headius

I'm reluctant to such changes since they will introduce inconsistencies and diverge us from onigmo even more. The problem is explained more thoroughly in https://github.com/jruby/jcodings/issues/26

lopex avatar Feb 11 '20 20:02 lopex

Also, https://github.com/jruby/jcodings/issues/26 shows that those infinite loops will exist in other places and it would not be ideal to randomly patch susceptible callsites.

lopex avatar Feb 11 '20 20:02 lopex

For this very case, a quick fix would be to add an unsafe GB18030 version in the same fashion https://github.com/jruby/jcodings/tree/unsafe-encoding aims to.

lopex avatar Feb 11 '20 20:02 lopex