regexp2 icon indicating copy to clipboard operation
regexp2 copied to clipboard

Infinite match loop

Open dop251 opened this issue 5 years ago • 3 comments

The following test results in an infinite loop.

func TestOverlappingMatch(t *testing.T) {
	re := MustCompile(`((?:0*)+?(?:.*)+?)?`, 0)
	match, err := re.FindStringMatch("0\xfd")
	if err != nil {
		t.Fatal(err)
	}
	for match != nil {
		t.Logf("start: %d, length: %d", match.Index, match.Length)
		match, err = re.FindNextMatch(match)
		if err != nil {
			t.Fatal(err)
		}
	}
}

$ go test -v -run TestOverlappingMatch === RUN TestOverlappingMatch TestOverlappingMatch: regexp_test.go:802: start: 0, length: 2 TestOverlappingMatch: regexp_test.go:802: start: 1, length: 1 TestOverlappingMatch: regexp_test.go:802: start: 1, length: 1 TestOverlappingMatch: regexp_test.go:802: start: 1, length: 1 ....

dop251 avatar Oct 08 '20 12:10 dop251

This one is a head-scratcher. I've confirmed it happens in the .NET regex engine that regexp2 is based on.

During the second match it seems like the regex stack is getting out of sync, so when the Capturemark operation tries to pop the start of the capture it it should get a 2 but there's an extra 1 left over from a previous Lazybranchmark operation. It seems the Lazybranchmark operation is setting up the stack for a backtrack that isn't guaranteed to happen... and if it doesn't happen then the stack has extra items on it.

So, more digging is needed to figure out the best way to resolve the issue. Can Lazybranchmark detect if there's a backtrack ahead? How often does the stack get out of sync? Can Capturemark detect the problem and pop extra items from the stack?

I'll need to do more research and digging, but wanted to capture my research and thoughts so far.

dlclark avatar Oct 12 '20 16:10 dlclark

Maybe report this to Microsoft and let them fix it 😄

dop251 avatar Oct 12 '20 16:10 dop251

Worth a shot https://github.com/dotnet/runtime/issues/43314

dlclark avatar Oct 12 '20 17:10 dlclark