regexp2
regexp2 copied to clipboard
Infinite match loop
The following test results in an infinite loop.
func TestOverlappingMatch(t *testing.T) {
re := MustCompile(`((?:0*)+?(?:.*)+?)?`, 0)
match, err := re.FindStringMatch("0\xfd")
if err != nil {
t.Fatal(err)
}
for match != nil {
t.Logf("start: %d, length: %d", match.Index, match.Length)
match, err = re.FindNextMatch(match)
if err != nil {
t.Fatal(err)
}
}
}
$ go test -v -run TestOverlappingMatch === RUN TestOverlappingMatch TestOverlappingMatch: regexp_test.go:802: start: 0, length: 2 TestOverlappingMatch: regexp_test.go:802: start: 1, length: 1 TestOverlappingMatch: regexp_test.go:802: start: 1, length: 1 TestOverlappingMatch: regexp_test.go:802: start: 1, length: 1 ....
This one is a head-scratcher. I've confirmed it happens in the .NET regex engine that regexp2 is based on.
During the second match it seems like the regex stack is getting out of sync, so when the Capturemark operation tries to pop the start of the capture it it should get a 2 but there's an extra 1 left over from a previous Lazybranchmark operation. It seems the Lazybranchmark operation is setting up the stack for a backtrack that isn't guaranteed to happen... and if it doesn't happen then the stack has extra items on it.
So, more digging is needed to figure out the best way to resolve the issue. Can Lazybranchmark detect if there's a backtrack ahead? How often does the stack get out of sync? Can Capturemark detect the problem and pop extra items from the stack?
I'll need to do more research and digging, but wanted to capture my research and thoughts so far.
Maybe report this to Microsoft and let them fix it 😄
Worth a shot https://github.com/dotnet/runtime/issues/43314