RBNF.jl icon indicating copy to clipboard operation
RBNF.jl copied to clipboard

Empty tokens can be matched infinitely

Open ChenNingCong opened this issue 5 years ago • 3 comments

This might not be an issue. If a token can match empty character(for example,regular expression r"a*", then runlexer will match this token over and over again, push infinite tokens into the array and finally crash the computer. It would be helpful to document this behavior, or provide some test functions to ensure that every token contains no empty input(this is doable for regular language), or provide a optional arguments of runlexer to limit the size of the token buffer. If so, testing the parser will be more easy.

ChenNingCong avatar Feb 14 '20 11:02 ChenNingCong

Thank you for pointing out this, and I totally agree with you.

My bandwidth will be limited for quite a few weeks, so I wonder if you could help and make a patch PR?

thautwarm avatar Feb 14 '20 12:02 thautwarm

Sorry for the late reply. I spend some time digging into the code base. Here are some reports: 1.Julia implements regex expression by calling perl's library. So unless we implement a regular language parser manually, we can't analyze whether a regular language contains empty string. Although this is doable, Idon't think it's worth it. 2.So it's better to check the empty string at runtime, though it might slow down the parser. We can overload the LexerSpec{String} and LexerSpec{Char} constructor to let them include an empty check . So check happens only once. For regex, we just change this code:

function mklexer(a :: LexerSpec{Regex})
    let regex = a.a
        quote
            function (chars, i)
                v = match($regex, SubString(chars, i))
                v === nothing ? nothing : v.match
            end
        end
    end
end

to:

function mklexer(a :: LexerSpec{Regex})
    let regex = a.a
        quote
            function (chars, i)
                v = match($regex, SubString(chars, i))
                if v === nothing
                    nothing 
                elseif isempty(v.match)
                    error("$regex matches an empty string, which will cause problems") 
                else
                    v.match
                end
            end
        end
    end
end

Of course, error messages are much harder to read... If you think this approach is OK, I will make a PR.

ChenNingCong avatar Feb 17 '20 10:02 ChenNingCong

Hi, I must say your concerns are very reasonable, and make sense. Just wonder if you can make the runtime checking optional, because this is a rare behaviour and IMO will slow down the lexing a lot.

thautwarm avatar Feb 17 '20 11:02 thautwarm