RBNF.jl
RBNF.jl copied to clipboard
Empty tokens can be matched infinitely
This might not be an issue.
If a token can match empty character(for example,regular expression r"a*"
, then runlexer
will match this token over and over again, push infinite tokens into the array and finally crash the computer.
It would be helpful to document this behavior, or provide some test functions to ensure that every token contains no empty input(this is doable for regular language), or provide a optional arguments of runlexer to limit the size of the token buffer. If so, testing the parser will be more easy.
Thank you for pointing out this, and I totally agree with you.
My bandwidth will be limited for quite a few weeks, so I wonder if you could help and make a patch PR?
Sorry for the late reply.
I spend some time digging into the code base. Here are some reports:
1.Julia implements regex expression by calling perl's library. So unless we implement a regular language parser manually, we can't analyze whether a regular language contains empty string. Although this is doable, Idon't think it's worth it.
2.So it's better to check the empty string at runtime, though it might slow down the parser. We can overload the LexerSpec{String}
and LexerSpec{Char}
constructor to let them include an empty check . So check happens only once. For regex, we just change this code:
function mklexer(a :: LexerSpec{Regex})
let regex = a.a
quote
function (chars, i)
v = match($regex, SubString(chars, i))
v === nothing ? nothing : v.match
end
end
end
end
to:
function mklexer(a :: LexerSpec{Regex})
let regex = a.a
quote
function (chars, i)
v = match($regex, SubString(chars, i))
if v === nothing
nothing
elseif isempty(v.match)
error("$regex matches an empty string, which will cause problems")
else
v.match
end
end
end
end
end
Of course, error messages are much harder to read... If you think this approach is OK, I will make a PR.
Hi, I must say your concerns are very reasonable, and make sense. Just wonder if you can make the runtime checking optional, because this is a rare behaviour and IMO will slow down the lexing a lot.