cpp-peglib
cpp-peglib copied to clipboard
Nondeterministic parsing failure in grammar
Hello! I'm encountering some bizarre nondeterministic parsing errors when using cpp-peglib
. I cannot reproduce the issue on another system, but I have a "minimal reproducer" which demonstrates the issue on my machine (I'll give machine details further down the issue).
Here's a small grammar which demonstrates the behavior on my machine:
Statement <- PrintStatement / PrintfStatement / PrintlnStatement / Expression
Expression <- Infix_Expression(Atom, Infix_Operator) / FunctionCall
Atom <- WriteableExpression / Prefix_Operator Atom / '(' Expression ')' / Number
FunctionCall <- VariableReference '(' ExpressionList ')'
ExpressionList <- Expression (',' Expression)*
PrintStatement <- 'print' '(' ExpressionList ')'
PrintfStatement <- 'printf' '(' ExpressionList ')'
PrintlnStatement <- 'println' '(' ExpressionList ')'
WriteableExpression <- VariableReference / PointerDereference
VariableReference <- Identifier
PointerDereference <- '*' WriteableExpression
Identifier <- < !Keyword Name >
Keyword <- ( 'true' | 'false' ) !Name
Name <- < [A-Za-z_][A-Za-z0-9_]* >
Number <- < [0-9]+ >
Prefix_Operator <- < [!-] >
Infix_Operator <- < ( '>=' | '<=' | '&&' | '||' | '==' | '!=' | '-' | '+' | '/' | '>' | '<' | '*' ) >
%whitespace <- [ \t]*
%word <- [A-Za-z]+
Infix_Expression(A, O) <- A (O A)* {
precedence
L && ||
L >= <= > < == !=
L + -
L * /
}
(Side note: I know this grammar is extremely flawed, it's not meant to actually be a good or useful grammar!)
On my machine, the longer the input is, the higher probability of an unexpected parsing fail. It is nondeterministic, so:
./parse "a + a + a + a" # gives a valid parse tree all of the time
./parse "a + a + a + a + a" # gives a valid parse tree about 20% of the time, but 80% of the time fails with "1:17: syntax error, unexpected 'a', expecting <Infix_Operator>."
./parse "a + a + a + a + a + a" # always fails
The grammar was tested in the online sandbox and worked as intended, without any of the failures. Additionally, the grammar was tested on a different machine (with the same version of cpp-peglib
and it also worked as intended.
Relevant information:
cpp-peglib version: up-to-date with master
as of May 7, 2024
compiler version: GNU g++ 11.4.0-1ubuntu1~22.04
environment: Ubuntu 22.04 podman container
Compiled with g++ -std=c++17 -lpthread source.cxx -o parse
source.cxx
doesn't do anything special-- it simply calls enable_ast
, enable_packrat_parsing
, and ast_to_s
. The packrat parsing does not seem to affect the bug.
Have you tested it with valgrind
?
I just did-- getting several Invalid read
errors.
@computablee sorry for the late reply. Could you show me the content of source.cxx
?
I'll close it until further information is given.