packcc
packcc copied to clipboard
Lookahead woes and more
I have been trying to write a small PEG to parse the following in a file named simple.sa
:
42,
a = 42,
a = b + c * d,
a.b,
a.b.c,
a.b(1, 2),
42.b,
4711.b(),
a[666],
a[777].foo
It works except for the last a[777].foo.
At times I think there must be a bug in packcc's lookahead pattern support and then I realize it must be me misunderstanding something central. I wish there was more documentation on how to successfully use positive and negative lookahead. I'm stumped. Can anyone have mercy on me and point out what I need to do to the PEG below to work as expected?
Cheers /J
# File: simple.peg
# Test: packcc simple.peg && gcc -o simple simple.c && ./simple < simple.sa
%prefix "satie"
%earlysource {
static const char *dbg_str[] = { "Evaluating rule", "Matched rule", "Abandoning rule" };
#define PCC_DEBUG(auxil, event, rule, level, pos, buffer, length) \
fprintf(stderr, "%*s%s %s @%zu [%.*s]\n", (int)((level) * 2), "", dbg_str[event], rule, pos, (int)(length), buffer)
}
Program <- _ TopLevelExpr (_ "," _ TopLevelExpr)* EOF
TopLevelExpr <- Binding / Expr
Binding <- MatchPattern _ "=" _ Expr
MatchPattern <- Literal / FieldAccess / Symbol
Expr <- Add
Add <- Multiplicate (_ "+" _ Multiplicate)*
Multiplicate <- Indexing (_ "*" _ Indexing)*
Indexing <- Symbol _ "[" _ Expr _ "]" / FunctionCall
FunctionCall <- Symbol _ "(" _ ExprSequence? _ ")" / FieldAccess
FieldAccess <- HasField (_ "." _ HasField)* / Primary
HasField <- Symbol !(_ ("(" / "[")) / Literal / Indexing / FunctionCall
Primary <- Literal / Symbol
ExprSequence <- Expr (_ "," _ Expr)*
Literal <- NumberLiteral
NumberLiteral <- [0-9]+
Symbol <- [a-zA-Z_][a-zA-Z_0-9_]*
_ <- WS*
WS <- [ \t\r\n]
EOF <- _ !.
%%
int main() {
satie_context_t *context = satie_create(NULL);
satie_parse(context, NULL);
satie_destroy(context);
return 0;
}
I don't think it is a problem related to lookahead...
The a[777].foo
correctly matches FunctionCall
rule, but it doesn't match Indexing
. That is partially because Indexing
tries to first match only index access, and it never gets to FunctionCall
. Switching the order will help: Indexing <- FunctionCall / Symbol _ "[" _ Expr _ "]"
. But this is not the only problem, which makes it much harder to debug, because even if you fix one problem, it still doesn't work :slightly_smiling_face:
I'd suggest to rewrite the grammar slightly to separate the index access from field access. Currently they're kind of intertwined (HasField
contains Indexing
). I can't tell you how exactly, I'd have to give it much more time than I have available, sorry.
Thanks for the input. I can see now that my grammar snippet is ambiguous/faulty and need to be rewritten. From a completelly naively point of view I would like to have an even more ambiguous grammar though:
Program <- _ TopLevelExpr (_ "," _ TopLevelExpr)* EOF
TopLevelExpr <- Expr / Binding
# Precedence ordered
Expr <- Add
Add <- Multiplicate (_ "+" _ Multiplicate)*
Multiplicate <- Indexing (_ "*" _ Indexing)*
Indexing <- Expr _ "[" _ Expr _ "]" / FunctionCall
FunctionCall <- FunctionName _ "(" _ ExprSequence? _ ")" / FieldAccess
FieldAccess <- Expr (_ "." _ Expr)* / Primary
FunctionName <- Symbol
Symbol <- [a-zA-Z_][a-zA-Z_0-9_]*
ExprSequence <- Expr (_ "," _ Expr)*
Primary <- Literal / Symbol
Literal <- NumberLiteral
NumberLiteral <- [0-9]+
Binding <- MatchPattern _ "=" _ Expr
MatchPattern <- Literal / FieldAccess / Symbol
_ <- WS*
WS <- [ \t\r\n]
EOF <- _ !.
:-)
It would require a parser with indefinite backtracking. Maybe not even that would suffice. Life is hard.
Thanks