How can I force the parser to recognise one rule instead of another when both are alternatives of the same parent rule?
Consider the following input string: input_str = "C(x) :: B(x) & A(x)".
I have the following grammar:
my_grammar = u"""
start: expressions
expressions : (expression)+
?expression : my_rule
my_rule : head AFFECTATION (condition | body)
AFFECTATION : "::"
condition : composite_predicate "//" composite_predicate
?composite_predicate : "(" conjunction ")"
| predicate
?head : head_predicate
head_predicate : identifier "(" [ arguments ] ")"
?body : conjunction
conjunction : predicate (CONJUNCTION_SYMBOL predicate)*
CONJUNCTION_SYMBOL : "&" | "\N{LOGICAL AND}"
negated_predicate : ("~" | "\u00AC" ) predicate
predicate : int_ext_identifier PRED_L_PAR PRED_R_PAR
| int_ext_identifier PRED_L_PAR arguments PRED_R_PAR
| negated_predicate
| comparison
| "(" predicate ")"
PRED_L_PAR : "("
PRED_R_PAR : ")"
comparison : argument COMPARISON_OPERATOR argument
COMPARISON_OPERATOR : "==" | "<" | "<=" | ">=" | ">" | "!="
function_application_par : "(" function_application ")"
function_application : FUNC_L_PAR lambda_expression FUNC_R_PAR FUNC_L_PAR [ arguments ] FUNC_R_PAR -> lambda_application
| int_ext_identifier FUNC_L_PAR [ arguments ] FUNC_R_PAR -> id_application
FUNC_L_PAR : "("
FUNC_R_PAR : ")"
signed_int_ext_identifier : int_ext_identifier -> signed_int_ext_identifier
| "-" int_ext_identifier -> minus_signed_id
?int_ext_identifier : identifier
| ext_identifier
| lambda_expression
lambda_expression : "lambda" arguments ":" argument
arguments : argument ("," argument)*
argument : arithmetic_operation | DOTS
DOTS : "..."
arithmetic_operation : term
| term "+" arithmetic_operation
| term "-" arithmetic_operation
term : factor
| factor "*" term
| factor "/" term
factor : exponent
| factor "**" exponent
?exponent : literal | function_application | signed_int_ext_identifier | "(" argument ")" | function_application_par
?literal : number | text
ext_identifier : "@" identifier
identifier : cmd_identifier | identifier_regexp
identifier_regexp : IDENTIFIER_REGEXP
IDENTIFIER_REGEXP : "`" /[0-9a-zA-Z\/#%\._:-]+/ "`"
cmd_identifier : CMD_IDENTIFIER
CMD_IDENTIFIER : /\\b(?!\\bexists\\b)(?!\\b\\u2203\\b)(?!\\bEXISTS\\b)(?!\\bst\\b)(?!\\bans\\b)[a-zA-Z_][a-zA-Z0-9_]*\\b/
exists : EXISTS
EXISTS : "exists" | "\u2203" | "EXISTS"
text : TEXT
TEXT : DOUBLE_QUOTE /[a-zA-Z0-9 ]*/ DOUBLE_QUOTE
| SINGLE_QUOTE /[a-zA-Z0-9 ]*/ SINGLE_QUOTE
DOUBLE_QUOTE : "\\""
SINGLE_QUOTE : "'"
?number : integer | float
?integer : INT -> pos_int
| "-" INT -> neg_int
?float : FLOAT -> pos_float
| "-" FLOAT -> neg_float
WHITESPACE : /[\t ]+/
%import common.INT
%import common.FLOAT
%import common.NEWLINE
%ignore WHITESPACE
%ignore NEWLINE
"""
I parse the input string as follows:
from lark import Lark
json_parser = Lark(my_grammar, parser='lalr', debug=True)
jp = json_parser.parse(input_str)
I need to use the LALR algorithm because I want to use the parser's interactive mode later on.
Parsing input_str raises the following error:
UnexpectedToken: Unexpected token Token('CONJUNCTION_SYMBOL', '&') at line 2, column 13.
Expected one of:
* COMPARISON_OPERATOR
As I understand it, the parser correctly recognises my_rule in expression. Inside my_rule, body is matched, which leads to conjunction, and B(x) should then be recognised as a predicate. I believe the problem arises in predicate.
The parser should recognise B(x) as a predicate of the form int_ext_identifier PRED_L_PAR arguments PRED_R_PAR. Instead, it seems to treat B(x) as the start of a comparison, which would explain why it expects a COMPARISON_OPERATOR instead of a CONJUNCTION_SYMBOL.
How can I force the parser to recognise, in predicate, an int_ext_identifier PRED_L_PAR arguments PRED_R_PAR instead of a comparison?
I tried giving CONJUNCTION_SYMBOL a higher priority as follows, but nothing changed:
CONJUNCTION_SYMBOL.2 : "&" | "\N{LOGICAL AND}"
I am not sure which approach to take.
Thank you in advance for your help.
EDIT:
If I use the input string A(x)::~B(x), I get the following error message:
UnexpectedToken: Unexpected token Token('$END', '') at line 2, column 11.
Expected one of:
* COMPARISON_OPERATOR