
How can I force the parser to recognise one rule instead of another, when both belong to the same enclosing rule?

Open · lalshikh opened this issue 10 months ago · 0 comments

Consider the following input string: `input_str = "C(x) :: B(x) & A(x)"`.

I have the following grammar:

```python
my_grammar = u"""
start: expressions
expressions : (expression)+

?expression : my_rule

my_rule : head AFFECTATION (condition | body)
AFFECTATION : "::"

condition : composite_predicate "//" composite_predicate
?composite_predicate : "(" conjunction ")"
                        | predicate

?head : head_predicate
head_predicate : identifier "(" [ arguments ] ")"

?body : conjunction
conjunction : predicate (CONJUNCTION_SYMBOL predicate)*
CONJUNCTION_SYMBOL : "&" | "\N{LOGICAL AND}"

negated_predicate : ("~" | "\u00AC" ) predicate
predicate : int_ext_identifier PRED_L_PAR PRED_R_PAR
          | int_ext_identifier PRED_L_PAR arguments PRED_R_PAR
          | negated_predicate
          | comparison
          | "(" predicate ")"
PRED_L_PAR : "("
PRED_R_PAR : ")"

comparison : argument COMPARISON_OPERATOR argument
COMPARISON_OPERATOR : "==" | "<" | "<=" | ">=" | ">" | "!="

function_application_par : "(" function_application ")"
function_application : FUNC_L_PAR lambda_expression FUNC_R_PAR FUNC_L_PAR [ arguments ] FUNC_R_PAR -> lambda_application
                     | int_ext_identifier FUNC_L_PAR [ arguments ] FUNC_R_PAR        -> id_application
FUNC_L_PAR : "("
FUNC_R_PAR : ")"

signed_int_ext_identifier : int_ext_identifier     -> signed_int_ext_identifier
                          | "-" int_ext_identifier -> minus_signed_id
?int_ext_identifier : identifier
                    | ext_identifier
                    | lambda_expression

lambda_expression : "lambda" arguments ":" argument

arguments : argument ("," argument)*
argument : arithmetic_operation | DOTS
DOTS : "..."

arithmetic_operation : term
                     | term "+" arithmetic_operation
                     | term "-" arithmetic_operation
term : factor
     | factor "*" term
     | factor "/" term
factor : exponent
       | factor "**" exponent
?exponent : literal | function_application | signed_int_ext_identifier | "(" argument ")" | function_application_par

?literal : number | text

ext_identifier : "@" identifier

identifier : cmd_identifier | identifier_regexp
identifier_regexp : IDENTIFIER_REGEXP
IDENTIFIER_REGEXP : "`" /[0-9a-zA-Z\/#%\._:-]+/ "`"

cmd_identifier : CMD_IDENTIFIER
CMD_IDENTIFIER : /\\b(?!\\bexists\\b)(?!\\b\\u2203\\b)(?!\\bEXISTS\\b)(?!\\bst\\b)(?!\\bans\\b)[a-zA-Z_][a-zA-Z0-9_]*\\b/

exists : EXISTS
EXISTS : "exists" | "\u2203" | "EXISTS"

text : TEXT
TEXT : DOUBLE_QUOTE /[a-zA-Z0-9 ]*/ DOUBLE_QUOTE
     | SINGLE_QUOTE /[a-zA-Z0-9 ]*/ SINGLE_QUOTE
DOUBLE_QUOTE : "\\""
SINGLE_QUOTE : "'"

?number : integer | float
?integer : INT     -> pos_int
         | "-" INT -> neg_int
?float : FLOAT     -> pos_float
       | "-" FLOAT -> neg_float


WHITESPACE : /[\t ]+/

%import common.INT
%import common.FLOAT
%import common.NEWLINE

%ignore WHITESPACE
%ignore NEWLINE
"""
```

I parse the input string as follows:

```python
from lark import Lark

json_parser = Lark(my_grammar, parser='lalr', debug=True)
jp = json_parser.parse(input_str)
```

I need to use the LALR algorithm so that I can use the interactive parsing mode later on.

I get the following error message:

```
UnexpectedToken: Unexpected token Token('CONJUNCTION_SYMBOL', '&') at line 2, column 13.
Expected one of:
	* COMPARISON_OPERATOR
```

As I understand it, the parser correctly recognises my_rule within expression. Inside my_rule, body is then recognised, which leads to conjunction, and B(x) is recognised as a predicate. The problem therefore seems to arise inside predicate.

The parser should recognise B(x) as a predicate of the form int_ext_identifier PRED_L_PAR arguments PRED_R_PAR. Instead, it seems to treat B(x) as a comparison, which would explain why it expects a COMPARISON_OPERATOR instead of a CONJUNCTION_SYMBOL.
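One possible explanation, offered as a hypothesis rather than a confirmed diagnosis: the grammar uses the pattern "(" for two named terminals, PRED_L_PAR and FUNC_L_PAR, as well as in anonymous "(" literals. With parser='lalr', Lark's default contextual lexer only considers the terminals the parser can accept in its current state, but right after B both a predicate and a function application are still possible. Both PRED_L_PAR and FUNC_L_PAR are therefore acceptable, and the lexer has to commit to one name before the parser reaches the "&" that follows B(x). If it picks FUNC_L_PAR, the parse would continue through function_application and argument into comparison, which matches the COMPARISON_OPERATOR expectation in the error. The toy below uses plain `re` and is not Lark's code; the alphabetical tie-break and the name LPAR for the anonymous terminal are assumptions made for the illustration.

```python
import re

# Toy model only (not Lark's implementation). Several terminals share the
# exact pattern "(": PRED_L_PAR, FUNC_L_PAR and an anonymous one (called LPAR
# here purely for illustration).
SAME_PATTERN = {"PRED_L_PAR": r"\(", "FUNC_L_PAR": r"\(", "LPAR": r"\("}


def lex_paren(acceptable):
    """Return the terminal name a first-match lexer assigns to "(".

    `acceptable` stands in for the terminals the parser can accept in its
    current state. Ties are broken alphabetically (an assumption for the toy);
    Python's re tries alternatives left to right, so the first name wins.
    """
    names = sorted(n for n in SAME_PATTERN if n in acceptable)
    regex = "|".join(f"(?P<{n}>{SAME_PATTERN[n]})" for n in names)
    return re.match(regex, "(").lastgroup


# After "B", a predicate and a function application are both still possible,
# so both terminals are acceptable and one name must be chosen up front.
print(lex_paren({"PRED_L_PAR", "FUNC_L_PAR"}))  # FUNC_L_PAR
# When only one "(" terminal fits the state, there is nothing to decide.
print(lex_paren({"PRED_L_PAR"}))  # PRED_L_PAR
```

If this is what happens, the same mechanism would also account for the A(x)::~B(x) case below, where the parser reaches the end of input while still expecting a COMPARISON_OPERATOR.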

How can I force the parser to recognise, in predicate, the alternative int_ext_identifier PRED_L_PAR arguments PRED_R_PAR instead of a comparison?

I tried to give CONJUNCTION_SYMBOL a higher priority as follows, but nothing changed:

```
CONJUNCTION_SYMBOL.2 : "&" | "\N{LOGICAL AND}"
```
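A terminal priority such as CONJUNCTION_SYMBOL.2 only changes which terminal wins when several terminals could match the same input text. Nothing else in the grammar matches "&", so raising its priority cannot influence the decision that appears to matter here, namely which name the "(" after B receives. The sketch below is a hypothetical model, not Lark's implementation: it orders terminals by descending priority (ties by name, an assumption) and shows that the boost leaves both lookups unchanged.

```python
import re


# Hypothetical toy model (not Lark's code): each terminal is (name, regex, priority).
# Priority 0 stands for an unmarked terminal; ".2" in the grammar becomes 2.
def first_match(text, terminals):
    """Return the name of the terminal an ordered-alternation lexer picks, or None."""
    # Higher priority is tried first; equal priorities fall back to name order.
    ordered = sorted(terminals, key=lambda t: (-t[2], t[0]))
    pattern = "|".join(f"(?P<{name}>{regex})" for name, regex, _ in ordered)
    m = re.match(pattern, text)
    return m.lastgroup if m else None


base = [("FUNC_L_PAR", r"\(", 0), ("PRED_L_PAR", r"\(", 0),
        ("CONJUNCTION_SYMBOL", r"&", 0)]
# Same terminals, with the CONJUNCTION_SYMBOL.2 change tried in the question.
boosted = [("FUNC_L_PAR", r"\(", 0), ("PRED_L_PAR", r"\(", 0),
           ("CONJUNCTION_SYMBOL", r"&", 2)]

for terms in (base, boosted):
    # "&" has no competitor, and the boost does not touch the tie between the
    # two "(" terminals, so the results are identical with and without it.
    print(first_match("(x)", terms), first_match("& A(x)", terms))
# prints "FUNC_L_PAR CONJUNCTION_SYMBOL" on both iterations
```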

I have no idea which approach to take.

Thank you in advance for your help.

EDIT:

If I use the input string `A(x)::~B(x)`, I get the following error message:

```
UnexpectedToken: Unexpected token Token('$END', '') at line 2, column 11.
Expected one of:
	* COMPARISON_OPERATOR
```

lalshikh · Aug 17 '23 00:08