Empty matches appearing unnecessarily when repeating empty rules ambiguously
I'm writing a simple assembly-like language and using Lark to parse it into an AST, but I'm running into trouble with ambiguity. Here's a boiled-down minimal reproducible example:
from lark import Lark

grammar = r"""
program: statement* // zero or more statements
?statement: instruction
instruction: pneumonic [parameter]* // match zero or more expected parameters
pneumonic: CNAME
parameter: ESCAPED_STRING | INT
SPACING: /[ \t\f]+/
%ignore SPACING
%import common.CNAME
%import common.ESCAPED_STRING
%import common.INT
"""
asm_parser = Lark(grammar, start="program", ambiguity="explicit")
example = """PNEUMONIC "text" 10"""
syntax_tree = asm_parser.parse(example)
print(syntax_tree.pretty())
The above produces this output, which shows there's some phantom parameter between the pneumonic and the explicit strings and ints:
program
  _ambig
    instruction
      pneumonic PNEUMONIC
      parameter "text"
      parameter 10
    instruction
      pneumonic PNEUMONIC
      None
      parameter "text"
      parameter 10
At first, I thought this was the parser matching the whitespace between the pneumonic and the first parameter, but removing this whitespace doesn't seem to help (especially since this whitespace is seemingly ignored anyway):
example = """PNEUMONIC"text" 10""" # produces the same output as above
What did work, however, was removing the brackets from the instruction rule:
instruction: pneumonic parameter*
which resolves the ambiguity:
program
  instruction
    pneumonic PNEUMONIC
    parameter "text"
    parameter 10
From what I understand, the brackets indicate an "expected value" and the parser supplies None when nothing is found, but what is the parser actually matching between the pneumonic and the first parameter in this case?