"Unused terminals" false positive in recursive terminal
Describe the bug
In a grammar where one terminal is defined as a concatenation of several other terminals, the reference is not counted as a "use" of those nested terminals. This leads to spurious "Unused terminals:" warnings.
To Reproduce
Install Lark and run the following code:
import lark
import logging
my_grammar = r"""
value: IDENTIFIER
_IDENT_LETTER: "A".."Z"
DECIMAL_DIGIT: "0".."9"
IDENTIFIER: _IDENT_LETTER (_IDENT_LETTER | DECIMAL_DIGIT)+
"""
lark.logger.setLevel(logging.DEBUG)
my_parser = lark.Lark(my_grammar, start="value", parser="lalr", debug=True)
tree = my_parser.parse("E2BIG")
print(f"{tree=} -> pretty:\n{tree.pretty()}")
tree = my_parser.parse("ANSWER42")
print(f"{tree=} -> pretty:\n{tree.pretty()}")
Expected behavior
It correctly parses the identifiers and prints them to the console:
tree=Tree(Token('RULE', 'value'), [Token('IDENTIFIER', 'E2BIG')]) -> pretty:
value E2BIG
tree=Tree(Token('RULE', 'value'), [Token('IDENTIFIER', 'ANSWER42')]) -> pretty:
value ANSWER42
Actual behavior
It correctly parses the identifiers and prints them to the console, but also complains about the terminals being unused:
Unused terminals: ['_IDENT_LETTER', 'DECIMAL_DIGIT']
tree=Tree(Token('RULE', 'value'), [Token('IDENTIFIER', 'E2BIG')]) -> pretty:
value E2BIG
tree=Tree(Token('RULE', 'value'), [Token('IDENTIFIER', 'ANSWER42')]) -> pretty:
value ANSWER42
Additional notes
It does not seem to matter whether IDENT_LETTER or DECIMAL_DIGIT begins with an underscore or not. This may or may not contradict what https://raw.githubusercontent.com/lark-parser/lark/master/docs/_static/lark_cheatsheet.pdf says about terminals being "filtered out".
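To make the distinction I have in mind concrete, here is a minimal toy model of the grammar above. It is only an illustration and is not Lark's internal data model or algorithm: the dictionaries and both helper functions are hypothetical names I made up. It compares counting only the terminals that rules reference directly with also following references from one terminal into another.

```python
# Toy model of the grammar from this report: each name maps to the set of
# names referenced in its definition. Hypothetical structure, not Lark's.
RULES = {"value": {"IDENTIFIER"}}
TERMINALS = {
    "_IDENT_LETTER": set(),
    "DECIMAL_DIGIT": set(),
    "IDENTIFIER": {"_IDENT_LETTER", "DECIMAL_DIGIT"},
}


def directly_used(rules):
    """Names that appear directly in some rule body."""
    used = set()
    for refs in rules.values():
        used |= refs
    return used


def transitively_used(rules, terminals):
    """Terminals reachable from rules, following terminal-to-terminal references."""
    used = set()
    stack = list(directly_used(rules))
    while stack:
        name = stack.pop()
        if name in used or name not in terminals:
            continue
        used.add(name)
        stack.extend(terminals[name])
    return used


unused_direct = sorted(set(TERMINALS) - directly_used(RULES))
unused_transitive = sorted(set(TERMINALS) - transitively_used(RULES, TERMINALS))

print("direct only:", unused_direct)      # ['DECIMAL_DIGIT', '_IDENT_LETTER']
print("transitive: ", unused_transitive)  # []
```

With direct counting only, the toy model flags the same two terminals as the message above; with transitive counting, nothing is flagged, which is the behavior I expected.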