lark
lark copied to clipboard
ambiguity and whitespace with lalr
Hi, I am trying to understand ambiguity and whitespace in an lalr(1) parser. The first program works as I would expect, but the second fails on input "a ". Could anyone tell me why? By the way, they both work as I would expect under Earley (not lalr(1)) in the Lark IDE https://www.lark-parser.org/ide/. Also, this probably should have been a discussion, not an issue, but I don't think I can move it to "discussions" myself, so if someone else can, that's fine with me. Thanks.
This program works as expected:
from lark import Lark
grammar = '''
start: alternative_1 | alternative_2
!alternative_1: "a" WS
!alternative_2: "a" WS "b" WS
%import common.WS
'''
text = "a " #returns Tree(Token('RULE', 'start'), [Tree(Token('RULE', 'alternative_1'), [Token('A', 'a'), Token('WS', ' ')])])
#text = "a b " #returns Tree(Token('RULE', 'start'), [Tree(Token('RULE', 'alternative_2'), [Token('A', 'a'), Token('WS', ' '), Token('B', 'b'), Token('WS', ' ')])])
parser = Lark(grammar,parser='lalr')
print(parser.parse(text))
This program fails on input "a ":
from lark import Lark
grammar = """
start: alternative WS
!alternative: "a" | "a" WS "b" // fails on input "a ", works on "a b "
%import common.WS
"""
text = "a "
''' returns
KeyError: '$END'
Expected one of:
* B
'''
#text = "a b " # returns Tree(Token('RULE', 'start'), [Tree(Token('RULE', 'alternative'), [Token('A', 'a'), Token('WS', ' '), Token('B', 'b')]), Token('WS', ' ')])
parser = Lark(grammar,parser='lalr')
print(parser.parse(text))
In your second example, there is a shift/reduce conflict. There is a way to see it:
import logging
from lark import Lark, logger
logger.setLevel(logging.DEBUG)
...
parser = Lark(grammar,parser='lalr', debug=True)
It will output:
Shift/Reduce conflict for terminal WS: (resolving as shift)
* <alternative : A>
By default, Lark chooses shift, same as PLY. You can change it using priority. But better to avoid if you can.
It's mentioned here: https://lark-parser.readthedocs.io/en/latest/how_to_use.html#lalr
Try raw strings:
grammar = r"""
On Sat, Jul 2, 2022, 01:21 gitshub @.***> wrote:
Hello, I am having trouble using whitespace so I made a simple test grammar. My input to the parse is basically either
'a' followed by whitespace or 'a' followed by 'b' followed by whitespace.
When using the Lark IDE https://www.lark-parser.org/ide/ I get results which look like the input, which is what I want. No matter which rule called 'item' I uncomment in the grammar, it works fine.
However, this is not the case when I run Python 3 under Windows 10 (I tried the "IDLE" IDE as well as "Visual Studio Code IDE"). For example, the rule !item: "a" | "a" WS "b" // fails on 'a ', works on 'a b ' with the input 'a b ' gives the desired
Tree(Token('RULE', 'start'), [Tree(Token('RULE', 'item'), [Token('A', 'a'), Token('WS', ' '), Token('B', 'b')]), Token('WS', ' ')])
but with the input 'a ' says 'KeyError: '$END' Expected one of:
- B'
Does anyone know what I am doing wrong? Thanks.
#pylint:disable=E0001 from lark import Lark
grammar = """
start: item WS
// !item: "a" | /a\s+b/ // works on 'a ', works on 'a b '
// !item: "a" | "a" /\s+/ "b" // works on 'a ', fails on 'a b '
// !item: "a" | ("a" /\s+/ "b") // works on 'a ', fails on 'a b '
!item: "a" | "a" WS "b" // fails on 'a ', works on 'a b '
%import common.WS // %ignore WS we need to consider whitespace
text = "a " #text = "a b " parser = Lark(grammar,parser='lalr')
print(parser.parse(text))
— Reply to this email directly, view it on GitHub https://github.com/lark-parser/lark/issues/1161, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAFSSSF5UHRFOPMCJCNNHBDVR54N5ANCNFSM52OB3W3Q . You are receiving this because you are subscribed to this thread.Message ID: @.***>
I managed to rewrite my main application's grammar so that the conflict disappeared, but someday I intend to study LALR parsing so I can actually understand your answer, which I'm sure is very helpful. I found this site https://mdaines.github.io/grammophone/ which looks pretty neat. You can close this if you like, and thanks for your great parsing software.