lark icon indicating copy to clipboard operation
lark copied to clipboard

ambiguity and whitespace with lalr

Open gitshub opened this issue 3 years ago • 1 comments
trafficstars

Hi, I am trying to understand ambiguity and whitespace in an lalr(1) parser. The first program works as I would expect, but the second fails on input "a ". Could anyone tell me why? By the way, they both work as I would expect under Earley (not lalr(1)) in the Lark IDE https://www.lark-parser.org/ide/. Also, this probably should have been a discussion, not an issue, but I don't think I can move it to "discussions" myself, so if someone else can, that's fine with me. Thanks.

This program works as expected:

from lark import Lark

grammar = '''

start:  alternative_1  | alternative_2 

!alternative_1: "a"  WS

!alternative_2: "a"  WS "b" WS

%import common.WS
'''
text = "a "    #returns Tree(Token('RULE', 'start'), [Tree(Token('RULE', 'alternative_1'), [Token('A', 'a'), Token('WS', ' ')])])
	
#text = "a b " #returns Tree(Token('RULE', 'start'), [Tree(Token('RULE', 'alternative_2'), [Token('A', 'a'), Token('WS', ' '), Token('B', 'b'), Token('WS', ' ')])])

parser = Lark(grammar,parser='lalr')
print(parser.parse(text))

This program fails on input "a ":

from lark import Lark

grammar = """

start: alternative WS

!alternative: "a"  | "a" WS  "b"        // fails on input  "a ",           works on "a b " 

%import common.WS
"""

text = "a " 
''' returns 
				KeyError: '$END'
				Expected one of:
				   * B
'''

#text = "a b " # returns Tree(Token('RULE', 'start'), [Tree(Token('RULE', 'alternative'), [Token('A', 'a'), Token('WS', ' '), Token('B', 'b')]), Token('WS', ' ')])

parser = Lark(grammar,parser='lalr')
print(parser.parse(text))

gitshub avatar Jul 01 '22 23:07 gitshub

In your second example, there is a shift/reduce conflict. There is a way to see it:

import logging
from lark import Lark, logger
logger.setLevel(logging.DEBUG)
...
parser = Lark(grammar,parser='lalr', debug=True)

It will output:

Shift/Reduce conflict for terminal WS: (resolving as shift)
 * <alternative : A>

By default, Lark chooses shift, same as PLY. You can change it using priority. But better to avoid if you can.

It's mentioned here: https://lark-parser.readthedocs.io/en/latest/how_to_use.html#lalr

erezsh avatar Jul 22 '22 17:07 erezsh

Try raw strings:

grammar = r"""

On Sat, Jul 2, 2022, 01:21 gitshub @.***> wrote:

Hello, I am having trouble using whitespace so I made a simple test grammar. My input to the parse is basically either

'a' followed by whitespace or 'a' followed by 'b' followed by whitespace.

When using the Lark IDE https://www.lark-parser.org/ide/ I get results which look like the input, which is what I want. No matter which rule called 'item' I uncomment in the grammar, it works fine.

However, this is not the case when I run Python 3 under Windows 10 (I tried the "IDLE" IDE as well as "Visual Studio Code IDE"). For example, the rule !item: "a" | "a" WS "b" // fails on 'a ', works on 'a b ' with the input 'a b ' gives the desired

Tree(Token('RULE', 'start'), [Tree(Token('RULE', 'item'), [Token('A', 'a'), Token('WS', ' '), Token('B', 'b')]), Token('WS', ' ')])

but with the input 'a ' says 'KeyError: '$END' Expected one of:

  • B'

Does anyone know what I am doing wrong? Thanks.

#pylint:disable=E0001 from lark import Lark

grammar = """

start: item WS

// !item: "a" | /a\s+b/ // works on 'a ', works on 'a b '

// !item: "a" | "a" /\s+/ "b" // works on 'a ', fails on 'a b '

// !item: "a" | ("a" /\s+/ "b") // works on 'a ', fails on 'a b '

!item: "a" | "a" WS "b" // fails on 'a ', works on 'a b '

%import common.WS // %ignore WS we need to consider whitespace

text = "a " #text = "a b " parser = Lark(grammar,parser='lalr')

print(parser.parse(text))

— Reply to this email directly, view it on GitHub https://github.com/lark-parser/lark/issues/1161, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAFSSSF5UHRFOPMCJCNNHBDVR54N5ANCNFSM52OB3W3Q . You are receiving this because you are subscribed to this thread.Message ID: @.***>

erezsh avatar Oct 11 '22 08:10 erezsh

I managed to rewrite my main application's grammar so that the conflict disappeared, but someday I intend to study LALR parsing so I can actually understand your answer, which I'm sure is very helpful. I found this site https://mdaines.github.io/grammophone/ which looks pretty neat. You can close this if you like, and thanks for your great parsing software.

gitshub avatar Oct 11 '22 19:10 gitshub