lark icon indicating copy to clipboard operation
lark copied to clipboard

Discrimination between "a" and "aa".

Open tgdcat opened this issue 3 years ago • 2 comments
trafficstars

Thank you for creating Lark.

I want to use Lark to distinguish between "a" and "aa". (I'm sorry if it's an already mentioned issue.)

Run the following sample program.

from lark import Lark

text = 'a aa aaa'

parser = Lark('''
    start:  statement*
    statement: a | aa | aaa
    a: "a"
    aa: "aa"
    aaa: "aaa"

    %import common.WS
    %ignore WS
''', start='start')

tree = parser.parse(text)

The result I want is:

start
  statement
    a
  statement
    aa
  statement
    aaa

But every time I actually do it, the result is different.


# First time
start
  statement
    a
  statement
    a
  statement
    a
  statement
    aa
  statement
    a

# Second time
start
  statement
    a
  statement
    a
  statement
    a
  statement
    aaa

# Third time
start
  statement
    a
  statement
    aa
  statement
    a
  statement
    aa

Python 3.8.7 lark-parser 0.12.0

My approach may be wrong. I would appreciate it if you could tell me the solution.

tgdcat avatar Dec 20 '21 06:12 tgdcat

It's a known issue with the Earley parser, that it isn't always deterministic in choosing a derivation when there's ambiguity.

A straight-forward solution would be do give explicit priority to the longer rules:

    a: "a"
    aa.1: "aa"
    aaa.2: "aaa"

Another way is to force to lexer to expect a whitespace after the string:

    a: /a(?=\s)/
    aa: /aa(?=\s)/
    aaa: /aaa/

(%ignore doesn't force a separation in Earley by design)

Another "fix" would be using lexer="basic", since it removes the ambiguity. You can also do parser="lalr".

Having said all this, I do think ideally your use case should work as-is without any of these fixes. So feel free to leave this open, and maybe we'll get to it some day.

erezsh avatar Dec 20 '21 21:12 erezsh

Thank you for the quick response.

By prioritizing as you say, I was able to get the results I expected. I am grateful for your support.

Having said all this, I do think ideally your use case should work as-is without any of these fixes. So feel free to leave this open, and maybe we'll get to it some day.

Yes. Keep it open until you can resolve the ambiguity without adding anything.

tgdcat avatar Dec 21 '21 07:12 tgdcat