Resolution order changed

Open Conchylicultor opened this issue 1 year ago • 2 comments

Our project worked well with a past version of lark, but seems to be broken with the current version. Reproduction:

import lark


GRAMMAR = """
?start: shape

shape: (dim (" " dim)*)?

// TODO: Add expressions (+, /, *, -)
?dim: UNKNOWN_DIM
    | ELLIPSIS_DIM
    | named_dim
    | STATIC_DIM
    | var_dim

var_dim: "*" CNAME
UNKNOWN_DIM: "_"
ELLIPSIS_DIM: "..."
named_dim: CNAME
STATIC_DIM: INT

// Defined in `lark/grammars/common.lark`
%import common.CNAME
%import common.INT
"""

parser = lark.Lark(GRAMMAR)
print(parser.parse('_ 3 n'))
  • Expected result (in 0.12.0): UNKNOWN_DIM, STATIC_DIM, named_dim('n')
  • Actual result (in 1.2.2): named_dim('_'), STATIC_DIM, named_dim('n')

As you can see, _ is now parsed as named_dim rather than UNKNOWN_DIM. But the grammar defines UNKNOWN_DIM before named_dim, so I would expect the resolution order to match the order in the grammar.

We tried ambiguity='resolve', but this didn't change anything.

Why did the resolution order change? Is there a param to fix the issue?

Conchylicultor avatar Aug 21 '24 15:08 Conchylicultor

Hello @Conchylicultor ,

Can you please check if this PR fixes your issue?

https://github.com/lark-parser/lark/pull/1451

erezsh avatar Aug 21 '24 16:08 erezsh

To your question -

> Why did the resolution order change?

Hard to say, but I imagine it probably happened between 0.12.x and 1.0.0.

We made a lot of improvements to the Earley parser (and we still do), and it's possible that the order of the derivations changed (though we try to keep that to a minimum).

> Is there a param to fix the issue?

Usually, using a priority is the easiest way to choose between derivations (e.g. preferred_rule.100: subrule1 subrule2 ..).

Also consider using ambiguity='explicit' and choosing the correct derivations on your side.

erezsh avatar Aug 21 '24 16:08 erezsh