Resolution order changed
Our project worked well with a past version of lark, but seems to be broken with the current version. Reproduction:
import lark
GRAMMAR = """
?start: shape
shape: (dim (" " dim)*)?
// TODO: Add expressions (+, /, *, -)
?dim: UNKNOWN_DIM
| ELLIPSIS_DIM
| named_dim
| STATIC_DIM
| var_dim
var_dim: "*" CNAME
UNKNOWN_DIM: "_"
ELLIPSIS_DIM: "..."
named_dim: CNAME
STATIC_DIM: INT
// Defined in `lark/grammars/common.lark`
%import common.CNAME
%import common.INT
"""
parser = lark.Lark(GRAMMAR)
print(parser.parse('_ 3 n'))
- Expected result (in 0.12.0): UNKNOWN_DIM, STATIC_DIM, named_dim('n')
- Actual result (in 1.2.2): named_dim('_'), STATIC_DIM, named_dim('n')
As you can see, _ is now parsed as named_dim rather than UNKNOWN_DIM. Both alternatives can match it, because CNAME also accepts a lone underscore, but the grammar defines UNKNOWN_DIM before named_dim, so I would expect the resolution order to follow the order in the grammar.
We tried ambiguity='resolve', but this didn't change anything.
Why did the resolution order change? Is there a param to fix the issue?
Hello @Conchylicultor,
Can you please check if this PR fixes your issue?
https://github.com/lark-parser/lark/pull/1451
To your question "Why did the resolution order change?":
Hard to say, but I imagine it happened between 0.12.x and 1.0.0.
We have made a lot of improvements to the Earley parser (and we still do), so it's possible that the order of the derivations changed, though we try to keep that to a minimum.
And to "Is there a param to fix the issue?":
Usually, using a priority is the easiest way to choose between derivations (e.g. preferred_rule.100: subrule1 subrule2 ..).
Also consider using ambiguity='explicit' and choosing the correct derivations on your side.