lark icon indicating copy to clipboard operation
lark copied to clipboard

Inline rules expand ambiguities unnecessarily

Open yurymann opened this issue 2 years ago • 4 comments

Thanks for fixing #1214 earlier. It now seems to return all ambiguities, but ambiguities inside inline rules are expanded within the parser rather than leaving it to CollapseAmbiguities transformer.

I have some input strings that explode very quickly when expanding ambiguities. I parse them with ambiguity=explicit, then count the expected number of expanded ambiguities. If it's within a reasonable threshold, I put the tree through CollapseAmbituities. Otherwise, the processing stops (until I find a better way to deal with such strings). As the result, I still can't use inline rules because parsing of some inputs explodes before I can even count the number of ambiguities.

To reproduce, parse this string:

t = parser.parse('1 2 3')
print(t.pretty())

with this grammar

start: field+
field: f1 | f2

f1: INT
f2: INT

%ignore WS
%import common.WS
%import common.INT

Result:

start
  _ambig
    field
      f1	1
    field
      f2	1
  _ambig
    field
      f1	2
    field
      f2	2
  _ambig
    field
      f1	3
    field
      f2	3

Now, inline field:

start: _field+
_field: f1 | f2

f1: INT
f2: INT

%ignore WS
%import common.WS
%import common.INT

The parser now expands the nested ambiguities prematurely:

_ambig
  start
    f1	1
    f1	2
    f1	3
  start
    f1	1
    f1	2
    f2	3
  start
    f1	1
    f2	2
    f1	3
  start
    f1	1
    f2	2
    f2	3
  start
    f2	1
    f1	2
    f1	3
  start
    f2	1
    f1	2
    f2	3
  start
    f2	1
    f2	2
    f1	3
  start
    f2	1
    f2	2
    f2	3

yurymann avatar Nov 28 '22 23:11 yurymann

@chanicpanic The saga continues!

erezsh avatar Nov 28 '22 23:11 erezsh

Sorry, messed up the original examples, corrected now.

yurymann avatar Nov 29 '22 00:11 yurymann

The example provided is actually a special case that may be open to optimization. Consider the general case in which expansions of field may contain more than one rule:

start: field+
field: f1 ws | f2 ws2

f1: INT
f2: INT
ws: WS
ws2: WS

%import common.WS
%import common.INT

Parsing the text "1 2 3 " produces:

start
  _ambig
    field
      f1        1
      ws
    field
      f2        1
      ws2
  _ambig
    field
      f1        2
      ws
    field
      f2        2
      ws2
  _ambig
    field
      f1        3
      ws
    field
      f2        3
      ws2

If we inline field in this tree, we get:

start
  _ambig
    f1        1
    ws
    f2        1
    ws2
  _ambig
    f1        2
    ws
    f2        2
    ws2
  _ambig
    f1        3
    ws
    f2        3
    ws2

which is not correct. In this case, I see no more compact way to represent the ambiguities than to expand them all.

We may be able to optimize the special case in which all instances of _field have a single child. However, it is a tradeoff. It would add complexity during tree building for a smaller tree size. On the other hand, expanding the tree is work that may be done later anyways when collapsing ambiguities.

@yurymann Conditional inlining may be your friend for highly ambiguous rules:

start: field+
?field: f1 | f2

f1: INT
f2: INT

%ignore WS
%import common.WS
%import common.INT

gives:

start
  _ambig
    f1  1
    f2  1
  _ambig
    f1  2
    f2  2
  _ambig
    f1  3
    f2  3

You may also want to look at lark-ambig-tools which can help you process highly ambiguous trees more efficiently.

chanicpanic avatar Nov 29 '22 07:11 chanicpanic

@chanicpanic, wow, conditional inlining looks indeed like THE solution for my case. And thanks for the lark-ambi-tools link! Disambiguator is particularly a thing I was looking for and was almost ready to write myself.

yurymann avatar Nov 29 '22 08:11 yurymann