lark
                                
                                 lark copied to clipboard
                                
                                    lark copied to clipboard
                            
                            
                            
                        Inline rules expand ambiguities unnecessarily
Thanks for fixing #1214 earlier. It now seems to return all ambiguities, but ambiguities inside inline rules are expanded within the parser rather than leaving it to CollapseAmbiguities transformer.
I have some input strings that explode very quickly when expanding ambiguities. I parse them with ambiguity=explicit, then count the expected number of expanded ambiguities. If it's within a reasonable threshold, I put the tree through CollapseAmbituities. Otherwise, the processing stops (until I find a better way to deal with such strings). As the result, I still can't use inline rules because parsing of some inputs explodes before I can even count the number of ambiguities.
To reproduce, parse this string:
t = parser.parse('1 2 3')
print(t.pretty())
with this grammar
start: field+
field: f1 | f2
f1: INT
f2: INT
%ignore WS
%import common.WS
%import common.INT
Result:
start
  _ambig
    field
      f1	1
    field
      f2	1
  _ambig
    field
      f1	2
    field
      f2	2
  _ambig
    field
      f1	3
    field
      f2	3
Now, inline field:
start: _field+
_field: f1 | f2
f1: INT
f2: INT
%ignore WS
%import common.WS
%import common.INT
The parser now expands the nested ambiguities prematurely:
_ambig
  start
    f1	1
    f1	2
    f1	3
  start
    f1	1
    f1	2
    f2	3
  start
    f1	1
    f2	2
    f1	3
  start
    f1	1
    f2	2
    f2	3
  start
    f2	1
    f1	2
    f1	3
  start
    f2	1
    f1	2
    f2	3
  start
    f2	1
    f2	2
    f1	3
  start
    f2	1
    f2	2
    f2	3
@chanicpanic The saga continues!
Sorry, messed up the original examples, corrected now.
The example provided is actually a special case that may be open to optimization. Consider the general case in which expansions of field may contain more than one rule:
start: field+
field: f1 ws | f2 ws2
f1: INT
f2: INT
ws: WS
ws2: WS
%import common.WS
%import common.INT
Parsing the text "1 2 3 " produces:
start
  _ambig
    field
      f1        1
      ws
    field
      f2        1
      ws2
  _ambig
    field
      f1        2
      ws
    field
      f2        2
      ws2
  _ambig
    field
      f1        3
      ws
    field
      f2        3
      ws2
If we inline field in this tree, we get:
start
  _ambig
    f1        1
    ws
    f2        1
    ws2
  _ambig
    f1        2
    ws
    f2        2
    ws2
  _ambig
    f1        3
    ws
    f2        3
    ws2
which is not correct. In this case, I see no more compact way to represent the ambiguities than to expand them all.
We may be able to optimize the special case in which all instances of _field have a single child. However, it is a tradeoff. It would add complexity during tree building for a smaller tree size. On the other hand, expanding the tree is work that may be done later anyways when collapsing ambiguities.
@yurymann Conditional inlining may be your friend for highly ambiguous rules:
start: field+
?field: f1 | f2
f1: INT
f2: INT
%ignore WS
%import common.WS
%import common.INT
gives:
start
  _ambig
    f1  1
    f2  1
  _ambig
    f1  2
    f2  2
  _ambig
    f1  3
    f2  3
You may also want to look at lark-ambig-tools which can help you process highly ambiguous trees more efficiently.
@chanicpanic, wow, conditional inlining looks indeed like THE solution for my case.
And thanks for the lark-ambi-tools link! Disambiguator is particularly a thing I was looking for and was almost ready to write myself.