lark icon indicating copy to clipboard operation
lark copied to clipboard

Optimization: Merge anonymous terminals

Open erezsh opened this issue 5 years ago • 3 comments

It's possible to improve Lark's speed, for all algorithms, by performing grammar simplification. Specifically, it's possible to merge adjacent anonymous terminals together, to reduce the parser work load, without affecting the resulting tree (An example of this technique can be seen in https://github.com/lark-parser/lark/issues/218)

In the same vein, ignored terminals can be joined to anonymous ones, with a similar effect.

erezsh avatar Sep 02 '18 07:09 erezsh

I don't thing this is (always) true. There are two different reasons for this: !rule and %ignore:

start: "a" "b"
%ignore " "

will not match the same things as

start: "ab"
%ignore " "

The same goes for !rule:

 !start: "a" "b"

will not return the same tree as:

 !start: "ab"

MegaIng avatar Oct 31 '18 20:10 MegaIng

Well, for !rule, the optimization could be performed, and then reversed in post-processing by parsing the joined string. But that's probably not going to help performance.

As for %ignore, that's pretty simple:

start: "a" "b"
%ignore " "

Becomes

start: _JOINED
_JOINED: "a" " "* "b"
%ignore " "

erezsh avatar Oct 31 '18 20:10 erezsh

On the same note, regarding %ignore and terminals, should this behave like this:

 start: TEST
 TEST: "a" "b"
  %ignore " "

will not match "a b". I know why, but this is a huge difference between terminals and rules. To correct this issue, you would have to already the post-processing splitter inplace. Probably not worth it.

MegaIng avatar Nov 01 '18 01:11 MegaIng