Lerche.jl icon indicating copy to clipboard operation
Lerche.jl copied to clipboard

union of parsers

Open guyvdbroeck opened this issue 3 years ago • 8 comments

Feature request: I have several Lerche parsers that read various file formats and output an object through a transformer for each parser. Now I'd like to have another parser that is the union of those parser, that just unions the grammars and essentially figures out from the file+grammar which format is given, and then runs the corresponding transformer to get the right parsed object.

guyvdbroeck avatar Sep 13 '21 02:09 guyvdbroeck

Sounds like a useful idea.

What do you think of the following notional interface:

u = LercheCombine((grammar1,grammar2,grammar3),(transformer1,transformer2,transformer3))
result = Lerche.parse(u,"my_random_file")

In terms of implementation it would be trying each grammar in order until it found one that didn't cause a parse error. It would only take a few lines of code to create the LercheCombine type and then a LercheCombine-specific dispatch for Lerche.parse that simply did a Lerche.parse for each of the grammars in the LercheCombine object inside a try/catch block. Feel free @guyvdbroeck to submit a pull request on those lines if you wish, as I might not get to this for a while.

Note I'm using Lerche as the prefix in LercheCombine to avoid polluting the Lark namespace in case something similar is added to Lark.

jamesrhester avatar Sep 13 '21 05:09 jamesrhester

I'd just like to point out that this is possible in Lark-Python using the %import statement.

It won't compile for every combination of grammars, but if it does, it will work as a single parser, which means better performance than trying them one at a time, and with better chances of successful parsing.

Here is an example showing how: https://github.com/lark-parser/lark/tree/master/examples/composition

erezsh avatar Sep 13 '21 07:09 erezsh

In which case there is a very good chance it will work like this in Lerche also. I'm assuming it will only fail if more than one grammar matches start and it can't be disambiguated. Is this sufficient for you @guyvdbroeck?

TODO: create a similar example for Lerche.

jamesrhester avatar Sep 13 '21 08:09 jamesrhester

It should be possible in Lerche.

Ambiguity is indeed the main worry here, but similar rule names are easy to solve: https://github.com/lark-parser/lark/blob/master/examples/composition/storage.lark

erezsh avatar Sep 13 '21 08:09 erezsh

Thanks all. I would also like to avoid having to copy the entire transformer and its methods (and avoid more precompilation overhead), which I suppose is a problem specific to Lerche, not Lark?

guyvdbroeck avatar Sep 13 '21 17:09 guyvdbroeck

I'm not sure why there would be any extra compiling in the LercheCombine outline I gave above. Each of the transformer methods is still a separate top-level function which would only be compiled the first time it is called with a particular set of argument types, and at the point it is called the argument types are exactly those it receives now. Or were you referring to the approach already available in Lark?

jamesrhester avatar Sep 14 '21 01:09 jamesrhester

Oh I like LercheCombine, my comment was more about the approach of importing the various grammars in the BNF itself.

guyvdbroeck avatar Sep 14 '21 19:09 guyvdbroeck

My impression is that most of the pre-compilation time is spent compiling the Julia code, rather than running it, and once that's accomplished, the size of the grammar doesn't matter that much. But I suppose one can never be sure until one benchmarks.

erezsh avatar Sep 14 '21 20:09 erezsh