Lerche.jl
Lerche.jl copied to clipboard
union of parsers
Feature request: I have several Lerche parsers that read various file formats and output an object through a transformer for each parser. Now I'd like to have another parser that is the union of those parser, that just unions the grammars and essentially figures out from the file+grammar which format is given, and then runs the corresponding transformer to get the right parsed object.
Sounds like a useful idea.
What do you think of the following notional interface:
u = LercheCombine((grammar1,grammar2,grammar3),(transformer1,transformer2,transformer3))
result = Lerche.parse(u,"my_random_file")
In terms of implementation it would be trying each grammar in order until it found one that didn't cause a parse error. It would only take a few lines of code to create the LercheCombine
type and then a LercheCombine
-specific dispatch for Lerche.parse
that simply did a Lerche.parse
for each of the grammars in the LercheCombine
object inside a try
/catch
block. Feel free @guyvdbroeck to submit a pull request on those lines if you wish, as I might not get to this for a while.
Note I'm using Lerche
as the prefix in LercheCombine
to avoid polluting the Lark
namespace in case something similar is added to Lark
.
I'd just like to point out that this is possible in Lark-Python using the %import
statement.
It won't compile for every combination of grammars, but if it does, it will work as a single parser, which means better performance than trying them one at a time, and with better chances of successful parsing.
Here is an example showing how: https://github.com/lark-parser/lark/tree/master/examples/composition
In which case there is a very good chance it will work like this in Lerche also. I'm assuming it will only fail if more than one grammar matches start
and it can't be disambiguated. Is this sufficient for you @guyvdbroeck?
TODO: create a similar example for Lerche.
It should be possible in Lerche.
Ambiguity is indeed the main worry here, but similar rule names are easy to solve: https://github.com/lark-parser/lark/blob/master/examples/composition/storage.lark
Thanks all. I would also like to avoid having to copy the entire transformer and its methods (and avoid more precompilation overhead), which I suppose is a problem specific to Lerche, not Lark?
I'm not sure why there would be any extra compiling in the LercheCombine
outline I gave above. Each of the transformer methods is still a separate top-level function which would only be compiled the first time it is called with a particular set of argument types, and at the point it is called the argument types are exactly those it receives now. Or were you referring to the approach already available in Lark?
Oh I like LercheCombine
, my comment was more about the approach of importing the various grammars in the BNF itself.
My impression is that most of the pre-compilation time is spent compiling the Julia code, rather than running it, and once that's accomplished, the size of the grammar doesn't matter that much. But I suppose one can never be sure until one benchmarks.