lark
Lark grammar?
Are there any plans to create a Lark grammar for Lark? The only file currently in lark/grammars is common.lark, which defines a few terminals.
The authoritative definition of Lark would logically be expressed as a grammar (as opposed to "what the grammar reference says" or "what the latest version of the code parses"). Note that RFC 5234 defines the ABNF grammar that authoritatively validates ABNF specifications.
There is a reference-quality grammar in ./examples: https://github.com/lark-parser/lark/blob/master/examples/lark.lark
Do you think there is a benefit to making it formal?
Yes, I think there should be a formal Lark grammar for Lark, mainly so that I can apply it to the IntelliJ plugin I am developing and to a linter I am thinking about writing.
Well, you can already use the grammar I linked to. It's correct and fairly comprehensive.
There are some things in lark.lark that don't match what load_grammar.py implements. For example, I learned the hard way that load_grammar only allows rule aliases on top-level alternates, although lark.lark says they're valid on alternates at any depth. Another example is that lark.lark says "%ignore" expansions, but load_grammar requires that expansions be derived as expansions -> expansion -> value -> Terminal.
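To make the first mismatch concrete, here is a rough sketch of the kind of check a linter could run. It is not part of Lark and does not call load_grammar; the helper name find_nested_aliases and the token pattern are hypothetical, and the rule it encodes (an alias arrow is only accepted outside parentheses and brackets) is simply the behaviour described above, taken as an assumption.

```python
import re

# Hypothetical, simplified tokenizer for Lark grammar text (an assumption,
# not Lark's own lexer). Strings, comments, and regexps are matched first
# so that any "->" or bracket inside them is skipped.
_TOKEN = re.compile(r'''
    "(?:\\.|[^"\\])*"      # string literal
  | //[^\n]*               # line comment
  | /(?:\\.|[^/\\\n])+/    # regexp literal
  | ->                     # alias arrow
  | [()\[\]]               # grouping brackets
''', re.VERBOSE)


def find_nested_aliases(grammar_text):
    """Return (line, column) for every '->' found inside (...) or [...]."""
    depth = 0
    hits = []
    for m in _TOKEN.finditer(grammar_text):
        tok = m.group()
        if tok in ("(", "["):
            depth += 1
        elif tok in (")", "]"):
            depth = max(depth - 1, 0)
        elif tok == "->" and depth > 0:
            # 1-based line and column of the offending arrow
            line = grammar_text.count("\n", 0, m.start()) + 1
            line_start = grammar_text.rfind("\n", 0, m.start()) + 1
            hits.append((line, m.start() - line_start + 1))
    return hits


sample = '''start: "a" -> first
     | ("b" -> second | "c")
'''
print(find_nested_aliases(sample))  # [(2, 13)]
```

The top-level alias on the first line passes, while the one inside the parenthesised group is reported. A real linter would want to work from the parse tree rather than a token scan, which is one more reason to have a grammar file that matches the loader.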
You're welcome to submit a PR that fixes lark.lark
Sounds good. I wasn't sure which version was definitive. So, basically the loader is correct and the grammar should reflect what it does, right?
While I'm at it, I'll take a look at the grammar reference documentation and make sure it agrees with the loader code.
Yes, load_grammar.py is the "ground truth", for better or worse.
PR #1388 submitted for my observations above, plus several others I found.