Export to EBNF grammar for ISLa together with ISLa constraints
If Langium supported exporting grammars to EBNF, we could use other tools with them as well.
One of these tools is ISLa, a tool that generates inputs from an EBNF grammar. We could also emit constraints about the types of some rules.
Other tools may exist, but to be honest, I have not seen many, because implementors normally use their own grammar language. One that does exist is a grammar visualizer.
Interesting thought. I have two concerns about this, though:
- How do we deal with cross-references? Can this only be used for parser fuzzing?
- Our terminal definition language is far more powerful than EBNF. We allow terminals such as
terminal ID: /\p{L}*/u
which cannot be represented in EBNF. Even /\w*/ might already be too complex. Also, terminals can be created programmatically through the TokenBuilder, without being known to the CLI at all.
- About the cross-references: as far as I understand ISLa, the fuzzing is only about syntax.
- BUT: you can add constraints on the integer value that a rule expands to, like
str.to.int(<NumericLiteral>) > 1024
or require some length or content.
- You can even require that a variable in an imperative language is defined before it is used (very simplified, I think):
forall <rhs> in <assgn>: exists <assgn> declaration: <rhs>.<var> = declaration.<lhs>.<var>
- We could add a similar constraint for cross-references... it might be harder for type unions.
- About the token definitions: good point! So it is not always possible. We would have to expand the RegExp into individual characters inside the EBNF generator.
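
To make the idea of expanding a RegExp into characters more concrete, here is a minimal sketch in TypeScript for the simplest possible case: a terminal that is a single ASCII character class repeated zero or more times. The helper names expandCharClass and terminalToEbnf are made up for illustration and are not part of Langium, and the ISO-style EBNF output notation is an assumption. Anything beyond a plain character class (for example \p{L}, which covers a very large number of code points) is rejected rather than expanded.

```typescript
// Hypothetical helper, not part of Langium: expands one plain character class
// such as [a-c_] into an EBNF alternation of quoted characters.
// Escapes, negation, and double quotes are rejected instead of handled.
function expandCharClass(charClass: string): string {
    const match = /^\[([^\]\\^"]+)\]$/.exec(charClass);
    if (!match) {
        throw new Error(`Unsupported character class: ${charClass}`);
    }
    const body = match[1];
    const chars: string[] = [];
    for (let i = 0; i < body.length; i++) {
        // "x-y" is a range only if a character follows the dash.
        const isRange = body[i + 1] === "-" && i + 2 < body.length;
        if (isRange) {
            const from = body.charCodeAt(i);
            const to = body.charCodeAt(i + 2);
            for (let code = from; code <= to; code++) {
                chars.push(String.fromCharCode(code));
            }
            i += 2; // skip the dash and the range end
        } else {
            chars.push(body[i]);
        }
    }
    return chars.map(c => `"${c}"`).join(" | ");
}

// Emits an EBNF rule for a terminal of the shape [class]*,
// using { ... } for "zero or more" (assumed ISO-style notation).
function terminalToEbnf(name: string, charClass: string): string {
    return `${name} = { ${expandCharClass(charClass)} } ;`;
}

console.log(terminalToEbnf("ID", "[a-c_]"));
```

A real generator would need a proper RegExp parser and a decision on what to do with terminals that cannot be expanded at all, such as those created through the TokenBuilder.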
One last thing that I am worried about: as far as I can tell, EBNF has no notion of lexer rules or hidden tokens. We could simulate that by inserting these non-terminals everywhere, but I do not like this path...
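
For reference, the "insert these non-terminals everywhere" workaround could look roughly like the following TypeScript sketch. The function names and the ws non-terminal are hypothetical, and the comma-separated ISO-style EBNF output is an assumption; it only shows how much noise the approach adds to every sequence.

```typescript
// Hypothetical sketch: places a hidden-token non-terminal between every pair
// of neighbouring symbols in a sequence.
function interleaveHidden(symbols: string[], hidden: string): string[] {
    const result: string[] = [];
    symbols.forEach((symbol, index) => {
        if (index > 0) {
            result.push(hidden);
        }
        result.push(symbol);
    });
    return result;
}

// Renders one sequence rule with the hidden non-terminal woven in.
// "ws" is an assumed name for a rule that matches optional whitespace.
function sequenceRule(name: string, symbols: string[], hidden = "ws"): string {
    return `${name} = ${interleaveHidden(symbols, hidden).join(" , ")} ;`;
}

console.log(sequenceRule("Assignment", ["ID", '"="', "Expr"]));
```

The ws rule itself would still have to be defined separately, and every hidden terminal (comments, for instance) would have to be folded into it, which is part of why this path is unattractive.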