langium
Indentation tokenization based on terminal annotations
Inspired by https://github.com/langium/langium/discussions/1015
We could offer to build INDENT/DEDENT tokens in the DefaultTokenBuilder based on comment annotations on terminals:
/**
* @token.indent
*/
terminal INDENT: 'indent';
/**
* @token.dedent
*/
terminal DEDENT: 'dedent';
I'm against encoding syntactically relevant information into comments. We can extend the grammar language syntax to make things explicit.
We are looking into writing a grammar for a YAML-based language, which means the indentation is meaningful and can't be marked as hidden. We can't find any examples of how to handle the indentation.
Now I've stumbled across this issue about INDENT and DEDENT tokens, which, in Xtext for example, seem to be the solution for handling indentation correctly.
Is this functionality on any backlog? Or can we help get it into Langium?
I understand Langium uses Chevrotain for parsing, so I guess the information in the following issue is relevant here: https://github.com/Chevrotain/chevrotain/issues/846.
@harmen-xb We don't have a dedicated backlog, only a roadmap (which is out-of-date by now). All open issues are up for grabs, as long as we haven't assigned them yet.
I would think we could have an IndentationAwareTokenBuilder that lets devs specify the indent and dedent token names in order to provide that functionality. Any contribution is appreciated!
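For anyone picking this up, here is a rough, standalone sketch of the core logic such a token builder would need: a stack of indentation widths that emits INDENT when a line is indented deeper than the current level and DEDENT for each level that closes. It deliberately does not use the Langium or Chevrotain APIs; the `Token` type, the `LINE` token and the `tokenizeIndentation` function are made up for illustration, and tabs are naively counted as one column each.

```typescript
// Hypothetical token shape for illustration only; not Langium's or Chevrotain's token type.
type Token = { type: 'INDENT' | 'DEDENT' | 'LINE'; text: string };

function tokenizeIndentation(source: string): Token[] {
    const tokens: Token[] = [];
    // Stack of currently open indentation widths; column 0 is always open.
    const indentStack: number[] = [0];

    for (const rawLine of source.split(/\r?\n/)) {
        // Blank lines carry no indentation information.
        if (rawLine.trim().length === 0) {
            continue;
        }
        // Index of the first non-whitespace character (assumption: a tab counts as 1).
        const width = rawLine.search(/\S/);

        if (width > indentStack[indentStack.length - 1]) {
            // Deeper than the current level: open a new block.
            indentStack.push(width);
            tokens.push({ type: 'INDENT', text: '' });
        } else {
            // Shallower: close every block that is deeper than this line.
            while (width < indentStack[indentStack.length - 1]) {
                indentStack.pop();
                tokens.push({ type: 'DEDENT', text: '' });
            }
            // The line must land exactly on a previously opened level.
            if (width !== indentStack[indentStack.length - 1]) {
                throw new Error(`Inconsistent indentation: ${JSON.stringify(rawLine)}`);
            }
        }
        tokens.push({ type: 'LINE', text: rawLine.trim() });
    }

    // Close all blocks that are still open at the end of the input.
    while (indentStack.length > 1) {
        indentStack.pop();
        tokens.push({ type: 'DEDENT', text: '' });
    }
    return tokens;
}
```

In an actual implementation the INDENT/DEDENT token names would presumably be configurable, as suggested above, and the logic would have to be hooked into the lexer rather than run as a separate pass over lines.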