Indentation tokenization based on terminal annotations

Open msujew opened this issue 2 years ago • 4 comments

Inspired by https://github.com/langium/langium/discussions/1015

We could offer to build INDENT/DEDENT tokens in the DefaultTokenProvider based on comment annotations on terminals:

/**
 * @token.indent
 */
terminal INDENT: 'indent';
/**
 * @token.dedent
 */
terminal DEDENT: 'dedent';

msujew avatar Apr 10 '23 13:04 msujew

I'm against encoding syntactically relevant information into comments. We can extend the grammar language syntax to make things explicit.

spoenemann avatar Apr 13 '23 21:04 spoenemann

We are looking into writing a grammar for a YAML-based language, which means the indentation is meaningful and can't be treated as hidden. We can't find any examples of how to handle the indentation.

Now I've stumbled across this issue about INDENT and DEDENT tokens, which seem to be the solution for handling indentation correctly in, for example, Xtext.

Is this functionality on any backlog? Or can we help get it into Langium?

harmen-xb avatar May 15 '23 05:05 harmen-xb

I understand Langium uses Chevrotain for parsing, so I guess the information in the following issue is relevant here: https://github.com/Chevrotain/chevrotain/issues/846.

harmen-xb avatar May 15 '23 06:05 harmen-xb

@harmen-xb We don't have a dedicated backlog, only a roadmap (which is out-of-date by now). All open issues are up for grabs, as long as we haven't assigned them yet.

I think we could have an IndentationAwareTokenBuilder that lets developers specify the indent and dedent token names in order to provide that functionality. Any contribution is appreciated!
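
For illustration only, here is a minimal sketch of the classic indentation-stack technique that such a token builder could use to decide when to emit INDENT and DEDENT. It is standalone TypeScript and does not use the Langium or Chevrotain APIs; the `IndentToken` type, the `LINE` token, and the `tokenizeIndentation` function are hypothetical names chosen for this example, and it assumes indentation consists of spaces only.

```typescript
// Hypothetical token shape for illustration; not a Langium or Chevrotain type.
type IndentToken = { type: 'INDENT' | 'DEDENT' | 'LINE'; text: string };

// Turns leading whitespace into INDENT/DEDENT tokens using a stack of widths.
function tokenizeIndentation(source: string): IndentToken[] {
    const tokens: IndentToken[] = [];
    // Stack of currently open indentation widths; column 0 is the base level.
    const levels: number[] = [0];

    for (const line of source.split(/\r?\n/)) {
        // Blank lines carry no indentation information, so skip them.
        if (line.trim().length === 0) continue;

        // Assumption: only leading spaces count as indentation.
        const width = /^ */.exec(line)![0].length;

        if (width > levels[levels.length - 1]) {
            // Deeper than the current block: open a new block.
            levels.push(width);
            tokens.push({ type: 'INDENT', text: '' });
        } else {
            // Shallower: close blocks until a matching level is found.
            while (width < levels[levels.length - 1]) {
                levels.pop();
                tokens.push({ type: 'DEDENT', text: '' });
            }
            if (width !== levels[levels.length - 1]) {
                throw new Error(`Inconsistent dedent to column ${width}`);
            }
        }
        // The rest of the line stands in for the language's regular tokens.
        tokens.push({ type: 'LINE', text: line.slice(width) });
    }

    // Close every block that is still open at the end of the input.
    while (levels.length > 1) {
        levels.pop();
        tokens.push({ type: 'DEDENT', text: '' });
    }
    return tokens;
}
```

Calling `tokenizeIndentation` on a small YAML-like input yields one INDENT for each deeper block and one DEDENT for each block that closes, including blocks still open at the end of the input. In a real token builder the same stack logic would presumably live in a custom token pattern rather than in a line-by-line pass.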

msujew avatar May 15 '23 07:05 msujew