
Highlight escape sequences in token strings

Open rillig opened this issue 5 years ago • 1 comments

I'm a beginner to Grammar-Kit, trying to produce a grammar for BSD-style Makefiles. To get used to Grammar-Kit, I started with this grammar:

{
    parserClass="org.pkgsrc.intellij.mk.parser.BsdMakefileParser"
    extends="com.intellij.extapi.psi.ASTWrapperPsiElement"

    psiClassPrefix="BsdMk"
    psiImplClassSuffix="Impl"
    psiPackage="org.pkgsrc.intellij.mk.psi"
    psiImplPackage="org.pkgsrc.intellij.mk.psi.impl"

    elementTypeHolderClass="org.pkgsrc.intellij.mk.psi.BsdMakefileTypes"
    elementTypeClass="org.pkgsrc.intellij.mk.psi.BsdMakefileElementType"
    tokenTypeClass="org.pkgsrc.intellij.mk.psi.BsdMakefileTokenType"

    tokens=[
        T_NL = "\n"
        T_COMMENT = 'regexp:[ \t\w]+'
    ]
}

file ::= line*

line ::=
  empty_line
| comment_line

empty_line ::=
  T_NL

comment_line ::=
  comment T_NL

comment ::= "#" T_COMMENT

To my surprise, the T_NL token definition doesn't match a newline.

From the other grammar examples I looked at, I concluded that tokens are ordinary string literals with the usual escape sequences. I saw regular expressions with the typical double backslashes, so I assumed that single backslashes would either work as in Java or Kotlin or produce a visible syntax error in the BNF editor ("unknown escape sequence").

When I replaced the plain "\n" with "regexp:\n", it worked. I had expected more help from the BNF editor here: it could highlight the regexp: prefix (I had tried regex: first) and use Language Injection for the regular expression.
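
For reference, here is a sketch of the tokens block after that change. It is based only on what is described above: the T_NL line is the only one that differs from the original grammar, and the comments are my reading of the observed behavior, not documented Grammar-Kit semantics.

```bnf
tokens=[
    // Assumption from the behavior described above: the plain string "\n"
    // did not match a line break, while the regexp: form did.
    T_NL = "regexp:\n"

    // Unchanged from the original grammar; single backslashes work here too.
    T_COMMENT = 'regexp:[ \t\w]+'
]
```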

I also wonder why, in the regular expression for T_COMMENT, I can simply write \w instead of \\w. Judging from the Grammar.bnf file, I had not expected this.

rillig, Jan 14 '20 19:01

GK passes regexp fragments on as is, which saves on \\\\ here.

gregsh, Nov 10 '20 18:11