feat: a custom grammar for lexers

alecthomas opened this issue 9 months ago

While I do prefer XML over YAML, XML is also super verbose. This PR is meant to start a discussion about whether to switch to a custom grammar or YAML, or to keep XML.

Custom Grammar

Pros:

  • Very succinct
  • No need to escape regexes

Cons:

  • Bespoke syntax that people will have to learn
  • No syntax highlighting in editors, no validation beyond the parser

config {
  name "INI"
  aliases "ini", "cfg", "dosini"
  filenames "*.ini", "*.cfg", "*.inf", "*.service", "*.socket", ".gitconfig",
            ".editorconfig", "pylintrc", ".pylintrc"
  mime-types "text/x-ini", "text/inf"
  priority 0.1
}

state root {
  /\s+/ text
  /[;#].*/ commentsingle
  /\[.*?\]$/ keyword
  /(.*?)([ \t]*)(=)([ \t]*)(.*(?:\n[ \t].+)*)/ by groups
    nameattribute, text, operator, text, literalstring
  /(.+?)$/ nameattribute
}
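One way to see the "no need to escape regexes" point: if each single-token rule is written as `/PATTERN/ token`, a parser can split on the last slash that precedes the token name and hand the pattern to the regex engine verbatim. The Go sketch below is hypothetical: `ruleLine` and `parseRule` are illustrative names, not part of Chroma, it handles only single-token rules (not `by groups`), and it uses only the standard library rather than the Participle-based parser this proposal would pull in.

```go
package main

import (
	"fmt"
	"regexp"
)

// ruleLine matches a hypothetical single-token rule of the form `/PATTERN/ token`.
// The greedy (.*) runs to the last slash that is followed by whitespace and a
// single identifier, so the pattern itself may contain "/" without escaping.
var ruleLine = regexp.MustCompile(`^\s*/(.*)/\s+([A-Za-z]+)\s*$`)

// parseRule splits a rule line into its regex and token name, and checks that
// the regex compiles. Multi-line "by groups" rules are not handled.
func parseRule(line string) (pattern, token string, err error) {
	m := ruleLine.FindStringSubmatch(line)
	if m == nil {
		return "", "", fmt.Errorf("not a single-token rule: %q", line)
	}
	if _, err := regexp.Compile(m[1]); err != nil {
		return "", "", fmt.Errorf("bad pattern %q: %w", m[1], err)
	}
	return m[1], m[2], nil
}

func main() {
	lines := []string{
		`  /\s+/ text`,
		`  /[;#].*/ commentsingle`,
		`  /\[.*?\]$/ keyword`,
		`  /(.+?)$/ nameattribute`,
	}
	for _, line := range lines {
		pattern, token, err := parseRule(line)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%-14s %s\n", token, pattern)
	}
}
```

Because `(.*)` is greedy, a pattern such as `a/b` also goes through unescaped; the flip side is the con listed above, since a format like this gets no validation beyond its own parser.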

YAML Lexer Definitions

Pros:

  • More succinct than XML
  • Can define a JSON schema and have editors use it to validate.

Cons:

  • Fucking YAML
    • Indentation is awful
  • Less succinct than a bespoke syntax
  • Will need some way to discriminate between "emitters" and "mutators" when parsing, e.g., type: Keyword vs. type: {bygroups: [...]}

config:
  name: "INI"
  aliases: ["ini", "cfg", "dosini"]
  filenames: ["*.ini", "*.cfg", "*.inf", "*.service", "*.socket", ".gitconfig",
              ".editorconfig", "pylintrc", ".pylintrc"]
  mime-types: ["text/x-ini", "text/inf"]
  priority: 0.1
state:
  root:
    rule:
      - pattern: "\\s+"
        type: Text
      - pattern: "[;#].*"
        type: CommentSingle
      - pattern: "\\[.*?\\]$"
        type: Keyword
      - pattern: "(.*?)([ \\t]*)(=)([ \\t]*)(.*(?:\\n[ \\t].+)*)"
        type:
          bygroups: [NameAttribute, Text, Operator, Text, LiteralString]
      - pattern: "(.+?)$"
        type: NameAttribute

XML Lexer Definitions

<lexer>
  <config>
    <name>INI</name>
    <alias>ini</alias>
    <alias>cfg</alias>
    <alias>dosini</alias>
    <filename>*.ini</filename>
    <filename>*.cfg</filename>
    <filename>*.inf</filename>
    <filename>*.service</filename>
    <filename>*.socket</filename>
    <filename>.gitconfig</filename>
    <filename>.editorconfig</filename>
    <filename>pylintrc</filename>
    <filename>.pylintrc</filename>
    <mime_type>text/x-ini</mime_type>
    <mime_type>text/inf</mime_type>
    <priority>0.1</priority> <!-- higher priority than Inform 6 -->
  </config>
  <rules>
    <state name="root">
      <rule pattern="\s+">
        <token type="Text"/>
      </rule>
      <rule pattern="[;#].*">
        <token type="CommentSingle"/>
      </rule>
      <rule pattern="\[.*?\]$">
        <token type="Keyword"/>
      </rule>
      <rule pattern="(.*?)([ \t]*)(=)([ \t]*)(.*(?:\n[ \t].+)*)">
        <bygroups>
          <token type="NameAttribute"/>
          <token type="Text"/>
          <token type="Operator"/>
          <token type="Text"/>
          <token type="LiteralString"/>
        </bygroups>
      </rule>
      <rule pattern="(.+?)$">
        <token type="NameAttribute"/>
      </rule>
    </state>
  </rules>
</lexer>
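All three definitions share the same key/value rule, where each capture group of the pattern is emitted, in order, as its own token type. A minimal Go sketch of that by-groups behavior using `regexp.FindStringSubmatchIndex`; `Token` and `byGroups` are hypothetical helpers for illustration, not Chroma's `ByGroups`, and skipping empty groups is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"regexp"
)

// Token is a hypothetical (type, text) pair, not Chroma's token type.
type Token struct {
	Type  string
	Value string
}

// keyValue is the INI key/value rule shared by all three definitions.
var keyValue = regexp.MustCompile(`(.*?)([ \t]*)(=)([ \t]*)(.*(?:\n[ \t].+)*)`)

// byGroups matches re at the start of text and emits one token per capture
// group, using types[i] for group i+1. Groups that match nothing are skipped.
func byGroups(re *regexp.Regexp, text string, types ...string) ([]Token, bool) {
	if len(types) != re.NumSubexp() {
		return nil, false // one token type is needed per capture group
	}
	m := re.FindStringSubmatchIndex(text)
	if m == nil || m[0] != 0 {
		return nil, false // the rule must match at the current position
	}
	var toks []Token
	for i, typ := range types {
		start, end := m[2*(i+1)], m[2*(i+1)+1]
		if start < 0 || start == end {
			continue
		}
		toks = append(toks, Token{Type: typ, Value: text[start:end]})
	}
	return toks, true
}

func main() {
	toks, ok := byGroups(keyValue, "name = value",
		"NameAttribute", "Text", "Operator", "Text", "LiteralString")
	fmt.Println("matched:", ok)
	for _, t := range toks {
		fmt.Printf("%-13s %q\n", t.Type, t.Value)
	}
}
```

For `name = value` this yields NameAttribute `name`, Text ` `, Operator `=`, Text ` `, and LiteralString `value`. The trailing `(?:\n[ \t].+)*` lets an indented continuation line join the value, so `k=v\n  more` gives the value `v\n  more`.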

alecthomas, Mar 23 '25 00:03

> Cons:
>
>   • Fucking YAML
>     • Indentation is awful

👆👆👆👆👆

XML is clear, well-structured, and easy to format. Yeah, it's verbose, and probably trash to write when you're deep in syntax, but I think the positives outweigh the negatives.

I would never have thought to use XML, but I think it's fitting for this purpose.

I only just stumbled across this project and wanted to say thanks 😁🤘

davesavic, Apr 02 '25 13:04

@davesavic hehe thanks for the comment 😂

alecthomas, Apr 02 '25 21:04

I think the custom grammar you proposed would be good. Even if editors don't have highlighting specifically for this syntax, borrowing existing highlighting made for other languages will do the job until someone makes dedicated extensions/add-ons. Although XML feels quite cumbersome to me, I'm surprised to find it easier to read than YAML, especially when deeply nested. So basically, my opinion is that XML is very decent, but I still prefer a custom syntax. And no YAML.

Chi-Iroh, Jun 11 '25 12:06

@Chi-Iroh I think that's basically how I feel too. Also, even with XML we don't have a schema, so there's no completion or anything; a custom grammar would be no worse in that regard.

alecthomas, Jun 21 '25 02:06

I decided not to go with this in the end, mainly because it pulls in Participle, which uses reflection quite heavily, and I want to keep chroma fairly light.

alecthomas, Jul 03 '25 11:07