feat: a custom grammar for lexers
While I do prefer XML over YAML, it's also super verbose. This PR is designed to start a discussion about whether to switch to a custom grammar, YAML, or keep XML.
Custom Grammar
Pros:
- Very succinct
- No need to escape regexes
Cons:
- Bespoke syntax that people will have to learn
- No syntax highlighting in editors, no validation beyond the parser
config {
  name "INI"
  aliases "ini", "cfg"
  filenames "*.ini", "*.cfg", "*.inf", "*.service", "*.socket", ".gitconfig",
            ".editorconfig", "pylintrc", ".pylintrc"
  mime-types "text/x-ini", "text/inf"
  priority 0.1
}
state root {
  /\s+/ text
  /[;#].*/ commentsingle
  /\[.*?\]$/ keyword
  /(.*?)([ \t]*)(=)([ \t]*)(.*(?:\n[ \t].+)*)/ by groups
    nameattribute, text, operator, text, literalstring
  /(.+?)$/ nameattribute
}
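To illustrate the "no need to escape regexes" point, here is a minimal sketch of reading a single-line rule from the grammar above. It is not the Participle-based parser from this PR: `parseRuleLine` and `Rule` are hypothetical names, the standard library `regexp` package stands in for whatever regex engine the lexers actually use, and the multi-line `by groups` form is deliberately not handled.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Rule is a hypothetical pairing of a compiled pattern and a token name.
type Rule struct {
	Pattern *regexp.Regexp
	Token   string
}

// parseRuleLine splits a line such as `/[;#].*/ commentsingle`.
// The pattern is taken verbatim between the first and the last '/',
// so the regex needs no extra escaping layer.
func parseRuleLine(line string) (Rule, error) {
	line = strings.TrimSpace(line)
	end := strings.LastIndex(line, "/")
	if !strings.HasPrefix(line, "/") || end <= 0 {
		return Rule{}, fmt.Errorf("rule must start with /pattern/: %q", line)
	}
	re, err := regexp.Compile(line[1:end])
	if err != nil {
		return Rule{}, fmt.Errorf("bad pattern in %q: %w", line, err)
	}
	token := strings.TrimSpace(line[end+1:])
	if token == "" {
		return Rule{}, fmt.Errorf("missing token name in %q", line)
	}
	return Rule{Pattern: re, Token: token}, nil
}

func main() {
	for _, line := range []string{`/\s+/ text`, `/[;#].*/ commentsingle`, `/\[.*?\]$/ keyword`} {
		r, err := parseRuleLine(line)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%-14s <- %s\n", r.Token, r.Pattern)
	}
}
```

Splitting on the last `/` only works because token names never contain a slash; a real parser would need a proper tokenizer.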
YAML Lexer Definitions
Pros:
- More succinct than XML
- Can define a JSON schema and have editors use it to validate.
Cons:
- Fucking YAML
- Indentation is awful
- Less succinct than bespoke syntax
- Will need some way to discriminate between "emitters" and "mutators" when parsing, e.g. type: Keyword vs. type: {bygroups: [...]}
config:
  name: "INI"
  aliases: ["ini", "cfg"]
  filenames: ["*.ini", "*.cfg", "*.inf", "*.service", "*.socket", ".gitconfig",
              ".editorconfig", "pylintrc", ".pylintrc"]
  mime-types: ["text/x-ini", "text/inf"]
  priority: 0.1
state:
  root:
    rule:
      - pattern: "\\s+"
        type: Text
      - pattern: "[;#].*"
        type: CommentSingle
      - pattern: "\\[.*?\\]$"
        type: Keyword
      - pattern: "(.*?)([ \\t]*)(=)([ \\t]*)(.*(?:\\n[ \\t].+)*)"
        type:
          bygroups: [NameAttribute, Text, Operator, Text, LiteralString]
      - pattern: "(.+?)$"
        type: NameAttribute
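The emitter/mutator discrimination mentioned in the cons could look roughly like the sketch below. It uses `encoding/json` as a stand-in for a YAML decoder (the flow-style forms of both shapes are valid in either format), so a real YAML library would need an equivalent hook. `Action`, `Rule` and `parseRule` are hypothetical names, not part of chroma.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Action is a hypothetical union of the two shapes a rule's "type" can take:
// a plain token name (an "emitter") or a bygroups list (a "mutator").
type Action struct {
	Token    string   // set for `type: Keyword`
	ByGroups []string // set for `type: {bygroups: [...]}`
}

// UnmarshalJSON tries the scalar form first, then falls back to the mapping form.
func (a *Action) UnmarshalJSON(data []byte) error {
	var token string
	if err := json.Unmarshal(data, &token); err == nil {
		a.Token = token
		return nil
	}
	var m struct {
		ByGroups []string `json:"bygroups"`
	}
	if err := json.Unmarshal(data, &m); err != nil {
		return fmt.Errorf("type must be a token name or a bygroups mapping: %w", err)
	}
	if len(m.ByGroups) == 0 {
		return fmt.Errorf("bygroups mapping must list at least one token")
	}
	a.ByGroups = m.ByGroups
	return nil
}

// Rule mirrors one entry of the `rule:` list above.
type Rule struct {
	Pattern string `json:"pattern"`
	Type    Action `json:"type"`
}

// parseRule decodes a single rule written in flow style.
func parseRule(src string) (Rule, error) {
	var r Rule
	err := json.Unmarshal([]byte(src), &r)
	return r, err
}

func main() {
	for _, src := range []string{
		`{"pattern": "[;#].*", "type": "CommentSingle"}`,
		`{"pattern": "(.+?)(=)(.*)", "type": {"bygroups": ["NameAttribute", "Operator", "LiteralString"]}}`,
	} {
		r, err := parseRule(src)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%q emitter=%q bygroups=%v\n", r.Pattern, r.Type.Token, r.Type.ByGroups)
	}
}
```

The cost is that every polymorphic field needs a hand-written decoder like this, which is the extra parsing work the bullet above refers to.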
XML Lexer Definitions
<lexer>
  <config>
    <name>INI</name>
    <alias>ini</alias>
    <alias>cfg</alias>
    <alias>dosini</alias>
    <filename>*.ini</filename>
    <filename>*.cfg</filename>
    <filename>*.inf</filename>
    <filename>*.service</filename>
    <filename>*.socket</filename>
    <filename>.gitconfig</filename>
    <filename>.editorconfig</filename>
    <filename>pylintrc</filename>
    <filename>.pylintrc</filename>
    <mime_type>text/x-ini</mime_type>
    <mime_type>text/inf</mime_type>
    <priority>0.1</priority> <!-- higher priority than Inform 6 -->
  </config>
  <rules>
    <state name="root">
      <rule pattern="\s+">
        <token type="Text"/>
      </rule>
      <rule pattern="[;#].*">
        <token type="CommentSingle"/>
      </rule>
      <rule pattern="\[.*?\]$">
        <token type="Keyword"/>
      </rule>
      <rule pattern="(.*?)([ \t]*)(=)([ \t]*)(.*(?:\n[ \t].+)*)">
        <bygroups>
          <token type="NameAttribute"/>
          <token type="Text"/>
          <token type="Operator"/>
          <token type="Text"/>
          <token type="LiteralString"/>
        </bygroups>
      </rule>
      <rule pattern="(.+?)$">
        <token type="NameAttribute"/>
      </rule>
    </state>
  </rules>
</lexer>
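For comparison, the XML form maps onto Go's standard `encoding/xml` package without a custom decoder, because emitters and mutators are already distinct elements. The sketch below is an assumption about how such a mapping could look; the struct names and `parseLexer` are hypothetical and not necessarily how chroma loads its lexers, and the sample is a trimmed version of the definition above.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// Config is a hypothetical struct for the <config> element.
type Config struct {
	Name      string   `xml:"name"`
	Aliases   []string `xml:"alias"`
	Filenames []string `xml:"filename"`
	MimeTypes []string `xml:"mime_type"`
	Priority  float64  `xml:"priority"`
}

// Token is a single <token type="..."/> element.
type Token struct {
	Type string `xml:"type,attr"`
}

// Rule holds either a direct <token> child or a <bygroups> list of tokens.
type Rule struct {
	Pattern  string  `xml:"pattern,attr"`
	Token    *Token  `xml:"token"`
	ByGroups []Token `xml:"bygroups>token"`
}

// State is one named <state> with its rules.
type State struct {
	Name  string `xml:"name,attr"`
	Rules []Rule `xml:"rule"`
}

// Lexer is the document root.
type Lexer struct {
	Config Config  `xml:"config"`
	States []State `xml:"rules>state"`
}

const sample = `<lexer>
  <config>
    <name>INI</name>
    <alias>ini</alias>
    <alias>cfg</alias>
    <filename>*.ini</filename>
    <priority>0.1</priority>
  </config>
  <rules>
    <state name="root">
      <rule pattern="[;#].*">
        <token type="CommentSingle"/>
      </rule>
      <rule pattern="(.+?)(=)(.*)">
        <bygroups>
          <token type="NameAttribute"/>
          <token type="Operator"/>
          <token type="LiteralString"/>
        </bygroups>
      </rule>
    </state>
  </rules>
</lexer>`

// parseLexer decodes a lexer definition from an XML string.
func parseLexer(src string) (Lexer, error) {
	var l Lexer
	err := xml.Unmarshal([]byte(src), &l)
	return l, err
}

func main() {
	l, err := parseLexer(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(l.Config.Name, l.Config.Aliases, l.Config.Priority)
	for _, r := range l.States[0].Rules {
		fmt.Printf("%q token=%v bygroups=%v\n", r.Pattern, r.Token, r.ByGroups)
	}
}
```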
Cons:
- Fucking YAML
- Indentation is awful
👆👆👆👆👆
XML is clear, well-structured, and easy to format. Yeah, it's verbose, and probably trash to write when you're deep in syntax, but I think the positives outweigh the negatives.
I would never have thought to use xml, but I think it's fitting for this purpose.
I only just stumbled across this project and wanted to say thanks 😁🤘
@davesavic hehe thanks for the comment 😂
I think the custom grammar you proposed would be good. Even if editors don't have highlighting explicitly for this syntax, using existing highlighting made for other languages will do the job until someone makes dedicated extensions/addons. Even though it's quite cumbersome to me, I surprisingly find XML easier to read than YAML, especially when nested and deep. So basically, my opinion is that XML is very decent, but I still prefer a custom syntax. And no YAML.
@Chi-Iroh I think that's basically how I feel too. Also even with XML, as we don't have a schema there's no completion or anything, so it's no worse in that regard.
I decided not to go with this in the end, mainly because it pulls in Participle, which uses reflection quite heavily, and I want to keep chroma fairly light.