
Suggestion: Move all grammar-related metadata to `grammars.yml`

Open · Alhadis opened this issue 5 years ago · 10 comments

Currently, metadata pertaining to grammars and case-by-case exceptions are handled in four different places:

Moreover, the difficulties related by @pastra98 made me realise there's more we could be doing with regard to locating grammar and license files. Specifically, we should be able to provide a manual path if need be; the currently hardcoded search locations can remain the default for grammars without a `src:` field defined, or whatever.

Here's how it *might* look.
```yaml
# Each entry correlates to a directory in "vendor/grammars/#{key}"
---
abl-tmlanguage:
    license: MIT
    source: chriscamicas/abl-tmlanguage
    scopes:
        - source.abl

    # Location of files inside submodule repository. Usually
    # calculated automatically, though maybe it makes sense
    # to always provide these paths, as opposed to only those
    # grammars with non-standard file locations?
    files:
        grammar: abl.tmLanguage.json
        license: LICENSE

actionscript3-tmbundle:
    license: MIT
    source: simongregory/actionscript3-tmbundle
    scopes:
        - source.actionscript.3
        - text.html.asdoc
        - text.xml.flex-config

c.tmbundle:
    license: MIT
    source: textmate/c.tmbundle
    scopes:
        - source.c
        - source.c++
        - source.c.platform
    aliases:
        source.c++: source.cpp

hy.tmLanguage:
    license: MIT
    source: Slowki/hy.tmLanguage
    scopes:
        - source.hy
    files:
        grammar: hy.json
        license: LICENSE.md

language-roff:
    license: ISC
    source: Alhadis/language-roff
    scopes:
        - hidden.manref
        - source.ditroff
        - source.ditroff.desc
        - source.gremlin
        - source.ideal
        - source.pic
        - text.roff
        - text.runoff

sublimesystemverilog:
    license: MIT
    source: https://bitbucket.org/Clams/sublimesystemverilog/get/default.tar.gz
    scopes:
        - source.systemverilog
        - source.ucfconstraints

Genshi.tmbundle:
    license: MIT
    source: https://svn.edgewall.org/repos/genshi/contrib/textmate/Genshi.tmbundle/Syntaxes/Markup%20Template%20%28XML%29.tmLanguage
    scopes:
        - text.xml.genshi
```
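The `aliases` key in the `c.tmbundle` entry presumably means the upstream scope `source.c++` would be exposed under the name `source.cpp`. Below is a minimal sketch of how a grammar compiler might apply such aliases, assuming each alias renames one of the entry's declared `scopes` and treating an alias for an undeclared scope as a likely typo. The function name `published_scopes` is hypothetical, and the entry is written as the Python dict a YAML loader would produce from the example above.

```python
def published_scopes(entry: dict) -> list[str]:
    """Return an entry's scopes with any aliases applied.

    Hypothetical helper: assumes `aliases` maps an upstream scope name
    to the name that should be published instead. Scopes without an
    alias pass through unchanged, in their original order.
    """
    scopes = entry.get("scopes", [])
    aliases = entry.get("aliases", {})
    unknown = set(aliases) - set(scopes)
    if unknown:
        # An alias for a scope the entry doesn't declare is probably a typo.
        raise ValueError(f"aliases for undeclared scopes: {sorted(unknown)}")
    return [aliases.get(scope, scope) for scope in scopes]


c_tmbundle = {
    "license": "MIT",
    "source": "textmate/c.tmbundle",
    "scopes": ["source.c", "source.c++", "source.c.platform"],
    "aliases": {"source.c++": "source.cpp"},
}
print(published_scopes(c_tmbundle))
# prints ['source.c', 'source.cpp', 'source.c.platform']
```

Keeping the alias next to the scope it renames is one concrete benefit of consolidating this metadata into a single file: the check above can run against one entry instead of cross-referencing separate sources.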

Thoughts?

EDIT: Oh yeah, it'd also be nice if we refined our terminology a little, because it's confusing to refer to both a TextMate-compatible grammar file and the submodule containing it as a "grammar" (maybe "grammar-source" for the latter?). Given that most of the grammars I write nowadays are added almost exclusively to language-etc (a super-bundle of whatever I can't be fucked publishing separately anymore, but nothing specific), it'd be a helpful distinction.

Alhadis · Sep 02 '20 07:09

That would be pretty cool, actually! Maybe as a flag for the add-grammar script to specify the path?

pastra98 · Sep 02 '20 07:09

No need. Most of the time, the files are located somewhere predictable. Atom actually forces you to place them in the grammars directory and limits you to JSON or CSON (a cleaner alternative to JSON). It also imposes a bunch of other myopic, insane restrictions which I won't go into here.
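Because grammar files usually sit somewhere predictable (Atom packages keep them in a top-level `grammars` directory, TextMate bundles keep them in `Syntaxes`), the lookup proposed in the issue could search those locations by default and only use an explicit path when one is given. This is a minimal sketch under stated assumptions: the override is a `files.grammar` key shaped like the one in the YAML example, the submodule's contents are available as a flat list of repository-relative paths, and `resolve_grammar_files`, `DEFAULT_DIRS`, and the suffix list are hypothetical, not Linguist's actual search rules.

```python
from pathlib import PurePosixPath

# Hypothetical defaults: Atom packages keep grammars in "grammars/",
# TextMate bundles in "Syntaxes/". Linguist's real rules may differ.
DEFAULT_DIRS = ("grammars", "Syntaxes")
GRAMMAR_SUFFIXES = (".tmLanguage", ".plist", ".json", ".cson")


def resolve_grammar_files(entry: dict, repo_files: list[str]) -> list[str]:
    """Return the grammar file(s) for one grammars.yml entry.

    An explicit `files.grammar` path (hypothetical field) wins; otherwise
    fall back to scanning the default directories, direct children only.
    """
    manual = entry.get("files", {}).get("grammar")
    if manual is not None:
        if manual not in repo_files:
            raise FileNotFoundError(f"{manual!r} is not in the submodule")
        return [manual]

    found = []
    for path in repo_files:
        parent = str(PurePosixPath(path).parent)
        if parent in DEFAULT_DIRS and path.endswith(GRAMMAR_SUFFIXES):
            found.append(path)
    return sorted(found)


hy_files = ["hy.json", "LICENSE.md"]
print(resolve_grammar_files({}, hy_files))
# prints [] since nothing sits in a default location
print(resolve_grammar_files({"files": {"grammar": "hy.json"}}, hy_files))
# prints ['hy.json']
```

A file like `hy.json` at the root of a repository is exactly the non-standard case the override exists for; everything that follows a known convention needs no extra metadata.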

Alhadis · Sep 02 '20 08:09

Ah, okay... so not that many repos are structured differently. But I could add a source: field to the grammars.yml manually, if need be?

pastra98 · Sep 02 '20 08:09

> But I could add a source: field to the grammars.yml manually, if need be?

This is more of an RFC to discuss a potential enhancement to Linguist. The changes involved are many, and they touch lots of different components that behave similarly, but operate independently.

In other words, there's nothing you need to worry about WRT your PR. 😉

Alhadis · Sep 02 '20 09:09

I get that; I was trying to understand how you envision this working from a user's perspective, haha.

pastra98 · Sep 02 '20 09:09

The add-grammar script would support an optional `-f`/`--file` switch to specify the location of a grammar file, but that's about it. Usage would be something like `--file hy.json` or `--file ./hy.json`, or even

`--file https://github.com/Slowki/hy.tmLanguage/blob/master/hy.json`

since users will try all sorts of things, and it's better for a program to be maximally permissive about the input it accepts than to bark at users "to provide a path relative to the upstream repository's root directory".
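The permissive parsing described above could map all three spellings to the same repository-relative path. This is a minimal sketch, not part of the existing add-grammar script: `normalize_file_arg` is a hypothetical helper, the upstream repository is assumed to be known as `owner/name`, and only `github.com` URLs of the form `/owner/name/blob/<ref>/<path>` (or `raw` instead of `blob`) are understood, with the further assumption that the ref contains no slash.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def normalize_file_arg(arg: str, repo: str) -> str:
    """Map a user-supplied --file value to a path relative to the root
    of the upstream repository `repo` ("owner/name"). Hypothetical helper."""
    if arg.startswith(("http://", "https://")):
        url = urlsplit(arg)
        # e.g. /Slowki/hy.tmLanguage/blob/master/hy.json
        parts = [unquote(p) for p in url.path.split("/") if p]
        # Expected shape: owner, name, "blob" or "raw", ref, path...
        # Assumes the ref (branch, tag or SHA) contains no slash.
        if (
            url.netloc.lower() != "github.com"
            or len(parts) < 5
            or "/".join(parts[:2]).lower() != repo.lower()
            or parts[2] not in ("blob", "raw")
        ):
            raise ValueError(f"not a file URL inside {repo}: {arg}")
        arg = "/".join(parts[4:])

    # Collapse "./", duplicate slashes and "dir/.." segments.
    path = posixpath.normpath(arg)
    if path in (".", "..") or path.startswith(("/", "../")):
        raise ValueError(f"not a file inside the repository: {arg}")
    return path


for value in (
    "hy.json",
    "./hy.json",
    "https://github.com/Slowki/hy.tmLanguage/blob/master/hy.json",
):
    print(normalize_file_arg(value, "Slowki/hy.tmLanguage"))  # prints hy.json
```

Rejecting anything that normalizes to a location outside the repository (absolute paths, leading `../`) keeps the permissiveness from becoming a way to point the script at arbitrary files.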

@lildude I don't think I've overdone that script enough; should I add a man page? 😁

Alhadis · Sep 02 '20 09:09

This sounds like an interesting idea, though it's likely to be a lot of work. Is all that work really worth the effort for the few corner cases that aren't handled correctly now?

Without thinking too hard about it, a few quick points come to mind:

  • we won't be able to remove the dependency on submodules unless you implement a script to do the downloading and updating. That would be reinventing what Git already does, and the result could be more brittle and harder to debug, as it would require more specialist knowledge.
  • I'm not sure you'll be able to get rid of `vendor/licenses/config.yml` whilst retaining the required Licensed/Licensee integration without implementing more custom code, which I'd prefer we didn't do, as we've only just got back to using Licensed as it is intended to be used.

So if anything, this change would really be consolidating some of the information currently in `grammars.yml` and `tools/grammars/compiler/data.go`.

Feel free to start a PoC PR and we can track and discuss things as it progresses.

lildude · Oct 07 '20 08:10

Why are grammars.yml and languages.yml in two completely different spots? Wouldn't it make sense to move grammars.yml out of root and into /lib/linguist?

johnmays · Oct 22 '23 07:10

@johnmays Not really. `grammars.yml` isn’t actually used directly by Linguist. It’s used to track the external grammars used by the syntax highlighting engine, which is a completely independent application. The same applies to `vendor/licenses/config.yml`: it’s not used directly by Linguist either; instead, it configures Licensed, the external library we use for license management.

If we were to move it, it could probably go into `vendor/grammars`, but that would just be making a change for the sake of it, and it could be a breaking change for other users of this repo who expect the file to be where it is.

lildude · Oct 22 '23 09:10

I see. That makes sense.

johnmays · Dec 18 '23 07:12