Suggestion: Move all grammar-related metadata to `grammars.yml`
Currently, metadata pertaining to grammars and case-by-case exceptions are handled in four different places:
- `.gitmodules` specifies URLs.
- `grammars.yml` specifies scopes.
- `tools/grammars/compiler/data.go` specifies scope-maps and case-by-case overrides.
- `vendor/licenses/config.yml` identifies (for Licensee) which grammars had their licenses manually reviewed.
Moreover, the difficulties related by @pastra98 made me realise there's more we could be doing with regard to locating grammar and license files. Specifically, we should be able to provide a manual path if need be; the currently hardcoded search locations can remain the default for grammars without a `src:` field defined, or whatever.
Here's how it *might* look.
```yaml
# Each entry correlates to a directory in "vendor/grammars/#{key}"
---
abl-tmlanguage:
  license: MIT
  source: chriscamicas/abl-tmlanguage
  scopes:
    - source.abl
  # Location of files inside submodule repository. Usually
  # calculated automatically, though maybe it makes sense
  # to always provide this list, as opposed to only those
  # grammars with non-standard file locations?
  files:
    grammar: abl.tmLanguage.json
    license: LICENSE
actionscript3-tmbundle:
  license: MIT
  source: simongregory/actionscript3-tmbundle
  scopes:
    - source.actionscript.3
    - text.html.asdoc
    - text.xml.flex-config
c.tmbundle:
  license: MIT
  source: textmate/c.tmbundle
  scopes:
    - source.c
    - source.c++
    - source.c.platform
  aliases:
    source.c++: source.cpp
hy.tmLanguage:
  license: MIT
  source: Slowki/hy.tmLanguage
  scopes:
    - source.hy
  paths:
    - hy.json
    - LICENSE.md
language-roff:
  license: ISC
  source: Alhadis/language-roff
  scopes:
    - hidden.manref
    - source.ditroff
    - source.ditroff.desc
    - source.gremlin
    - source.ideal
    - source.pic
    - text.roff
    - text.runoff
sublimesystemverilog:
  license: MIT
  source: https://bitbucket.org/Clams/sublimesystemverilog/get/default.tar.gz
  scopes:
    - source.systemverilog
    - source.ucfconstraints
Genshi.tmbundle:
  license: MIT
  source: https://svn.edgewall.org/repos/genshi/contrib/textmate/Genshi.tmbundle/Syntaxes/Markup%20Template%20%28XML%29.tmLanguage
  scopes:
    - text.xml.genshi
```
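To get a feel for what consuming such an entry might involve, here is a hypothetical Go sketch of two pieces of logic the draft implies: expanding the `owner/repo` shorthand in `source:` to a GitHub URL while leaving full URLs (the Bitbucket tarball, the SVN file) untouched, and indexing `aliases:` so that a reference to `source.cpp` resolves to `source.c++`. The `GrammarSource` type, both functions, and the reading of `aliases:` as "declared scope: alternative name" are assumptions; YAML decoding is left out to keep it stdlib-only.

```go
package main

import (
	"fmt"
	"strings"
)

// GrammarSource mirrors one entry of the draft grammars.yml.
// Hypothetical type: the field names simply follow the draft's keys.
type GrammarSource struct {
	License string
	Source  string
	Scopes  []string
	Aliases map[string]string // declared scope -> alternative name (assumed)
}

// sourceURL expands the "owner/repo" shorthand to a GitHub URL and returns
// anything that already contains a scheme (https://, svn://, ...) unchanged.
func sourceURL(src string) string {
	if strings.Contains(src, "://") {
		return src
	}
	return "https://github.com/" + src
}

// scopeIndex maps every name a scope can be referenced by (the declared
// scope itself plus any alias) to the declared scope. It rejects aliases
// for scopes the entry doesn't declare, and names claimed by two scopes.
func scopeIndex(entries map[string]GrammarSource) (map[string]string, error) {
	idx := make(map[string]string)
	add := func(name, scope string) error {
		if prev, ok := idx[name]; ok && prev != scope {
			return fmt.Errorf("%q is claimed by both %q and %q", name, prev, scope)
		}
		idx[name] = scope
		return nil
	}
	for key, g := range entries {
		declared := make(map[string]bool, len(g.Scopes))
		for _, s := range g.Scopes {
			declared[s] = true
			if err := add(s, s); err != nil {
				return nil, err
			}
		}
		for scope, alias := range g.Aliases {
			if !declared[scope] {
				return nil, fmt.Errorf("%s: alias %q targets undeclared scope %q", key, alias, scope)
			}
			if err := add(alias, scope); err != nil {
				return nil, err
			}
		}
	}
	return idx, nil
}

func main() {
	entries := map[string]GrammarSource{
		"c.tmbundle": {
			License: "MIT",
			Source:  "textmate/c.tmbundle",
			Scopes:  []string{"source.c", "source.c++", "source.c.platform"},
			Aliases: map[string]string{"source.c++": "source.cpp"},
		},
		"sublimesystemverilog": {
			License: "MIT",
			Source:  "https://bitbucket.org/Clams/sublimesystemverilog/get/default.tar.gz",
			Scopes:  []string{"source.systemverilog", "source.ucfconstraints"},
		},
	}
	// Iterate in a fixed order so the output is deterministic.
	for _, key := range []string{"c.tmbundle", "sublimesystemverilog"} {
		fmt.Println(key, "->", sourceURL(entries[key].Source))
	}
	idx, err := scopeIndex(entries)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("source.cpp resolves to", idx["source.cpp"])
}
```

Keeping aliases next to the scopes they belong to (rather than in a central map in `data.go`) means the undeclared-scope check above can catch a stale alias as soon as an entry changes.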
Thoughts?
EDIT: Oh yeah, it'd also be nice if we refined our terminology a little, because it's confusing to refer to both a TextMate-compatible grammar file and the submodule containing it as a "grammar" (maybe "grammar-source" for the latter?). Given that most of the grammars I write nowadays are almost exclusively added to `language-etc` (a super-bundle of whatever I can't be fucked publishing separately anymore, but nothing specific), it'd be a helpful distinction.
That would be pretty cool, actually! Maybe as a flag with the add-grammar script to specify the path?
No need. Most of the time, the files are located somewhere predictable. Atom actually forces you to place them in the `grammars` directory, and limits you to JSON or CSON (a cleaner alternative to JSON). It also imposes a bunch of other myopic, insane restrictions, which I won't go into here.
Ah okay... So it's not that many repos that are structured differently. But I could add a `source:` field to the `grammars.yml` manually, if need be?
> But I could add a `source:` field to the `grammars.yml` manually, if need be?
This is more of an RFC to discuss a potential enhancement to Linguist. The changes involved are many, and they touch lots of different components that behave similarly, but operate independently.
In other words, there's nothing you need to worry about WRT your PR. 😉
I get that; I was trying to understand how you envision this working from a user's perspective, haha.
The `add-grammar` script would support an optional `-f`/`--file` switch to specify the location of a grammar file, but that's about it. Usage would be something like `--file hy.json`, `--file ./hy.json`, or even `--file https://github.com/Slowki/hy.tmLanguage/blob/master/hy.json`, since users will try all sorts of things, and it's better for a program to be maximally permissive about what input it accepts, rather than barking at users "to provide a path relative to the upstream repository's root directory".
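A sketch of that permissive handling, in Go and entirely hypothetical: it is not taken from the actual `add-grammar` script, and `normalizeGrammarPath` is an invented name. It accepts the three forms quoted above and reduces each to a path relative to the repository root, rejecting anything that would escape it.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
	"strings"
)

// normalizeGrammarPath turns whatever the user passed to a hypothetical
// --file switch into a path relative to the grammar repository's root.
// Assumed forms: "hy.json", "./hy.json", or a github.com "blob"/"raw" URL
// such as https://github.com/Slowki/hy.tmLanguage/blob/master/hy.json.
func normalizeGrammarPath(arg string) (string, error) {
	p := strings.TrimSpace(arg)
	if strings.Contains(p, "://") {
		u, err := url.Parse(p)
		if err != nil {
			return "", err
		}
		// Expected URL path: /owner/repo/blob/<ref>/<file path...>
		parts := strings.Split(strings.Trim(u.Path, "/"), "/")
		if len(parts) < 5 || (parts[2] != "blob" && parts[2] != "raw") {
			return "", fmt.Errorf("unrecognised grammar URL: %s", arg)
		}
		p = strings.Join(parts[4:], "/")
	}
	// Normalise separators, "./" and "a/../b"; reject paths outside the repo.
	p = path.Clean(strings.ReplaceAll(p, "\\", "/"))
	p = strings.TrimPrefix(p, "/")
	if p == "" || p == "." || p == ".." || strings.HasPrefix(p, "../") {
		return "", fmt.Errorf("path %q is outside the repository", arg)
	}
	return p, nil
}

func main() {
	for _, arg := range []string{
		"hy.json",
		"./hy.json",
		"https://github.com/Slowki/hy.tmLanguage/blob/master/hy.json",
	} {
		p, err := normalizeGrammarPath(arg)
		fmt.Println(p, err)
	}
}
```

All three inputs normalise to `hy.json`. One known gap: a ref containing slashes (e.g. a branch named `feature/x`) would be split in the wrong place, and `raw.githubusercontent.com` URLs are rejected; handling those properly would mean resolving the ref against the repository.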
@lildude I don't think I've overdone that script enough; should I add a man page? 😁
This sounds like an interesting idea, though is likely to be a lot of work. Is all that work really worth the effort for the few corner cases not correctly handled now?
Without thinking too hard about it, a few quick points come to mind:
- We won't be able to remove the dependency on submodules unless you implement a script to do the downloading and updating, which would be reinventing what Git already does, and could be more brittle and harder to debug, as it would require more specific knowledge.
- I'm not sure you'll be able to get rid of `vendor/licenses/config.yml` whilst retaining the required Licensed/Licensee integration without implementing some more custom code, which I'd prefer we didn't do, as we've only just got back to using licensed as it is intended to be used.
So, if anything, this change would really be consolidating some of the current `grammars.yml` and `tools/grammars/compiler/data.go` info.
Feel free to start a PoC PR and we can track and discuss things as it progresses.
Why are `grammars.yml` and `languages.yml` in two completely different spots? Wouldn't it make sense to move `grammars.yml` out of the root and into `/lib/linguist`?
@johnmays Not really. `grammars.yml` isn’t actually used directly by Linguist. It’s used to track the external grammars used by the syntax highlighting engine, which is a completely independent application. The same applies to `vendor/licenses/config.yml`: it’s not directly used by Linguist, and in that case is used to configure licensed, the external library we use for license management.
If we were to move it, it could probably go into `vendor/grammars`, but this would just be making a change for the sake of it, and could possibly be a breaking change for other users of this repo who expect the file to be where it is.
I see. That makes sense.