language-tools icon indicating copy to clipboard operation
language-tools copied to clipboard

Update regex of VSCode grammars to be used with `github/linguist`

Open kindoflew opened this issue 3 years ago • 7 comments

As of right now, syntax highlighting is broken on GitHub for anything inside <script>, <style>, and <template> tags in .svelte files.

This started (I believe) with this PR: https://github.com/github/linguist/pull/5963. That points the source for svelte files to https://github.com/sveltejs/svelte-atom. However, there is apparently some bug in the injections section of that grammar that is breaking syntax highlighting (https://github.com/github/linguist/issues/5830#issuecomment-1195653540).

Since Atom is going to be sunsetted, and the VSCode grammars in this repo use TextMate grammar, I was going to open a PR in github/linguist to update the source to point to the grammars here. However, when I attempted to add it to that repo, it failed linguist's compiler check with these errors:

- Invalid regex in grammar: `markdown.svelte.codeblock.script` (in `packages/svelte-vscode/syntaxes/markdown-svelte-js.json`) contains a malformed regex (regex "`(?<=script.*>)(?!</)`": lookbehind assertion is not fixed length (at offset 13))
- Invalid regex in grammar: `svelte.pug.tags` (in `packages/svelte-vscode/syntaxes/pug-svelte-tags.json`) contains a malformed regex (regex "`\b(?<=[\w$\-_]*)\s*?(!=|=)\s*?([`...": group name must start with a non-digit (at offset 56))
- Invalid regex in grammar: `svelte.pug.tags` (in `packages/svelte-vscode/syntaxes/pug-svelte-tags.json`) contains a malformed regex (regex "`\b(?<=[\w$\-_]*)\s*?(!=|=)\s*?([`...": group name must start with a non-digit (at offset 58))
- Invalid regex in grammar: `svelte.pug.tags` (in `packages/svelte-vscode/syntaxes/pug-svelte-tags.json`) contains a malformed regex (regex "`\b(?<=[\w$\-_]*)\s*?(!=|=)\s*?(?`...": lookbehind assertion is not fixed length (at offset 15))
- Invalid regex in grammar: `svelte.pug` (in `packages/svelte-vscode/syntaxes/pug-svelte.json`) contains a malformed regex (regex "`(?<=^\s*?|#\[\s*?|:\s+?)([A-Z][a`...": lookbehind assertion is not fixed length (at offset 9))
- Invalid regex in grammar: `svelte.pug` (in `packages/svelte-vscode/syntaxes/pug-svelte.json`) contains a malformed regex (regex "`(?<=^\s*?|#\[\s*?|:\s+?)([A-Z][a`...": lookbehind assertion is not fixed length (at offset 9))
- Invalid regex in grammar: `svelte.pug` (in `packages/svelte-vscode/syntaxes/pug-svelte.json`) contains a malformed regex (regex "`(?<=^\s*?|#\[\s*?|:\s+?)(\+)(deb`...": group name must start with a non-digit (at offset 104))
- Invalid regex in grammar: `svelte.pug` (in `packages/svelte-vscode/syntaxes/pug-svelte.json`) contains a malformed regex (regex "`(?<=each\s*?\(\s*?')(.*)\s+?(as\`...": lookbehind assertion is not fixed length (at offset 19))
- Invalid regex in grammar: `svelte.pug` (in `packages/svelte-vscode/syntaxes/pug-svelte.json`) contains a malformed regex (regex "`(?<=each\s*?\(\s*?')(.*)\s+?(as\`...": lookbehind assertion is not fixed length (at offset 19))
- Invalid regex in grammar: `svelte.pug` (in `packages/svelte-vscode/syntaxes/pug-svelte.json`) contains a malformed regex (regex "`(?<=await\s*?\(\s*?')(.*)\s+?(th`...": lookbehind assertion is not fixed length (at offset 20))
- Invalid regex in grammar: `svelte.pug` (in `packages/svelte-vscode/syntaxes/pug-svelte.json`) contains a malformed regex (regex "`(?<=debug\s*?\(\s*?')(\w+?)(,|$)`": lookbehind assertion is not fixed length (at offset 20))
- Invalid regex in grammar: `svelte.pug` (in `packages/svelte-vscode/syntaxes/pug-svelte.json`) contains a malformed regex (regex "`(?<=^\s*?|#\[\s*?|:\s+?)(\+)(els`...": lookbehind assertion is not fixed length (at offset 9))
- Invalid regex in grammar: `markdown.svelte.codeblock.style` (in `packages/svelte-vscode/syntaxes/markdown-svelte-css.json`) contains a malformed regex (regex "`(?<=style.*>)(?!</)`": lookbehind assertion is not fixed length (at offset 12))

I admittedly know nothing about TextMate grammars and only basic regex, but I was wondering if these are things that could be changed/fixed so that we can point linguist here to fix syntax highlighting? If not, we can close this. Thanks!


RELATED ISSUES:

  • https://github.com/sveltejs/svelte-atom/issues/20
  • https://github.com/github/linguist/issues/5830
  • https://github.com/community/community/discussions/13004
  • https://github.com/community/community/discussions/20556

kindoflew avatar Jul 28 '22 15:07 kindoflew

TLDR first: our TextMate grammar probably won't fix the current highlight issue with Github soon. And there're multiple reasons for it.

As far as I remembered, I tried it on the sunsetted lightshow syntax highlight playground with our grammar and it doesn't highlight the content of the script and style tags. And this community version nova light show also doesn't work. So the problem probably isn't exclusive to the svelte-atom version.

From our experiences, the injection rule doesn't have good compatibility between different Textmate syntax highlight implementations. As for the regex error, this is a problem between Oniguruma and PCRE regex engines. In both cases, We probably have to either try until it works on both sides or find a tool to help us test this.

There's also another potential problem, Can the Github linguist identify our YAML grammar file? They're probably also non-fixed length-lookbehind-assertion in the main grammar file but it's not on the error list. Looking at their repo I didn't see a place to config which file to use. So there's probably a rule on the file name. And I don't know if we can just rename it. As there might be other repo has a script to fetch this file.

jasonlyu123 avatar Jul 29 '22 06:07 jasonlyu123

FWIW the errors that are reported by linguist's check look very much like what I had to change to convert the grammar from here and make it work in Atom. See this commit.

I only rewrote the regexes in the main grammar, but the changes are pretty basic. The key change is to move joker characters outside of lookbehinds.

@kindoflew Do you have a way to check if those changes would be enough to this those errors for you? And also, you should probably include the main grammar as a JSON file, from the YAML in this project, plus fixed regex in those tests.

rixo avatar Jul 29 '22 07:07 rixo

Sorry it's been awhile, work took over for a second.

There's also another potential problem, Can the Github linguist identify our YAML grammar file? They're probably also non-fixed length-lookbehind-assertion in the main grammar file but it's not on the error list. Looking at their repo I didn't see a place to config which file to use. So there's probably a rule on the file name. And I don't know if we can just rename it. As there might be other repo has a script to fetch this file.

Looking through the linguist repo, I found this file where it seems like we can specify grammar files with non-standard names and ignore certain files: https://github.com/github/linguist/blob/1f65799270b46a0bd1d4a62ea5734a1117f45e61/tools/grammars/compiler/data.go#L46

so that maybe seems promising for pointing it in the direction of wherever we'd want it to.


@kindoflew Do you have a way to check if those changes would be enough to this those errors for you? And also, you should probably include the main grammar as a JSON file, from the YAML in this project, plus fixed regex in those tests.

I tried using the grammar from before that commit in Nova Lightshow and the same problem is happening, so it might be that the regex errors are the least of our problems.


can I ask what the injections are being used for? like I said, I don't know anything about TextMate grammars, so I was trying to research other languages that have similar-ish needs as svelte (single-file components, superset of HTML, embedded JS/CSS/etc) and the only one like that in linguist's list is vue. looking at their grammar, they don't have any injections at all: https://github.com/vuejs/vue-syntax-highlight/blob/master/vue.YAML-tmLanguage.

kindoflew avatar Aug 29 '22 14:08 kindoflew

@Monkatraz probably knows best about this. AFAIK it was to reuse more of the grammar / not having to duplicate every possible iteration of lang="X". The PR adding the injection rules is #657

dummdidumm avatar Aug 29 '22 14:08 dummdidumm

Tried quite some times last week in Nova Lightshow. I am not sure if the problem is the injection or not now. Even if I don't use the injection it still doesn't work. While it works just fine in VSCode. Also, it is not like the injection doesn't work in Github's syntax highlight. PHP uses injection and it seems to work fine. Would be appreciated if someone could find the actual problem.

jasonlyu123 avatar Aug 29 '22 15:08 jasonlyu123

another bump in the road -- i just tried vue's grammar with an example of a vue file in Nova Lightshow and a similar problem is present -- script tags don't contain any highlighting between them (except for comments for some reason. also, style works). but vue files are correctly highlighted on GitHub.

seems like there's probably a bug in that project.

kindoflew avatar Aug 29 '22 16:08 kindoflew

update to the update: it's not a bug, it's a limitation of Nova Lightshow: https://github.com/Nixinova/NovaLightshow/issues/3#issuecomment-1236056446

kindoflew avatar Sep 03 '22 17:09 kindoflew

I'm working on updating the VS Code grammar to make it usable with linguist.

umanghome avatar Oct 27 '22 08:10 umanghome

Hopefully last update: I added a fix to the syntax for injections in svelte-atom (https://github.com/sveltejs/svelte-atom/pull/21) and thanks to @umanghome's fork of NovaLightshow, we have soft confirmation that it works. The maintainer of github/linguist says the next release will be the week of Nov 14 the earliest, so hopefully we get syntax highlighting back before Christmas 😂.

Closing this as it's being solved in other repos. Thanks for the help everyone!

kindoflew avatar Nov 01 '22 17:11 kindoflew