tmbundle: Make tmLanguage compatible with PCRE2
👋 I'm the lead maintainer of the https://github.com/github/linguist library which is used for language detection and providing the syntax highlighting for languages on GitHub.com, and we use this grammar.
Our grammar compiler has found several problems with your grammar which I thought I'd let you know about.
These regexes have quite a few problems as you can see in the regex101 link after each:
https://github.com/marko-js/marko-tmbundle/blob/60ded48bea6d6eccea81e858168e20e0b1b78b01/syntaxes/marko.tmLanguage.json#L779
https://regex101.com/r/pSG73T/1
https://github.com/marko-js/marko-tmbundle/blob/60ded48bea6d6eccea81e858168e20e0b1b78b01/syntaxes/marko.tmLanguage.json#L108
... and repeated again at:
https://github.com/marko-js/marko-tmbundle/blob/60ded48bea6d6eccea81e858168e20e0b1b78b01/syntaxes/marko.tmLanguage.json#L130
https://regex101.com/r/NlVs41/1
These are the errors our compiler reported:
- Invalid regex in grammar:
text.marko(insyntaxes/marko.tmLanguage.json) contains a malformed regex (regex "(?=[,;\](]|/>|(?<=[^=])>|(?<!(?:...": nothing to repeat (at offset 105)) - Invalid regex in grammar:
text.marko(insyntaxes/marko.tmLanguage.json) contains a malformed regex (regex "(?=[,;\]]|/>|(?<=[^=])>|(?<!(?:^...": nothing to repeat (at offset 104)) - Invalid regex in grammar:
text.marko(insyntaxes/marko.tmLanguage.json) contains a malformed regex (regex "(?=[,;\]]|/>|(?<=[^=])>|(?<!(?:^...": nothing to repeat (at offset 104))
Will look shortly. Thanks for the report!
@lildude from what I can tell the regex functions correctly. The validator you are using is using PCRE2 however my understanding was that the regex's in tmgrammars are intended to be handled by oniguruma. Is this not the case for linguist?
Is this not the case for linguist?
No. GitHub uses PCRE for grammar parsing for performance reasons.