tmbundle: Make tmLanguage compatible with PCRE2

Open lildude opened this issue 3 years ago • 3 comments

👋 I'm the lead maintainer of the https://github.com/github/linguist library, which is used for language detection and for providing syntax highlighting for languages on GitHub.com. We use this grammar.

Our grammar compiler has found several problems with your grammar which I thought I'd let you know about.

These regexes have quite a few problems as you can see in the regex101 link after each:

https://github.com/marko-js/marko-tmbundle/blob/60ded48bea6d6eccea81e858168e20e0b1b78b01/syntaxes/marko.tmLanguage.json#L779

https://regex101.com/r/pSG73T/1

https://github.com/marko-js/marko-tmbundle/blob/60ded48bea6d6eccea81e858168e20e0b1b78b01/syntaxes/marko.tmLanguage.json#L108

... and repeated again at:

https://github.com/marko-js/marko-tmbundle/blob/60ded48bea6d6eccea81e858168e20e0b1b78b01/syntaxes/marko.tmLanguage.json#L130

https://regex101.com/r/NlVs41/1

These are the errors our compiler reported:

  • Invalid regex in grammar: text.marko (in syntaxes/marko.tmLanguage.json) contains a malformed regex (regex "(?=[,;\](]|/>|(?<=[^=])>|(?<!(?:...": nothing to repeat (at offset 105))
  • Invalid regex in grammar: text.marko (in syntaxes/marko.tmLanguage.json) contains a malformed regex (regex "(?=[,;\]]|/>|(?<=[^=])>|(?<!(?:^...": nothing to repeat (at offset 104))
  • Invalid regex in grammar: text.marko (in syntaxes/marko.tmLanguage.json) contains a malformed regex (regex "(?=[,;\]]|/>|(?<=[^=])>|(?<!(?:^...": nothing to repeat (at offset 104))

lildude avatar Sep 02 '22 16:09 lildude

Will look shortly. Thanks for the report!

DylanPiercey avatar Sep 02 '22 17:09 DylanPiercey

@lildude from what I can tell, the regex functions correctly. The validator you are using is PCRE2; however, my understanding was that the regexes in tmgrammars are intended to be handled by Oniguruma. Is this not the case for linguist?

DylanPiercey avatar Sep 02 '22 20:09 DylanPiercey

> Is this not the case for linguist?

No. GitHub uses PCRE for grammar parsing for performance reasons.
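For anyone who wants to check a grammar against a different engine locally, here is a minimal sketch of how the regexes could be pulled out of a tmLanguage JSON file and handed to a compile function. This is not linguist's grammar compiler. The names `iter_regexes`, `find_invalid`, and the toy grammar are hypothetical, and the default `compile_fn` is Python's `re.compile`, which is neither PCRE2 nor Oniguruma and is only a stand-in; a real check would pass in a compile function from a PCRE2 binding instead.

```python
import json
import re

# Keys that hold regular expressions in a TextMate grammar rule.
REGEX_KEYS = ("match", "begin", "end", "while")


def iter_regexes(node, path="$"):
    """Yield (json_path, pattern) for every regex-bearing key in a grammar."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if key in REGEX_KEYS and isinstance(value, str):
                yield child, value
            else:
                # Recurse into nested rules ("patterns", "repository", captures, ...).
                yield from iter_regexes(value, child)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from iter_regexes(item, f"{path}[{index}]")


def find_invalid(grammar, compile_fn=re.compile):
    """Return (json_path, pattern, error) for each pattern compile_fn rejects.

    ASSUMPTION: compile_fn defaults to Python's `re`, used here only as a
    placeholder engine. Swap in a PCRE2 compile function to mirror the
    engine discussed in this thread.
    """
    failures = []
    for where, pattern in iter_regexes(grammar):
        try:
            compile_fn(pattern)
        except Exception as exc:
            failures.append((where, pattern, str(exc)))
    return failures


# Hypothetical toy grammar, not taken from marko.tmLanguage.json.
toy = {
    "scopeName": "text.example",
    "patterns": [{"match": "a+"}, {"include": "#broken"}],
    "repository": {"broken": {"begin": "(", "end": "\\)"}},
}

for where, pattern, error in find_invalid(toy):
    print(f"{where}: {pattern!r}: {error}")
```

To run it against the real file, load it with `json.load(open("syntaxes/marko.tmLanguage.json"))` and pass the result to `find_invalid`. The JSON path in each result makes it easier to locate the offending rule than an offset into a truncated pattern.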

lildude avatar Sep 03 '22 05:09 lildude