Textmate grammar highlighting doesn't match GitHub/Linguist when "end" is never matched
This was reported at https://github.com/dart-lang/dart-syntax-highlight/issues/11#issuecomment-1613758553. Dart highlighting on GitHub doesn't handle unterminated triple-backticks as expected. VS Code does handle it as expected.
However, while debugging this, I've become less certain that GitHub is wrong, and feel like VS Code might be.
Here's a trimmed down version of the grammar that shows the problem. It defines triple-slash comments, and supports triple-backtick code blocks inside:
- https://gist.github.com/DanTup/8ca705c94bd50f6d88b2463cbb43eca1#file-dart_syntax-json
It renders like this:
It the triple backticks are unclosed, it looks reasonable:
However, it's not clear why the variable.other.source.dart scope was exited, because the "end" condition was never found. On GitHub, this does not happen and the rest of the document is consumed (note the first void here is red, but the second one is not because the variable context eats the rest of the document):
I can't find anything in the spec for textmate grammars to explain VS Code's behaviour. The most information I've found on it is here:
https://macromates.com/manual/en/language_grammars
The other type of match is the one used by the second rule (lines 9-17). Here two regular expressions are given using the
beginandendkeys. [...] If there is no match for the end pattern, the end of the document is used.
https://www.apeth.com/nonblog/stories/textmatebundle.html
With
begin/end, if the end pattern is not found, the overall match does not fail: rather, once the begin pattern is matched, the overall match runs to theendpattern or to the end of the document, whichever comes first.
While VS Code's behaviour is convenient for me (because I'm not sure how to handle these unclosed triple-backticks if it behaved like GitHub), it doesn't seem correct, and it's more inconvenient if VS Code and GitHub disagree on what the behaviour should be because it makes it more difficult to author a grammar.
Hey @hediet, not sure if this is for you?
I would say that VSCode is correct and Linguist is wrong Linguist seems to just error and self implode
begin/while is a lot more ridgid/restrictive than end
it cannot be 'pushed' out like how end can be
I believe while was designed for embedded languages
that would have otherwise broken out of end
https://github.com/RedCMD/TmLanguage-Syntax-Highlighter/blob/main/documentation/rules.md#while
@RedCMD I'm not sure I understand, the issue here appears to be with begin/end and not while? The first pattern in my example begins with triple backticks and ends with triple backticks, but in my second screenshot there is no second set of triple backticks, so why does the variable.other.source.dart end and not run to the end of the document?
(I'm not complaining, because with the current grammar this is better behaviour than I see on GitHub, but I understand why it would be correct)
the begin/end is inside of the begin/while
while is a lot more strict and won't let the begin/end escape
the while is currently checking every line if it starts with ///
the last line foo, does not start with ///, so the while finishes and all patterns inside are terminated
this was done so embedded languages would stop escaping their bounds and polluting the rest of the document
Oh, I see! That does sound sensible, so maybe the issue here is GitHub then. I've opened an issue at https://github.com/github-linguist/linguist/issues/7015 (assuming that's the right place) for their take on it. Thanks!
Closing because it does not seem to be a VS Code issue.
It is unfortunate that the TextMate language is not specified very precisely.
@hediet seems like you didn't actually close this. However, I've been unable to determine whether VS Code is actually correct here. I got a response at https://github.com/github-linguist/linguist/issues/7015#issuecomment-2321028904 suggesting maybe VS Code is incorrect:
PrettyLights's behaviour appears to be correct here
I don't know how reasonable it is to say one implementation is correct if the spec is vague, however I do think it's in everyones best interest if we can agree on a (the same) definition (in absense of a real spec/documentation). However, this would require some agreement between the big implementations (of which VS Code, and GitHub/PrettyLights are some).
~~after spending way too much time trying to get a stupid Mac to boot~~ I was able to test the grammar in TextMate 2.0
and I can confirm GitHub/Linguist/PrettyLights is correct ✅ and VSCode is wrong ❌
TextMate 2.0 allows the end rule to 'push' the while rule outwards
which allows for interesting things to happen:
In my opinion, this makes
while almost functionally useless
A \G anchor is also placed at the beginning of the nextline after the begin and while
which VSCode does not do ❌
This issue should now be moved to https://github.com/microsoft/vscode-textmate
@RedCMD thanks for digging more into this! If VS Code agrees and will fix this, then I guess I will have to update the Dart grammar (I feel like it's going to get a lot more complicated!).
I also feel like this behaviour isn't ideal, but given TextMate isn't likely to change, I do think it's more important the implementations match.
I've re-raised this at https://github.com/microsoft/vscode-textmate/issues/241