vscode-textmate

Behaviour of begin/end and while patterns does not match TextMate

Open DanTup opened this issue 1 year ago • 3 comments

I originally raised this as https://github.com/microsoft/vscode/issues/189940 but it seems like it should be moved here.

The original report is as follows:


This was reported at https://github.com/dart-lang/dart-syntax-highlight/issues/11#issuecomment-1613758553. Dart highlighting on GitHub doesn't handle unterminated triple-backticks as expected. VS Code does handle it as expected.

However, while debugging this, I've become less certain that GitHub is wrong, and feel like VS Code might be.

Here's a trimmed down version of the grammar that shows the problem. It defines triple-slash comments, and supports triple-backtick code blocks inside:

  • https://gist.github.com/DanTup/8ca705c94bd50f6d88b2463cbb43eca1#file-dart_syntax-json

It renders like this:

image

If the triple backticks are unclosed, it looks reasonable:

image

However, it's not clear why the variable.other.source.dart scope was exited, because the "end" condition was never found. On GitHub, this does not happen and the rest of the document is consumed (note the first void here is red, but the second one is not because the variable context eats the rest of the document):

image

I can't find anything in the spec for textmate grammars to explain VS Code's behaviour. The most information I've found on it is here:

https://macromates.com/manual/en/language_grammars

The other type of match is the one used by the second rule (lines 9-17). Here two regular expressions are given using the begin and end keys. [...] If there is no match for the end pattern, the end of the document is used.

https://www.apeth.com/nonblog/stories/textmatebundle.html

With begin/end, if the end pattern is not found, the overall match does not fail: rather, once the begin pattern is matched, the overall match runs to the end pattern or to the end of the document, whichever comes first.
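The rule described in both quotes can be sketched in a few lines of Python. This is only a toy model of the documented behaviour, not how TextMate or vscode-textmate is implemented: the helper name `begin_end_span` and the sample inputs are made up for illustration, and Python's `re` stands in for the Oniguruma engine that real grammars use.

```python
import re

def begin_end_span(text, begin, end):
    """Return (start, stop) of the region opened by `begin`, or None.

    Hypothetical helper: if `end` never matches after `begin`,
    the region runs to the end of the document.
    """
    b = re.search(begin, text)
    if b is None:
        return None  # the rule never started
    e = re.compile(end).search(text, b.end())
    stop = e.end() if e else len(text)  # no end match: use end of document
    return (b.start(), stop)

fence = r"```"
closed = "/// ```\n/// code\n/// ```\nvoid main() {}\n"
unclosed = "/// ```\n/// code\nvoid main() {}\n"

print(begin_end_span(closed, fence, fence))    # stops at the closing fence
print(begin_end_span(unclosed, fence, fence))  # stops at len(unclosed)
```

With the unclosed input, the region covers `void main() {}` as well, which is what GitHub shows for the unclosed triple backticks.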

While VS Code's behaviour is convenient for me (because I'm not sure how to handle these unclosed triple-backticks if it behaved like GitHub), it doesn't seem correct, and it's more inconvenient if VS Code and GitHub disagree on what the behaviour should be because it makes it more difficult to author a grammar.

DanTup, Oct 03 '24 10:10

There was some back and forth about whether VS Code or GitHub was correct here, and I filed https://github.com/github-linguist/linguist/issues/7015 thinking it was a GitHub issue. However, @RedCMD did some more digging, tested with TextMate, and confirmed that it behaves the same as GitHub, and therefore VS Code's behaviour is incorrect:

https://github.com/microsoft/vscode/issues/189940#issuecomment-2323586656

DanTup, Oct 03 '24 10:10

There are two differences between VSCode TextMate and TextMate2.0:

  • While while is being checked, a \G anchor is placed at the beginning of the next line. VSCode does not currently do this. ❌ This should be an easy fix and would probably fix a few bug reports as well.

  • VSCode's while is very strict in that it doesn't let begin/end escape, which in my opinion is very good for embedded languages. However, TextMate2.0 allows while to be pushed out, which I think is terrible: in the example above, the middle /// would need to be handled by the embedded rules instead of the parent grammar. I'm not sure if it should be fixed, as Markdown heavily relies on VSCode's behaviour.
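To make the second difference concrete, here is a rough line-by-line model of the two behaviours as described in this thread. It is a sketch under assumptions, not either engine's real algorithm: the patterns, the `scopes_per_line` function, and the `strict_while` flag are all hypothetical, and the labels are a simplification of real scope stacks.

```python
import re

# Hypothetical patterns standing in for the trimmed-down grammar.
COMMENT_WHILE = re.compile(r"\s*///")  # line continues the /// comment rule
FENCE = re.compile(r"```")             # begin/end of the nested code block

def scopes_per_line(lines, strict_while):
    """Label each line 'comment', 'code', or 'source'.

    strict_while=True models the behaviour attributed to VS Code: a failed
    `while` pops the comment rule and any begin/end rule nested in it.
    strict_while=False models the behaviour attributed to TextMate 2.0 and
    GitHub: an open nested begin/end rule is not closed by the parent's `while`.
    """
    in_comment = False
    in_code = False
    labels = []
    for line in lines:
        continues = COMMENT_WHILE.match(line) is not None
        if in_comment and not continues and (strict_while or not in_code):
            in_comment = in_code = False  # `while` failed: leave the comment
        if not in_comment and continues:
            in_comment = True             # a new /// comment begins
        if not in_comment:
            labels.append("source")
            continue
        has_fence = FENCE.search(line) is not None
        if in_code:
            labels.append("code")
            if has_fence:
                in_code = False           # closing fence found
        elif has_fence:
            in_code = True                # opening fence found
            labels.append("code")
        else:
            labels.append("comment")
    return labels

LINES = [
    "/// Example:",
    "/// ```",
    "/// var x = 1;",
    "void main() {}",
    "/// Another comment",
    "void other() {}",
]

print(scopes_per_line(LINES, strict_while=True))
print(scopes_per_line(LINES, strict_while=False))
```

In the strict model, `void main() {}` is back in the source scope even though the fence was never closed. In the lenient model, the open code block swallows every remaining line, including the later `///` line, which would then have to be handled by the embedded rules.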

RedCMD, Oct 05 '24 19:10

I'm not sure if it should be fixed, as Markdown heavily relies on VSCode's behaviour

While I agree that VS Code's behaviour seems better, I don't think diverging from TextMate and claiming to be a TextMate grammar is great for extension authors or users. It'll either result in bugs and inconsistencies between editors, or require grammar writers to spend time testing grammars against each editor.

But if VS Code does choose to knowingly diverge, these differences should be clearly documented IMO so that grammar authors trying to go in either direction (use a grammar written against VS Code elsewhere, or bring a grammar from elsewhere to VS Code) have some reference of the things to look out for.

DanTup, Oct 05 '24 20:10