asciidoctor-vscode
asciidoctor-vscode copied to clipboard
Grammar rule won't match block-attribute-list that contains an inline macro
There is an unaddressed problem in the TextMate grammar that was brought over from Atom in Feb. 2020 (see file history for the grammar-file, Asciidoctor.json)).
As it stands, the regex in the #quote-paragraph rule will not match any block-attribute-list that contains, itself, an inline macro -- such as a URL macro or a bibliographic-citation macro. Downstream from that rule, there are weaknesses also in the #block-attribute-inner rule. The regex-fixes I'm going to show in a PR here (in just a minute) are the upshot of an earlier discussion of the whole matter, still viewable in Atom PR 197 -- where also there are screen-shots and syntax-examples; please see this earlier discussion for details.
(Sadly, that discussion was started almost 2 years after the grammar was ported here from Atom... Too bad I was barking up the wrong tree... Atom instead of VS Code! :-|)
This problem remains as of v. 3.1.3. Implementing them in that version, I've tested these fixes that I'm going to submit in my PR. The system information is:
Version: 1.79.2 (system setup) Commit: 695af097c7bd098fbf017ce3ac85e09bbc5dda06 Date: 2023-06-14T08:57:04.379Z Electron: 22.5.7 Chromium: 108.0.5359.215 Node.js: 16.17.1 V8: 10.8.168.25-electron.0 OS: Windows_NT x64 10.0.19045
I can put both the problem that this fixes, and how it fixes it, in a nutshell:
The original regexes did not support block-attribute-list's that contain, themselves, an inline macro:
this is because the "meat" of each of these regex's matches is from a character-group that dismisses the possibility of a literal ']' (i.e., \]) lying WITHIN the attribute-list:
either this one:
[^,\\]]+
or this one:
[^\\],.#%]+
The fix for this is to make this character group be only the first of a pair of alternatives -- the latter of which ALLOWS a literal ']' but only if "looking ahead" from it reveals that it is NOT the end of that line of text -- and thus, that it is not the CLOSING ']' of the whole block-attribute-list:
either
(?:[^,\\]]|\\](?=[ \\t]*\\S))+
or
(?:[^\\],.#%]|\\](?=[ \\t]*\\S))+
And the reason that THREE regexes had to be changed, rather than one, to fix this problem is:
- the regex on line 1713 doesn't match any text: it's just one large LOOK-AHEAD! (It leaves it to the regex on line 1716 to match the text.)
- the regex on line 340 is still looking to match the same text that line 1716's regex matched already; because it is invoked by the "captures" dictionary that starts on line 1717.
(Although https://macromates.com/manual/en/language_grammars says that "captures" dictionaries can "currently" only assign a name to a captured group's matched text, that document covers only TextMate 1.5.1; and from the discussion in https://www.apeth.com/nonblog/stories/textmatebundle.html (see about halfway down), we may surmise that Texmate 2 allows what we see in our own grammar starting at line 1717 -- viz., a "captures" dictionary that contains captures that assign not (or not only) a name, but (also) a "patterns" array -- to look for yet more matches within the same, already-matched text!)
Hey!
Thank you for taking the time to explain this issue in detail. In a nutshell, I'm not convinced that the TextMate grammar is suitable for highlighting AsciiDoc text. The work on the AsciiDoc specification has revealed all the ambiguities and limitations of a grammar like TextMate.
I am not opposed to merging this improvement, but I believe it would be better to work on a semantic token provider that will allow for a much more precise and accurate syntax highlighting.
https://code.visualstudio.com/api/language-extensions/semantic-highlight-guide#semantic-token-provider
I've created a meta issue (pinned) because there are a lot of cases where the TextMate grammar produces wrong highlighting.
Ah! Quite interesting. I will try to be more persevering in my perusal of that page (it didn't initially connect with my brain too wll, so I had dropped it :-/
Since you relate it to "the work on the AsciiDoc specification", I wonder whether this (already marching forward?) work you propose depends on that spec-work? or is it independent?
I won't say that we have to wait for a complete specification to start working on a semantic token provider.
If you are interested, you can join: https://chat.asciidoc.org and/or take a look at https://github.com/opendevise/asciidoc-parsing-lab/tree/main