
Grammar rule won't match block-attribute-list that contains an inline macro

Open andrewcarver opened this issue 2 years ago • 4 comments

There is an unaddressed problem in the TextMate grammar that was brought over from Atom in Feb. 2020 (see the file history for the grammar file, Asciidoctor.json).

As it stands, the regex in the #quote-paragraph rule will not match any block-attribute-list that itself contains an inline macro -- such as a URL macro or a bibliographic-citation macro. Downstream from that rule, there are also weaknesses in the #block-attribute-inner rule. The regex fixes I'm going to show in a PR here (in just a minute) are the upshot of an earlier discussion of the whole matter, still viewable in Atom PR 197, which also has screenshots and syntax examples; please see that earlier discussion for details.

(Sadly, that discussion was started almost 2 years after the grammar was ported here from Atom... Too bad I was barking up the wrong tree... Atom instead of VS Code! :-|)

This problem remains as of v. 3.1.3. I've implemented the fixes I'm going to submit in my PR in that version and tested them there. The system information is:

Version: 1.79.2 (system setup)
Commit: 695af097c7bd098fbf017ce3ac85e09bbc5dda06
Date: 2023-06-14T08:57:04.379Z
Electron: 22.5.7
Chromium: 108.0.5359.215
Node.js: 16.17.1
V8: 10.8.168.25-electron.0
OS: Windows_NT x64 10.0.19045

andrewcarver avatar Jul 25 '23 16:07 andrewcarver

I can put both the problem that this fixes, and how it fixes it, in a nutshell:

The original regexes did not support block-attribute-lists that themselves contain an inline macro. This is because the "meat" of each regex's match comes from a character group that rules out a literal ']' (i.e., \]) lying WITHIN the attribute-list: either this one: [^,\\]]+ or this one: [^\\],.#%]+

The fix is to make this character group only the first of a pair of alternatives -- the second of which ALLOWS a literal ']', but only if "looking ahead" from it reveals more non-whitespace text on the same line, so that it is NOT the end of that line of text and thus not the CLOSING ']' of the whole block-attribute-list: either (?:[^,\\]]|\\](?=[ \\t]*\\S))+ or (?:[^\\],.#%]|\\](?=[ \\t]*\\S))+
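To make the difference concrete, here is a minimal sketch in Python rather than in the grammar itself. It assumes that the doubled backslashes above are JSON escaping, so that \\] in the grammar file is the regex token \], and it assumes Python's re module treats these particular expressions the same way the grammar's regex engine does. The sample attribute list and the helper name attribute_chunks are made up for illustration.

```python
import re

# Assumption: "\\]" in the JSON grammar file is the regex token \] once unescaped.
ORIGINAL = re.compile(r"[^,\]]+")
FIXED = re.compile(r"(?:[^,\]]|\](?=[ \t]*\S))+")


def attribute_chunks(pattern, line):
    """Return every run the pattern matches after the opening '['."""
    return pattern.findall(line[1:])


# Hypothetical attribute list whose second entry is a URL macro.
line = "[quote,https://example.org[Jane Doe],Some Book]"

original_chunks = attribute_chunks(ORIGINAL, line)
fixed_chunks = attribute_chunks(FIXED, line)

print(original_chunks)  # second chunk is cut off before the macro's ']'
print(fixed_chunks)     # second chunk keeps the macro's ']'
```

In this sketch the original character group stops at the macro's own ']', while the fixed alternation accepts it because a comma follows on the same line; the final ']' is still excluded because nothing follows it.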

And the reason that THREE regexes had to be changed, rather than one, to fix this problem is:

  1. the regex on line 1713 doesn't match any text: it's just one large LOOK-AHEAD! (It leaves it to the regex on line 1716 to match the text.)
  2. the regex on line 340 is still looking to match the same text that line 1716's regex matched already; because it is invoked by the "captures" dictionary that starts on line 1717.

(Although https://macromates.com/manual/en/language_grammars says that "captures" dictionaries can "currently" only assign a name to a captured group's matched text, that document covers only TextMate 1.5.1. From the discussion in https://www.apeth.com/nonblog/stories/textmatebundle.html (see about halfway down), we may surmise that TextMate 2 allows what we see in our own grammar starting at line 1717 -- viz., a "captures" dictionary whose captures assign not (or not only) a name, but (also) a "patterns" array, to look for yet more matches within the same, already-matched text!)
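The two numbered points can be imitated outside the grammar with a small Python sketch. The three expressions below are hypothetical stand-ins, not the regexes on lines 1713, 1716 and 340; they only show that a pure look-ahead consumes no text, that a second regex has to do the actual matching, and that a third regex can then re-scan a captured group much as a "patterns" array inside "captures" would.

```python
import re

# Hypothetical stand-ins for the three roles; not the grammar's real expressions.
LOOKAHEAD_ONLY = re.compile(r"(?=\[quote[^\n]*\]$)")   # role of a pure look-ahead
CONSUMING = re.compile(r"\[(quote)(?:,([^\n]*))?\]$")  # role of the matching regex
INNER = re.compile(r"[^,\]]+")                         # role of a nested pattern

line = "[quote,Jane Doe,Some Book]"

gate = LOOKAHEAD_ONLY.match(line)   # succeeds, but matches the empty string
body = CONSUMING.match(line)        # actually consumes the attribute list
inner = INNER.findall(body.group(2))  # re-scans text that was already matched

print(repr(gate.group(0)), gate.end())
print(body.group(0))
print(inner)
```

Because the inner expression runs over text the consuming regex has already accepted, a fix made in only one of the three would leave the others still rejecting an embedded ']'.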

andrewcarver avatar Jul 27 '23 14:07 andrewcarver

Hey!

Thank you for taking the time to explain this issue in detail. In a nutshell, I'm not convinced that the TextMate grammar is suitable for highlighting AsciiDoc text. The work on the AsciiDoc specification has revealed all the ambiguities and limitations of a grammar like TextMate.

I am not opposed to merging this improvement, but I believe it would be better to work on a semantic token provider that will allow for a much more precise and accurate syntax highlighting.

https://code.visualstudio.com/api/language-extensions/semantic-highlight-guide#semantic-token-provider

I've created a meta issue (pinned) because there are a lot of cases where the TextMate grammar produces wrong highlighting.

ggrossetie avatar Jul 27 '23 15:07 ggrossetie

Ah! Quite interesting. I will try to be more persevering in my perusal of that page (it didn't initially connect with my brain too well, so I had dropped it :-/ ).

Since you relate it to "the work on the AsciiDoc specification", I wonder whether the work you propose (already marching forward?) depends on that spec work, or is independent of it?

andrewcarver avatar Jul 27 '23 15:07 andrewcarver

I wouldn't say that we have to wait for a complete specification to start working on a semantic token provider.

If you are interested, you can join: https://chat.asciidoc.org and/or take a look at https://github.com/opendevise/asciidoc-parsing-lab/tree/main

ggrossetie avatar Jul 27 '23 16:07 ggrossetie