markdown-tm-language
Syntax highlighting breaks inside CommonMark list syntax (`1.`s) when the surrounding indentation, which does not affect rendering, is not 0.
Per the downstream report: ^1
Describe the bug
Source
When nested lists are represented in source format:
# Addition and Subtraction of Binary 1. ## Questions 1. 1. ## Comprehension
...they're not syntax-highlighted consistently:
Rendered
However, it's valid; it even renders:
Expected behaviour
The `#` and `##` should remain highlighted as `<h[1-2]>`s, and the `1.`s should remain highlighted as `<li>`s.
Related discussion
Additional notes
@RedCMD
lists indented with tabs are currently broken in this grammar
1. # no-indent
1. # Spaces
2. # no-indent
1. # Tab
-
no-indent
-
Spaces
-
-
no-indent
-
Tab
-
https://github.com/wooorm/markdown-tm-language/issues/13#issuecomment-2848862087
@RedCMD, I didn't actually know that a tab could replace a space after a heading designator. Can a tab replace a space anywhere in CommonMark?
both github and vscode seem to support so I updated the above comment
PR welcome, but I would strongly recommend against using hard tabs in markdown. Tabs are good when whitespace does not matter. Tabs do not work well when whitespace does matter. And markdown is a whitespace sensitive language. Markdown has a hardcoded tab size of 4. The whole point of tabs is for it to be different than a hardcoded value.
This may also be impossible with textmate grammars. I have super serious doubts that regexes can match the logic that is needed for markdown here.
Markdown has a hardcoded tab size of 4. The whole point of tabs is for it to be different than a hardcoded value.
@wooorm, that differs. Sometimes the length is significantly longer:
1. This is a valid 3-space list, as is: [^citation_name]
* This 2-space cutie. [^small]
Hey!
[^citation_name]: This is the first line.
This is the second line.
[^small]: This is the first line.
This is the second line.
It is in complex markup when the tab demonstrates its worth, since I can use one tab for all of these situations, and it's entirely valid markup.
@wooorm https://github.com/microsoft/vscode-markdown-tm-grammar handles it correctly
is it not as simple as replacing ` ` => `[ \t]`?
vs Github:
1. # no-indent
1. # Spaces
2. # no-indent
1. # Tab
It is in complex markup when the tab demonstrates its worth, since I can use one tab for all of these situations, and it's entirely valid markup.
Your examples have no tabs. I do not understand them. Half of it is footnotes, which is a GFM feature very different from lists. Please read the markdown spec 2.2 on tabs: https://spec.commonmark.org/0.31.2/#tabs. Please also see 5.2 list items: https://spec.commonmark.org/0.31.2/#list-items. It is very complex.
handles it correctly
It handles this example “correctly” because it handles many cases incorrectly.
is it not as simple as replacing ` ` => `[ \t]`?
No, it very much is not that. See https://github.com/wooorm/markdown-tm-language/blob/c78b1e5df644d24fa76716bbe26f4b48a6fc1610/grammar.yml#L863 and the many lines under it.
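To illustrate why a plain character-class swap may not be enough, here is a minimal Python sketch (not code from this grammar) that computes the column where an ordered list item's content starts. The names `advance` and `content_column` are hypothetical, and the sketch assumes space-only indentation before the marker and a non-empty item. The point is that the same tab after the marker is worth a different number of columns depending on what precedes it, which is arithmetic a regex character class does not do.

```python
import re

TAB_STOP = 4  # CommonMark 2.2: tabs behave as if tab stops were every 4 columns


def advance(col: int, ch: str) -> int:
    """Return the column reached after a space or a tab starting at `col`."""
    if ch == "\t":
        return col + TAB_STOP - (col % TAB_STOP)
    return col + 1


def content_column(line: str):
    """Column where an ordered list item's content begins, or None.

    Simplifications (assumptions of this sketch): indentation before the
    marker is spaces only, and empty list items are ignored.
    """
    m = re.match(r"( {0,3})(\d{1,9}[.)])([ \t]+)(?=\S)", line)
    if m is None:
        return None
    marker_end = len(m.group(1)) + len(m.group(2))
    col = marker_end
    for ch in m.group(3):
        col = advance(col, ch)
    # More than 4 columns after the marker: the item starts with indented
    # code, and only one column of whitespace belongs to the marker.
    if col - marker_end > 4:
        col = marker_end + 1
    return col


for sample in ["1. a", "1.\ta", "10.\ta", " 1.\ta"]:
    print(repr(sample), content_column(sample))
```

In this sketch the tab in `1.\ta` counts as 2 columns, while in `10.\ta` and ` 1.\ta` it counts as 1, so any nested content has to be measured against a different offset each time.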
Your examples have no tabs. I do not understand them.
@wooorm, with tabs, they would be:
1. This is a valid 3-space list, as is: [^citation_name]
* This 2-space cutie. [^small]
Hey!
[^citation_name]: This is the first line.
This is the second line.
[^small]: This is the first line.
This is the second line.
...rendered as:
It handles this example “correctly” because it handles many cases incorrectly.
Does this situation directly relate to those unstated examples?
Thanks for providing an example with tabs. Though, still, half of it is about footnotes, which are different and unrelated to this issue. Please always remove every unrelated character from example cases. Secondly, I did already provide all the sources for why you should stop using tabs, and why this cannot be implemented correctly. But I will try and walk you through them.
It would be good to look at what that first tab means: how “big” is it? That can be visualized as such:
1. a
b
c
d
e
f
g
h
i
j
Yields:
-
a
b cd
e
f
g
h
i
j
Note that b becomes indented code (because 8 spaces); g/h/i/j become paragraphs (because less than 4).
Now, I ask you to change that one tab with spaces. Try 1 space. Try 2, 3, 4, 5 spaces.
Also try with a tab but spaces before the `1.`. What happens then?
Importantly, also try the different syntax highlighters.
I hope this gives you a better mental model of the complexity of the whitespace-sensitive markdown parser, and the magic value of 4.
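The experiment above can be roughly approximated in a few lines of Python. This is a sketch under stated assumptions, not a CommonMark parser: `columns` and `classify` are hypothetical helper names, the list item is assumed to be `1.` followed by one tab (content offset 4), and each classified line is assumed to follow a blank line.

```python
TAB_STOP = 4  # tab stops every 4 columns, per CommonMark 2.2


def columns(whitespace: str, start: int = 0) -> int:
    """Column reached after a run of spaces and tabs beginning at `start`."""
    col = start
    for ch in whitespace:
        if ch == "\t":
            col += TAB_STOP - (col % TAB_STOP)  # jump to the next tab stop
        else:
            col += 1
    return col


def classify(indent: str, content_offset: int) -> str:
    """Classify a line that follows a blank line after a list item."""
    width = columns(indent)
    if width < content_offset:
        return "outside the list item"
    if width - content_offset >= 4:
        return "indented code inside the list item"
    return "paragraph inside the list item"


# Content offset for "1." followed by a tab: marker ends at column 2,
# the tab advances to column 4.
offset = columns("\t", start=2)

for indent in ["\t\t", "\t", "    ", "   ", " "]:
    print(repr(indent), "->", classify(indent, offset))
```

Under these assumptions, two tabs (8 columns) land in indented code, one tab or four spaces continue the item, and anything under 4 columns falls out of the list, which matches the note about `b` and `g`/`h`/`i`/`j`.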
are you saying that Markdown always treats tabs completely interchangeable with 4 spaces?
so you can mix space tab space with tab space space and space 6x etc?
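For what it's worth, the spec (section 2.2) describes tabs as behaving like tab stops of 4 columns where whitespace defines block structure, rather than as a literal 4 spaces. A minimal Python sketch of that reading, with `indent_width` as a hypothetical helper name, shows what the mixtures in the question come out to:

```python
TAB_STOP = 4  # assumption taken from CommonMark 2.2: tab stops every 4 columns


def indent_width(whitespace: str) -> int:
    """Width in columns of a run of spaces and tabs starting at column 0."""
    col = 0
    for ch in whitespace:
        if ch == "\t":
            col += TAB_STOP - (col % TAB_STOP)  # advance to next tab stop
        else:
            col += 1
    return col


mixtures = {
    "space tab space": " \t ",
    "tab space space": "\t  ",
    "six spaces": "      ",
    "three spaces then tab": "   \t",
}
for name, ws in mixtures.items():
    print(f"{name}: {indent_width(ws)} columns")
```

So, in this sketch, space-tab-space is 5 columns while tab-space-space and six spaces are both 6: a tab only equals 4 spaces when it starts on a tab stop.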
https://github.com/wooorm/markdown-tm-language/issues/13#issuecomment-2856229462
@RedCMD, it should treat a tab as interchangeable with 4 spaces, and has in my experience.
https://github.com/wooorm/markdown-tm-language/issues/13#issuecomment-2855381232
@wooorm, I am thankful for the effort, although I can't say that I understand those examples. Since a tab should always correspond to the default indentation width (4 spaces), its width should depend upon the context. If looking at a list of 1.s, the user would set it to 3 em.