
Space after opening tilde fence before info string

Open Crissov opened this issue 7 years ago • 5 comments

A code fence is a sequence of at least three consecutive backtick characters (`) or tildes (~). (…)

A fenced code block begins with a code fence, […] The line with the opening code fence may optionally contain some text following the code fence; this is trimmed of leading and trailing whitespace and called the info string. If the [info string] comes after a backtick fence, it may not contain any backtick characters. (The reason for this restriction is that otherwise some inline code would be incorrectly interpreted as the beginning of a fenced code block.)

This prose does not say whether whitespace is required between the backticks or tildes and the info string, only that it is trimmed away eventually. There is an example further down that begins with ```ruby, though. Apparently, whitespace is optional in backtick fences and presumably also in tilde fences.
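To make the quoted rule concrete, here is a minimal Python sketch of how an opening fence line could be recognized. The function name `parse_opening_fence` is hypothetical, and the allowance of up to three spaces of indentation is taken from the spec's general fence rules rather than from the excerpt above. It is an illustration of the prose, not cmark's actual implementation.

```python
import re

# Up to three spaces of indentation (assumption, see lead-in), then a run of
# at least three backticks or at least three tildes, then the rest of the line.
_FENCE = re.compile(r"^ {0,3}(`{3,}|~{3,})(.*)$")


def parse_opening_fence(line):
    """Return (fence_char, fence_length, info_string), or None if no fence opens."""
    m = _FENCE.match(line.rstrip("\n"))
    if m is None:
        return None
    fence, rest = m.group(1), m.group(2)
    # An info string after a backtick fence may not contain backticks.
    if fence[0] == "`" and "`" in rest:
        return None
    # Whitespace between fence and info string is optional; it is trimmed here.
    return fence[0], len(fence), rest.strip()
```

Under this reading, `parse_opening_fence("```ruby")` and `parse_opening_fence("``` ruby")` give the same result, and a tilde fence accepts tildes in its info string, which is the source of the ambiguity discussed below.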

The rules for backticks inside the info string are there because of the necessary but peculiar rules for inline code spans marked with one or many backticks: They can use any matching number of backticks and may be flanked by whitespace on the inner side, the trimming rules for which are still debated.

Tildes are not used for inline markup in vanilla CommonMark, but they are used in extensions, usually for subscripts, for strike-through / deletion, or for both. GitHub Flavored Markdown, for instance, supports stricken text with any matching number of tildes before and after (but no whitespace is allowed between the markers and the marked-up content). Although I believe they should limit that to two tildes (like GitLab does), this makes some lines that start with three or more tildes ambiguous. A simple fix would be to require whitespace before the info string in a tilde code fence.

~~~ambiguous~~~
Code block? 
~~~

Even if a flavor uses exactly two tildes before and after stricken text, it may also use single tildes before and after subscript text, which can be combined for three consecutive tildes.

~~~ambiguous~~~
. 
<del><sub>ambiguous</sub></del>

PS: That ruby example also expects class="language-ruby" in HTML output, which is not clearly marked as merely an informative suggestion.

PPS: I think, for consistency and readability, there should be a should (not must) requirement for heading underlines to also contain at least three uninterrupted equal signs = or hyphens -. Thematic breaks have it, too, although they can be interrupted by spaces.

Crissov avatar Oct 05 '18 09:10 Crissov

That's a good point about the strikeout extensions. I don't know which would be better:

  1. Disallow tildes in info strings after a tilde fence.
  2. Require a space after the tilde fence.

I think I prefer 1. This would make the two kinds of fence entirely symmetrical. But it rules out the pretty

~~~ ruby ~~~~
code
~~~~~~~~~~~~~
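For comparison, the two options can be sketched as predicates on the opening line of a tilde fence. The function names are hypothetical and the indentation allowance is an assumption carried over from the spec; this only illustrates how the two rules differ on the examples in this thread.

```python
import re

# A tilde fence: up to three spaces, at least three tildes, then the raw rest.
_TILDE_FENCE = re.compile(r"^ {0,3}(~{3,})(.*)$")


def opens_block_option1(line):
    """Option 1: no tildes are allowed in the info string after a tilde fence."""
    m = _TILDE_FENCE.match(line)
    return m is not None and "~" not in m.group(2)


def opens_block_option2(line):
    """Option 2: a non-empty info string must be separated from the fence by whitespace."""
    m = _TILDE_FENCE.match(line)
    if m is None:
        return False
    rest = m.group(2)
    return rest.strip() == "" or rest[0] in " \t"
```

With these definitions, `~~~ ruby ~~~~` opens a code block under option 2 but not under option 1, `~~~ruby` does the opposite, and `~~~ambiguous~~~` is rejected by both.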

Anyone have thoughts about this?

jgm avatar Oct 05 '18 16:10 jgm

Although I proposed and still prefer option 2, I assume that less existing content has problems with option 1. Weaker variants of option 1 would be to disallow a) the exact number or b) more than the number of consecutive tildes as in the fence to appear in the info string.

~~~stricken~~~
no code block
~~~

~~~   not-stricken   ~~~
no code block
~~~

~~~perhaps-stricken~~~~~
code block according to a) but not b)
~~~

~~~not-stricken~~ ~ ~~
code block
~~~
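The weaker variants a) and b) can be sketched by looking at runs of consecutive tildes in the info string. One assumption is made loudly here: b) is read as "a run at least as long as the fence is disallowed", because that reading matches the labels on the four examples above. A literal "more than" would compare with `<=` instead of `<`. The function names are hypothetical.

```python
import re


def tilde_runs(info):
    """Lengths of all maximal runs of consecutive tildes in the info string."""
    return [len(run) for run in re.findall(r"~+", info)]


def allowed_variant_a(fence_len, info):
    """a) No run of exactly fence_len tildes may appear in the info string."""
    return all(n != fence_len for n in tilde_runs(info))


def allowed_variant_b(fence_len, info):
    """b) No run of fence_len or more tildes may appear (assumed reading, see lead-in)."""
    return all(n < fence_len for n in tilde_runs(info))
```

For the third example, the info string `perhaps-stricken~~~~~` after a fence of length 3 passes a) but fails b), while `not-stricken~~ ~ ~~` passes both.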

PS: GitHub's fork of cmark indeed prefers the strike-out over a code block, and it does not require the number of tildes before and after stricken text to be the same.

github/cmark#71 github/cmark#99 markdown-it/markdown-it#446


~test~~~~ 1 before, 4 after

~~~test~~~
3 before and after in the line above and 3 in the line below
~~~

Crissov avatar Oct 05 '18 17:10 Crissov

Although GitHub's implementation allows any number of (and unbalanced) tildes for strikethrough, their spec specifies exactly two tildes on each side. That wouldn't pose a problem for the current behavior of tilde code blocks, even with no extra space. In

~~~~lua~~~~

the code fence interpretation would take precedence over the nested strikethrough interpretation, but I think that's fine and appropriate.

Note that there's also an option 3, which is to impose some tighter restriction on info strings that is uniform across types of code blocks. E.g., pandoc allows either a single word or an attribute block of form {.class .class2 id=foo key="value"}.
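A rough sketch of what such a uniform restriction might look like follows. The character set for a "word" and the token grammar inside the braces are assumptions made for illustration only; they are not pandoc's actual grammar, and `is_restricted_info` is a hypothetical name.

```python
import re

# Hypothetical "single word": letters, digits, underscore, dot, plus, hyphen.
_WORD = re.compile(r"^[\w.+-]+$")

# Hypothetical attribute tokens: .class, #id, key=value, key="value".
_TOKEN = r'(?:\.[\w-]+|#[\w-]+|[\w-]+=(?:"[^"]*"|[^\s"{}]+))'
_ATTR_BLOCK = re.compile(
    r"^\{\s*(?:" + _TOKEN + r"(?:\s+" + _TOKEN + r")*)?\s*\}$"
)


def is_restricted_info(info):
    """True if the trimmed info string is empty, a single word, or an attribute block."""
    info = info.strip()
    if info == "":
        return True
    return bool(_WORD.match(info) or _ATTR_BLOCK.match(info))
```

Because neither form admits tildes or backticks outside quoted values, a rule of this shape would reject `ambiguous~~~` as an info string for both kinds of fence.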

jgm avatar Oct 06 '18 04:10 jgm

JFTR, the quirk in cmark-gfm has been fixed. github/cmark#120

Crissov avatar Oct 08 '18 00:10 Crissov

JFTR, v0.29 settles the related issue of initial and final whitespace in code spans.

Crissov avatar Apr 11 '19 08:04 Crissov