commonmark-spec

example 31 is misplaced and unexplained

Open rsc opened this issue 4 years ago • 4 comments

In 0.30, examples 31-34 are introduced by:

Entity and numeric character references are recognized in any context besides code spans or code blocks, including URLs, link titles, and fenced code block info strings:

and then examples 35-36 are introduced by:

Entity and numeric character references are treated as literal text in code spans and code blocks:

But example 31 is an example of a context where entity and numeric character references are not recognized, namely raw HTML:

<a href="&ouml;&ouml;.html">

The two intros should probably be rewritten to list raw HTML as one of the exceptions:

Entity and numeric character references are recognized in any context besides code spans, code blocks or raw HTML, including URLs, link titles, and fenced code block info strings:

Entity and numeric character references are treated as literal text in code spans, code blocks, and raw HTML:

and then example 31 should be moved after current example 36.

(One could argue that they are "recognized" by the eventual HTML parser reading the output, but they are not recognized by CommonMark; otherwise the output of example 31 would be <a href="öö.html">. The alternative is that CommonMark intends ö to be re-escaped to &ouml; in output, but that isn't done in examples 32-34.)

rsc, Sep 04 '21 15:09

It's tricky to know what to say here without being confusing. If we say that they aren't recognized in raw HTML, people might think that &ouml; in raw HTML will be escaped to &amp;ouml; in HTML rendering, as happens with &ouml; in code spans. If we say they are recognized, that is also a bit misleading, since really they're just passed through.

jgm, Sep 04 '21 16:09

Indeed. One option would be to reverse the order of the two statements and insert a third between them:

Entity and numeric character references are treated as literal text in code spans and code blocks:

(NEW) Entity and numeric character references are passed through unaltered in raw HTML:

Entity and numeric character references are recognized in any other context, including URLs, link titles, and fenced code block info strings:
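
To make the three cases concrete, here is a minimal Python sketch of that wording. It is not a CommonMark implementation: `render_reference` and the context names are hypothetical, and `html.unescape` only stands in for CommonMark's decoding (it is more permissive, since it also accepts some legacy references without a trailing semicolon).

```python
import html


def render_reference(text: str, context: str) -> str:
    """Render `text` to HTML output according to the context it appears in.

    `context` is a hypothetical label: "code", "raw_html", or anything else
    for ordinary text, URLs, link titles, and info strings.
    """
    if context == "code":
        # Literal text: escape the ampersand so the reference shows as written.
        return html.escape(text, quote=False)
    if context == "raw_html":
        # Passed through unaltered; the eventual HTML parser interprets it.
        return text
    # Any other context: decode the reference, then escape for HTML output.
    return html.escape(html.unescape(text), quote=False)


print(render_reference("&ouml;", "code"))      # &amp;ouml;
print(render_reference("&ouml;", "raw_html"))  # &ouml;
print(render_reference("&ouml;", "text"))      # ö
```

The middle branch is the one the current intros do not describe: the output is neither decoded nor escaped.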

rsc, Sep 04 '21 16:09

And, assuming example 31 were in the new middle section, another useful example would be one using an HTML entity that CommonMark does not allow, such as &copy (no trailing semicolon), which is passed through in raw HTML rather than turned into &amp;copy.
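
A rough sketch of that `&copy` case, assuming only named references with a trailing semicolon are decoded. The helper names are hypothetical, numeric references are ignored, and the lookup uses Python's `html.entities.html5` table rather than anything taken from the spec.

```python
import html
import re
from html.entities import html5

# Match "&name;" only; a reference without the trailing semicolon never matches.
_NAMED_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*;)")


def decode_named_references(text: str) -> str:
    """Decode known named references, leaving everything else untouched."""
    def replace(match: re.Match) -> str:
        # Keys in `html5` include the trailing semicolon, e.g. "copy;".
        return html5.get(match.group(1), match.group(0))
    return _NAMED_REF.sub(replace, text)


def render_text(text: str) -> str:
    """Ordinary text: decode references, then escape what is left."""
    return html.escape(decode_named_references(text), quote=False)


def render_raw_html(text: str) -> str:
    """Raw HTML: passed through unaltered."""
    return text


print(render_text("&copy"))      # &amp;copy
print(render_text("&copy;"))     # ©
print(render_raw_html("&copy"))  # &copy
```

In ordinary text the unrecognized `&copy` gets its ampersand escaped, while in raw HTML it reaches the output unchanged, which is the contrast the extra example would show.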

rsc, Sep 04 '21 16:09

I created PR #690 in case it is helpful. No worries if you'd rather do something different.

rsc, Sep 04 '21 23:09