Contents of <pre> tags wrapped in <div> tags are not left alone if they contain empty lines

Open aral opened this issue 3 months ago • 2 comments

Contents of HTML preformatted text blocks (<pre>…</pre>) are, correctly, left alone:

e.g.,

<pre>a

b
</pre>

is rendered correctly, with the contents untouched:

<pre>a

b
</pre>

However, if you wrap that in a <div>:

<div>
  <pre>a

  b
  </pre>
</div>

the CommonMark parser resumes Markdown parsing of the <pre> contents after the empty line, which corrupts the resulting HTML:

<div>
  <pre>a
<p>b
</pre></p>
</div>

Notice the erroneously added <p> tag before b, whose closing </p> only appears after the </pre> tag.
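For anyone who wants to see why this happens, here is a rough Python sketch of the two HTML block rules that I believe are involved. It is a simplified model, not the reference implementation: the function name split_blocks and the regex names are made up for illustration, only a few of the type 6 tag names are listed, and everything other than HTML blocks and paragraphs is ignored. As I understand the spec, a block that starts with <pre> (type 1) runs until a line containing </pre>, while a block that starts with <div> (type 6) ends at the first blank line, even if a <pre> is still open inside it.

```python
import re

# Type 1 start: <pre, <script, <style or <textarea, with up to 3 spaces of indent.
TYPE1_START = re.compile(r"^ {0,3}<(pre|script|style|textarea)(\s|>|$)", re.I)
# Type 1 end: a line that contains the matching kind of closing tag.
TYPE1_END = re.compile(r"</(pre|script|style|textarea)>", re.I)
# Type 6 start: an opening or closing block-level tag (abbreviated list, assumption).
TYPE6_START = re.compile(r"^ {0,3}</?(div|table|ul|ol|section|blockquote)(\s|/?>|$)", re.I)


def split_blocks(text):
    """Split text into ("html", ...) and ("paragraph", ...) blocks."""
    lines = text.split("\n")
    blocks = []
    i = 0
    while i < len(lines):
        line = lines[i]
        if not line.strip():
            i += 1  # blank lines between blocks are skipped
        elif TYPE1_START.match(line):
            start = i
            # Type 1 runs until a line containing the closing tag,
            # so blank lines inside <pre> are preserved.
            while i < len(lines) and not TYPE1_END.search(lines[i]):
                i += 1
            i = min(i + 1, len(lines))
            blocks.append(("html", "\n".join(lines[start:i])))
        elif TYPE6_START.match(line):
            start = i
            # Type 6 ends at the first blank line, whatever is still open.
            while i < len(lines) and lines[i].strip():
                i += 1
            blocks.append(("html", "\n".join(lines[start:i])))
        else:
            start = i
            i += 1
            # A paragraph continues until a blank line or a line that
            # starts a type 1 or type 6 HTML block.
            while (i < len(lines) and lines[i].strip()
                   and not TYPE1_START.match(lines[i])
                   and not TYPE6_START.match(lines[i])):
                i += 1
            blocks.append(("paragraph", "\n".join(l.strip() for l in lines[start:i])))
    return blocks


bare = "<pre>a\n\nb\n</pre>"
wrapped = "<div>\n  <pre>a\n\n  b\n  </pre>\n</div>"

print(split_blocks(bare))     # one HTML block, blank line kept
print(split_blocks(wrapped))  # HTML block, then a paragraph "b\n</pre>", then </div>
```

In the wrapped case the <pre> line is swallowed by the <div> block, so the type 1 rule never gets a chance to start, and the text after the blank line is treated as an ordinary paragraph.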

I encountered this in markdown-it and was able to narrow it down to this test case, which is also reproducible in the CommonMark parser at https://spec.commonmark.org/dingus/.

Related:

• https://github.com/markdown-it/markdown-it/issues/238

aral avatar Sep 01 '25 09:09 aral

Do you have a suggestion to make about how the spec might be altered to deal with this case (without introducing too much complexity)?

jgm avatar Sep 01 '25 18:09 jgm

> Do you have a suggestion to make about how the spec might be altered to deal with this case (without introducing too much complexity)?

I’m sorry, I’m not familiar with the details of the spec and the project so I’m not sure how helpful I can be at the moment.

That said, at first glance, it looks like this was a known/planned limitation: https://github.com/commonmark/commonmark-spec/blob/master/spec.txt#L2425

My initial naïve gut feeling is that the rule for HTML blocks of type 6 should be modified so that they do not end at the first blank line if the parser is inside a container element that preserves formatting (this should also affect

aral avatar Sep 08 '25 10:09 aral