Newlines between attributes break HTML block detection
The following HTML block is not detected as HTML in a .md file:
<img src="Documentation/Images/BasicHypergraphPlot.png"
width="478"
alt="Out[] = ... a plot showing 3 triangles connected at vertices labeled 2 and 4 ...">
It is detected as normal text data.
When newlines are removed, this block is correctly detected as HTML:
<img src="Documentation/Images/BasicHypergraphPlot.png" width="478" alt="Out[] = ... a plot showing 3 triangles connected at vertices labeled 2 and 4 ...">
Newlines between attributes should be ok.
I think it could be a bug, unless the ellipsized alt attribute value you quoted actually contains one or more newlines. Does it?
In that case it violates the CommonMark spec, and MD4C would parse the entire tag as text instead of raw HTML.
Incidentally, the SO answer you linked does not apply to CommonMark. It applies to HTML as parsed by web browsers. For CommonMark, refer to https://spec.commonmark.org/0.31.2/#raw-html.
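To make the raw HTML rule from that spec section concrete, here is a rough Python sketch of the open-tag grammar, modeled loosely on the regular expressions in the commonmark.js reference implementation. It is not MD4C's code. The name `is_open_tag` is made up for illustration, and `\s` is looser than the spec, which allows at most one line ending per whitespace run.

```python
import re

# Building blocks of a CommonMark open tag (approximation, see note above).
TAG_NAME = r"[A-Za-z][A-Za-z0-9-]*"
ATTR_NAME = r"[A-Za-z_:][A-Za-z0-9_.:-]*"
UNQUOTED = r"[^\"'=<>`\x00-\x20]+"   # no quotes, =, <, >, backtick, or whitespace
SINGLE_QUOTED = r"'[^']*'"
DOUBLE_QUOTED = r'"[^"]*"'
ATTR_VALUE = f"(?:{UNQUOTED}|{SINGLE_QUOTED}|{DOUBLE_QUOTED})"
# An attribute is preceded by whitespace, which may include a line ending.
ATTR = rf"(?:\s+{ATTR_NAME}(?:\s*=\s*{ATTR_VALUE})?)"
OPEN_TAG = re.compile(rf"<{TAG_NAME}{ATTR}*\s*/?>")


def is_open_tag(text: str) -> bool:
    """Return True if the whole string is one open tag under this approximation."""
    return OPEN_TAG.fullmatch(text) is not None


# Attributes separated by a newline still form a valid open tag.
multi_line = '<img src="Documentation/Images/BasicHypergraphPlot.png"\nwidth="478">'
# Literal backslashes before the quotes make the value unquoted, and an
# unquoted value may not contain a double quote, so this is not a tag.
backslashed = r'<img src=\"Documentation/Images/BasicHypergraphPlot.png\" width=\"478\">'

print(is_open_tag(multi_line))   # True
print(is_open_tag(backslashed))  # False
```

Under this approximation a line break between attributes is ordinary whitespace, so it does not stop the tag from qualifying as raw HTML on its own.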
Thanks for pointing me to the right spec. The alt value itself doesn't contain newlines. Without the alt attribute the situation is unchanged:
"<img src=\"Documentation/Images/BasicHypergraphPlot.png\"
width=\"478\">"
is not detected as HTML, while
"<img src=\"Documentation/Images/BasicHypergraphPlot.png\" width=\"478\">"
is detected.
Is that the actual raw HTML you're passing to MD4C? Both samples are invalid HTML. This is what the W3C HTML validator reports:
Error: " in an unquoted attribute value. Probable causes: Attributes running together or a URL query string in an unquoted attribute value.
At line 7, column 11
ody>↩<img src=\"Documentation/
If your markdown contains backslash-quote, MD4C will parse it as text because that's what it is, both in CommonMark and in HTML. Try replacing backslash-quote with single-quote. Here are some tests you can try (Linux shell, hopefully it's clear enough):
md2html <<< '<tag attr=\"value\">'
Parsed as text because that's what it is.
<p>&lt;tag attr=&quot;value&quot;&gt;</p>
md2html <<< '<tag attr="value">'
Parsed as raw HTML.
<tag attr="value">
md2html <<< '<tag attr="value"
attr="value">'
Also parsed as raw HTML.
<p><tag attr="value"
attr="value"></p>
Disabling the HTML feature parses everything as text.
md2html --fno-html <<< '<tag attr=\"value\">'
<p>&lt;tag attr=&quot;value&quot;&gt;</p>
md2html --fno-html <<< '<tag attr="value">'
<p>&lt;tag attr=&quot;value&quot;&gt;</p>
md2html --fno-html <<< '<tag attr="value"
attr="value">'
<p>&lt;tag attr=&quot;value&quot;
attr=&quot;value&quot;&gt;</p>
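On the backslash-quote question raised earlier in the thread, one quick way to tell whether the backslashes are really in the data or only in how the string is displayed is to inspect the string itself before handing it to the parser. The Python sketch below is only an illustration: the variable names and `has_backslash_quote` are hypothetical, and nothing here calls MD4C.

```python
# Escaped quotes inside a normal string literal are syntax only;
# the resulting data contains plain double quotes and no backslashes.
displayed = "<img src=\"Documentation/Images/BasicHypergraphPlot.png\"\nwidth=\"478\">"

# A raw literal keeps each backslash as a real character in the data.
literal = r'<img src=\"Documentation/Images/BasicHypergraphPlot.png\" width=\"478\">'


def has_backslash_quote(markdown: str) -> bool:
    """Return True if the text really contains a backslash followed by a double quote."""
    return '\\"' in markdown


print(has_backslash_quote(displayed))  # False
print(has_backslash_quote(literal))    # True
```

If the check returns False for the input that still fails, the backslashes were only part of the string's printed form and the cause has to be elsewhere.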