
MD033 flagging HTML tags in image alt text strings

Open nschonni opened this issue 3 years ago • 7 comments

https://dlaa.me/markdownlint/#%25m!%5BThe%20default%2C%20focused%2C%20and%20disabled%20%3Ctextarea%3E%20element%20in%20Firefox%2071%20and%20Safari%2013%20on%20Mac%20OSX%20and%20Edge%2018%2C%20Yandex%2014%2C%20Firefox%20and%20Chrome%20on%20Windows%2010.%5D(textarea_basic.png)

Since the alt text doesn't get parsed as HTML, there shouldn't be a need to escape these tags. I ran into this because I had been escaping the tags, but running Prettier then removed the escaping because it was not needed.

A similar thing happens with link title strings, but that's probably a separate bug.

nschonni avatar Sep 11 '22 03:09 nschonni
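
To make the report above concrete, here is a minimal sketch of the distinction being discussed: an unescaped `<textarea>` inside image alt text versus the backslash-escaped form. This is not how markdownlint implements MD033; the function name `unescaped_tags_in_alt` and both regular expressions are hypothetical simplifications, and real Markdown parsing handles many cases this ignores.

```python
import re
from typing import List

# Hypothetical, simplified pattern for an inline image: ![alt](destination).
# The alt part allows backslash escapes such as \< and \].
IMAGE_RE = re.compile(r"!\[((?:\\.|[^\]\\])*)\]\([^)]*\)")

# An HTML-like tag whose "<" is not preceded by a backslash.
UNESCAPED_TAG_RE = re.compile(r"(?<!\\)</?[A-Za-z][A-Za-z0-9-]*[^<>]*>")


def unescaped_tags_in_alt(markdown: str) -> List[str]:
    """Return HTML-like tags that appear unescaped inside image alt text."""
    found = []
    for image in IMAGE_RE.finditer(markdown):
        alt = image.group(1)
        found.extend(tag.group(0) for tag in UNESCAPED_TAG_RE.finditer(alt))
    return found


# The unescaped form is reported; the escaped form is not.
print(unescaped_tags_in_alt("![The disabled <textarea> element](textarea_basic.png)"))
print(unescaped_tags_in_alt(r"![The disabled \<textarea\> element](textarea_basic.png)"))
```

The escaped form is the workaround described in the report, which Prettier then undoes.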

As your example shows, HTML content in the image alternate text region can be removed by the parser and so I think it is reasonable for markdownlint to warn about it.

Here is an example using markdown-it directly: http://markdown-it.github.io/#md3=%7B%22source%22%3A%22%23%20Issue%20579%5Cn%5Cn!%5Btext%20%3Ctextarea%3E%20text%5D%28image.png%29%5Cn%22%2C%22defaults%22%3A%7B%22html%22%3Atrue%2C%22xhtmlOut%22%3Afalse%2C%22breaks%22%3Afalse%2C%22langPrefix%22%3A%22language-%22%2C%22linkify%22%3Atrue%2C%22typographer%22%3Atrue%2C%22_highlight%22%3Atrue%2C%22_strict%22%3Afalse%2C%22_view%22%3A%22src%22%7D%7D

DavidAnson avatar Sep 11 '22 04:09 DavidAnson

Hmm, I'm thinking it might be a markdown-it bug then. If you run it through GitHub's parser or the remark parser that Prettier uses, it's not treated as a literal:

The default, focused, and disabled <textarea> element in Firefox 71 and Safari 13 on Mac OSX and Edge 18, Yandex 14, Firefox and Chrome on Windows 10.

nschonni avatar Sep 11 '22 05:09 nschonni

Toggling the "HTML" checkbox on that demo page opts into and out of this removal behavior.

Skimming the CommonMark specification, it's not clear to me that this scenario is directly addressed, so I think the parser is behaving consistently.

DavidAnson avatar Sep 11 '22 05:09 DavidAnson

I filed an issue on markdown-it. Looking at the spec (https://spec.commonmark.org/0.30/#images), it is light on detail, but it does say:

Though this spec is concerned with parsing, not rendering, it is recommended that in rendering to HTML, only the plain string content of the image description be used. Note that in the above example, the alt attribute’s value is `foo bar`, not `foo [bar](/url)` or `foo <a href="/url">bar</a>`. Only the plain string content is rendered, without formatting.

nschonni avatar Sep 11 '22 05:09 nschonni

Everything is parsed in alt, but only plain text is rendered. Consider ![foo *bar* baz]() - it's gonna lose asterisks (in cmark and in github version too).

I believe a linter for CommonMark syntax should flag anything that is not plain text or an escape inside the image alt text, because it'll just get ignored by parsers. HTML is no exception there.

rlidwka avatar Sep 12 '22 16:09 rlidwka
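
The "parsed, but only plain text is rendered" point in the comment above can be sketched with a toy model. This is not the code of markdown-it, cmark, or any real renderer; `plain_alt` and its `drop_html` flag are hypothetical names, and the regular expressions only cover the simplest cases. The flag models the two behaviors reported in this thread: dropping inline HTML from the alt text, or keeping the tag as literal text.

```python
import re

# Unescaped "*" emphasis markers (a toy stand-in for real emphasis parsing).
EMPHASIS_RE = re.compile(r"(?<!\\)\*")

# Unescaped HTML-like tags such as <textarea> or </b>.
TAG_RE = re.compile(r"(?<!\\)</?[A-Za-z][^<>]*>")


def plain_alt(alt: str, drop_html: bool) -> str:
    """Reduce image alt source text to a plain string.

    drop_html=True  models a renderer that discards inline HTML in alt text.
    drop_html=False models a renderer that keeps the tag as literal text.
    """
    text = EMPHASIS_RE.sub("", alt)
    if drop_html:
        text = TAG_RE.sub("", text)
    return text


print(plain_alt("foo *bar* baz", drop_html=True))
print(plain_alt("text <textarea> text", drop_html=True))
print(plain_alt("text <textarea> text", drop_html=False))
```

In both modes the asterisks disappear; the modes differ only in whether the tag survives, which is the disagreement in this thread.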

The CommonMark sample does remove asterisks, but doesn't remove tags https://spec.commonmark.org/dingus/?text=%23%20Issue%20579%0A%0A!%5Btext%20asterisks%20text%5D(image.png)%0A%0A!%5Btext%20%3Ctextarea%3E%20text%5D(image.png)%0A%0A

nschonni avatar Sep 12 '22 17:09 nschonni

Found a relevant discussion https://github.com/commonmark/commonmark-spec/issues/716 but there is no resolution right now

nschonni avatar Sep 12 '22 17:09 nschonni

Closing this based on my Sept 10 example and lack of agreement in the comments about whether this is reasonable.

DavidAnson avatar Oct 18 '22 04:10 DavidAnson

OK, I'll ping this issue if there is a resolution on the CommonMark or Markdown-it issues

nschonni avatar Oct 18 '22 04:10 nschonni