bluemonday
Paragraph sanitization (e.g. img.alt) is too restrictive, disallows punctuation
This regexp is used to validate the alt text of images. It disallows common punctuation, which causes problems when alt text is copied from news articles or source code listings, for example. The result is that the alt attribute is dropped, making the image inaccessible to visually impaired people. The author of the text is unlikely to even notice the problem, because visually the result looks just fine.
Subset of common symbols (some used in non-English languages) currently forbidden by this regular expression: "„“”‘’«»#$§%‰&*+±–—:;=?‽¡¿@{}|~…°®™
I’m not sure I understand the purpose of restricting to a specific character set here, as opposed to properly escaping special characters (which I believe bluemonday does automatically). Is the concern that the contents of the alt or title attribute might be taken as the HTML source of some pop-up? Wouldn’t it make more sense to blacklist only angle brackets then?