Sanitize returns empty string when `PARSER_MEDIA_TYPE: application/xhtml+xml` is set and the input contains void tags
Bug
DOMPurify.sanitize returns an empty string when run on HTML input containing void elements while application/xhtml+xml is set as the parser media type.
Version: 2.6.0
Input
<html lang="en">
<head>
<title>Sample HTML5 Page</title>
</head>
<body>
<p>Hello</p>
<br>
</body>
</html>
Given output
Empty string
Expected output
<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
<head>
<title>Sample HTML5 Page</title>
</head>
<body>
<p>Hello</p>
<br />
</body>
</html>
The outcome is the same for other void elements, such as meta or link tags.
Thanks for reporting it, @lucamerighi!
Can you provide the full code sample with the DOMPurify.sanitize() parameters?
Did you try 2.7.0?
Because isomorphic-dompurify is just a wrapper around dompurify, it also makes sense to report the issue to the dompurify issue queue.
Sure, here's the complete snippet
DOMPurify.sanitize(
`
<html lang="en">
<head>
<title>Sample HTML5 Page</title>
</head>
<body>
<p>Hello</p>
<br>
</body>
</html>`,
  {
    PARSER_MEDIA_TYPE: "application/xhtml+xml",
  }
);
Tried with 2.7.0 and got the same empty output. Removing the <br> gives the expected output.
@lucamerighi Thanks. For XHTML, can you use self-closing tags like this <br/>?
Yes, in XHTML you can use self-closing tags. In fact, you have to convert all void elements to their self-closing variant for the document to be valid.
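For illustration, the conversion mentioned here could be done as a pre-processing step before the markup reaches the sanitizer. The following is only a minimal sketch, not part of DOMPurify or isomorphic-dompurify: `selfCloseVoidElements` is a hypothetical helper name, and the regex approach assumes reasonably regular markup (it does not account for tags that appear inside comments, CDATA sections, or script contents).

```javascript
// Void elements as listed in the HTML standard.
const VOID_ELEMENTS = [
  "area", "base", "br", "col", "embed", "hr", "img", "input",
  "link", "meta", "source", "track", "wbr",
];

// Matches an opening void tag, whether or not it is already self-closed.
// Quoted attribute values are consumed as a unit, so a ">" inside quotes
// does not end the tag early. The lookahead keeps "col" from matching
// "colgroup" and "br" from matching a custom element such as "br-foo".
const VOID_TAG = new RegExp(
  `<(${VOID_ELEMENTS.join("|")})(?=[\\s/>])((?:"[^"]*"|'[^']*'|[^'">])*?)\\s*/?>`,
  "gi"
);

// Hypothetical helper (an assumption for this sketch): rewrites every void
// tag to its self-closing form and lowercases the tag name, since XHTML
// element names are case-sensitive.
function selfCloseVoidElements(html) {
  return html.replace(
    VOID_TAG,
    (_match, name, attrs) => `<${name.toLowerCase()}${attrs} />`
  );
}

console.log(selfCloseVoidElements("<p>Hello</p>\n<br>"));
```

The converted string could then be passed to DOMPurify.sanitize with the same PARSER_MEDIA_TYPE option. Whether the rest of the input is well-formed enough for the XML parser still depends on the document, so this is a workaround to experiment with rather than a guaranteed fix.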
@lucamerighi Well, it's definitely a question/issue for the dompurify developers. I'm afraid I'm not very familiar with the internals, so I can't help here.
And I should not suggest switching to the default PARSER_MEDIA_TYPE because you're probably using it for a reason.
Opened the issue there 🤞 Yes, I need it for exactly this reason: parsing HTML into valid XML, which among other things means fixing this kind of stuff.
According to the dompurify maintainer, the behavior is by design (https://github.com/cure53/DOMPurify/issues/938#issuecomment-2048995982), so I'm closing this issue too.