html-escape
html-escape copied to clipboard
Script escaping is incorrect
Thanks for maintaining this library @magiclen!
I found an issue with the script escaping code. It appears to escape </script> only, and leave <!-- and <script> unchanged.
This is incorrect: if the HTML parser encounters the string <!-- <script>, then it enters a double-escaped state, where it will take two </script> tags to close the element.
https://html.spec.whatwg.org/multipage/scripting.html#restrictions-for-contents-of-script-elements
BTW, I recommend submitting a RustSec advisory for this, so that users are aware of the issue :smile:
Hi, I would like to know in which case that you need to use encode_script* functions with HTML comments?
One (discouraged but common) use case is to embed JSON in a page using a <script> element.
If that JSON includes user data, then a malicious user could include <!-- <script> in their input, and then corrupt the resulting page. (That might lead to XSS as well, though I can't think of a proof of concept right now.)
After some experiments, I found <!-- and --> need to appear in pairs even in a single <script> element. Or the text after <!-- will seem to be considered as a comment.
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title></title>
</head>
<body>
Hello
<script>
alert("<!--");
alert("123");
alert("<script>");
alert("<\/script>");
alert("456");
alert("-->");
</script>
world!
</body>
</html>
The HTML above is correct and shows the alert dialog six times. However, if we remove -->, no alert dialog will be shown.
The following HTML code is also incorrect
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title></title>
</head>
<body>
Hello
<script>
alert("<!--");
alert("123");
alert("<script>");
alert("<\/script>");
alert("456");
</script>
world!
<script>
alert("-->");
</script>
</body>
</html>
<!-- can be replaced with <\!--.
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title></title>
</head>
<body>
Hello
<script>
alert("<\!--");
alert("123");
alert("<script>");
alert("<\/script>");
alert("456");
</script>
world!
<script>
alert("-->");
</script>
</body>
</html>
The HTML above is correct and shows the alert dialog six times.
Internet Explorer has the same behavior.
One (discouraged but common) use case is to embed JSON in a page using a .
To clarify, I think there are two use cases here:
- Embedding untrusted JSON.
- Inlining a trusted script (i.e. replacing
<script src="foo.js">with<script>[contents of foo.js]</script>.
For (1), I think something similar to your proposal would work. (Though I would ask a web security expert for review first.)
For (2), I'm not so sure. Because </script> might appear outside of a string. For example, this code is strange but still valid:
// Check if the number 1 is less than the regex "script>"
1 </script>/;
Escaping would be incorrect here.
So for arbitrary JavaScript (not just JSON), I think it would be safer to raise an error instead.