`<hello` returns an empty string when escaping content
To Reproduce
Step by step instructions to reproduce the behavior:
- Install version 2.17.0
- Write a program with this script:
const sanitizeHtml = require('sanitize-html');

const res = sanitizeHtml('<hello', {
  allowedAttributes: {
    ...sanitizeHtml.defaults.allowedAttributes,
    span: ['data-userid'],
    '*': ['class']
  },
  // Escape disallowed tags (and their children) instead of dropping them
  disallowedTagsMode: 'recursiveEscape',
  preserveEscapedAttributes: true
});
console.log(`Result: "${res}"`);
// Result: ""
- See the issue
Expected behavior
The console log should print `<hello`, since it isn't a valid HTML tag.
Describe the bug
Any text starting with `<` is completely removed from the final output in escape mode.
In addition, text like `<hello you>` is returned as `<hello you="">` instead of simply being escaped.
Details
Version of Node.js: 20.17.0
I think your test case is incomplete. What HTML are we sanitizing?
ah sorry, let me update my message
but my tests are:
- '<hello' => '' (expected: <hello)
- '<hello you' => '' (expected: <hello you)
- '<hello you>' => '<hello you="">' (expected: <hello you>)
Taking a quick look at the code, I think the problem stems from the newer onclosetag handler and how htmlparser2 handles malformed tags. I think it treats the `<hello` as an unmatched closing tag, so it just gets discarded. I think a solution might be to parse through the input and escape any malformed tags before passing it to the parser. Not sure though, just a guess. I'll mark this as a good first issue and then circle back at some point.
I think that in the absence of any closing `>`, it probably just treats it as too malformed to do much with, so we might or might not be able to access this information from htmlparser2.
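For anyone picking this up, here is a rough sketch of the pre-escaping idea mentioned above. It assumes we can run a pass over the raw input before it reaches htmlparser2. `escapeUnclosedTags` is a hypothetical helper, not an existing sanitize-html or htmlparser2 API. It only looks at raw `<` and `>` characters, so it knows nothing about comments, `<script>` bodies, or `<`/`>` inside quoted attribute values, and it does not address the `<hello you>` => `<hello you="">` case.

```javascript
// Hypothetical helper (not part of sanitize-html): escape any '<' that is
// never closed by a '>' before the next '<' or the end of the input.
function escapeUnclosedTags(html) {
  let out = '';
  let i = 0;
  while (i < html.length) {
    const ch = html[i];
    if (ch !== '<') {
      out += ch;
      i += 1;
      continue;
    }
    const close = html.indexOf('>', i + 1);
    const nextOpen = html.indexOf('<', i + 1);
    const isClosed = close !== -1 && (nextOpen === -1 || close < nextOpen);
    if (isClosed) {
      // Looks like a complete tag: pass it through untouched for the parser.
      out += html.slice(i, close + 1);
      i = close + 1;
    } else {
      // No closing '>' belongs to this '<': treat it as plain text.
      out += '&lt;';
      i += 1;
    }
  }
  return out;
}

console.log(escapeUnclosedTags('<hello'));      // &lt;hello
console.log(escapeUnclosedTags('<hello you'));  // &lt;hello you
console.log(escapeUnclosedTags('<hello you>')); // <hello you>
```

With something like this in front of the parser, the first two test cases would reach htmlparser2 as ordinary text rather than as an unterminated tag. Whether that is the right place for the fix is still an open question.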