sanitize-html

`<hello` returns an empty string when escaping content

Open Bricklou opened this issue 6 months ago • 4 comments

To Reproduce

Step by step instructions to reproduce the behavior:

  1. Install version 2.17.0.
  2. Write a program with this script:
     const sanitizeHtml = require('sanitize-html');

     const res = sanitizeHtml('<hello', {
       allowedAttributes: {
         ...sanitizeHtml.defaults.allowedAttributes,
         span: ['data-userid'],
         '*': ['class']
       },
       // Escape disallowed tags (and their children) instead of dropping them
       disallowedTagsMode: 'recursiveEscape',
       preserveEscapedAttributes: true
     });
     console.log(`Result: "${res}"`);
     // Result: ""
  3. See the issue.

Expected behavior

The console log should print &lt;hello, since <hello isn't a valid HTML tag.

Describe the bug

Any text starting with < is completely removed from the final output in escape mode. In addition, text like <hello you> is returned as <hello you=""> instead of just being escaped.

Details

Version of Node.js: 20.17.0

Bricklou avatar Jun 02 '25 08:06 Bricklou

I think your test case is incomplete. What HTML are we sanitizing?

BoDonkey avatar Jun 02 '25 09:06 BoDonkey

ah sorry, let me update my message

but my tests are:

  • '<hello' => '' (expected: &lt;hello)
  • '<hello you' => '' (expected: &lt;hello you)
  • '<hello you>' => '<hello you="">' (expected: &lt;hello you&gt;)

Bricklou avatar Jun 02 '25 09:06 Bricklou

Taking a quick look at the code, I think the problem stems from the newer onclosetag handler and how htmlparser2 handles malformed tags. I think it treats the <hello as an unmatched closing tag, so it just gets discarded. I think a solution might be to parse through the input and escape any malformed tags before passing it to the parser. Not sure though, just a guess. I'll mark this as a good first issue and then circle back at some point.
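
For illustration only, the pre-escaping idea could be sketched roughly like this. `escapeUnterminatedTags` is a hypothetical helper, not part of sanitize-html or htmlparser2, and it assumes a tag counts as "malformed" when its `<` has no closing `>` before the next `<` or the end of the input. It does not try to understand quoted attribute values, comments, or CDATA, so a value such as `title="a<b"` would be mishandled; treat it as a starting point rather than a fix.

```javascript
// Hypothetical pre-processing step (an assumption, not sanitize-html API):
// escape every "<" that never reaches a closing ">" before the next "<"
// or the end of the input, and leave terminated tags untouched.
function escapeUnterminatedTags(input) {
  let out = '';
  let i = 0;
  while (i < input.length) {
    const lt = input.indexOf('<', i);
    if (lt === -1) {
      // No more "<" characters: copy the rest verbatim.
      out += input.slice(i);
      break;
    }
    // Copy the plain text that precedes this "<".
    out += input.slice(i, lt);
    const gt = input.indexOf('>', lt + 1);
    const nextLt = input.indexOf('<', lt + 1);
    const terminated = gt !== -1 && (nextLt === -1 || gt < nextLt);
    if (terminated) {
      // Looks like a complete tag: pass it through for the parser to judge.
      out += input.slice(lt, gt + 1);
      i = gt + 1;
    } else {
      // Unterminated: escape only the "<" and keep scanning after it.
      out += '&lt;';
      i = lt + 1;
    }
  }
  return out;
}

console.log(escapeUnterminatedTags('<hello'));      // &lt;hello
console.log(escapeUnterminatedTags('<hello you'));  // &lt;hello you
console.log(escapeUnterminatedTags('<hello you>')); // <hello you>
```

The result would then be handed to the sanitizer, e.g. `sanitizeHtml(escapeUnterminatedTags(input), options)`. Note that this only addresses the first two reported cases; `<hello you>` is a terminated tag, so it passes through unchanged and its escaping would still have to be handled in the disallowed-tag path.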

BoDonkey avatar Jun 02 '25 12:06 BoDonkey

I think that, in the absence of any closing >, it probably just treats the input as too malformed to do much with, so we might or might not be able to access this information from htmlparser2.

boutell avatar Jun 02 '25 13:06 boutell