sanitize-html
Large base64-encoded image URL in an img element's src is dropped when sanitizing
To Reproduce
Step by step instructions to reproduce the behavior:
- Sanitize a string containing an img element whose src is a very large base64-encoded data URL
- Allow the img tag and its src attribute in the options
- The returned string contains an img element with no src value, i.e. the image is gone
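The steps above can be sketched as a small helper for anyone trying to reproduce this. It is only an illustration under stated assumptions: `buildLargeImg` and `srcSurvives` are hypothetical names, the payload is filler bytes rather than a real image, and the sanitizer is passed in as a function so the snippet runs on its own.

```javascript
// Build an <img> whose src is a base64 data URL carrying `chars` base64 characters
// (rounded down to a multiple of 4). The payload is arbitrary filler, not a real PNG;
// only its length matters for this repro.
function buildLargeImg(chars) {
  const byteCount = Math.floor(chars / 4) * 3; // every 3 bytes encode to 4 base64 chars
  const payload = Buffer.alloc(byteCount, 0xab).toString('base64');
  return `<img src="data:image/png;base64,${payload}" />`;
}

// Returns true when the sanitizer's output still contains the full src value.
// `sanitize` is any function (html) => string, e.g. a thin wrapper around sanitizeHtml.
function srcSurvives(sanitize, html) {
  const src = html.match(/src="([^"]*)"/)[1];
  return sanitize(html).includes(src);
}
```

With sanitize-html installed, this could be called as `srcSurvives((h) => sanitizeHtml(h, { allowedTags: ['img'], allowedAttributes: { img: ['src'] } }), buildLargeImg(177112))`. It may also be worth running it with a tiny payload such as `buildLargeImg(8)`: the library's default `allowedSchemes` list does not include `data`, so if a short data URL is stripped as well, that would point at the scheme filter rather than at size.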
Expected behavior
The returned string should contain the encoded image URL
Describe the bug
The sanitizeHtml function seems to discard long base64-encoded data when sanitizing, so the img element in the returned string is missing its attribute values, i.e. it comes back as an empty img element
Details
Version of Node.js: 18
Server Operating System: Windows
Additional context: No error is returned or thrown when the image URL fails to parse
How large is large?
This could be an upstream limitation of htmlparser2, but I'm not casting blame, as I'm not 100% sure why there would be any limit there either. There is definitely no "if bytes more than X, reject it" policy in sanitize-html.
How large is large?
The encoded data is about 172 KB when pasted into Notepad and contains 177,112 characters. I don't know whether that is large for a raw base64 image, but it is a long run of characters inside an HTML element. @boutell
It doesn't seem unreasonable to me. Can you create a PR adding a failing unit test?
Just a heads-up: I will have to come back to this at a later date because of time constraints.
I'm seeing the same issue -- even after making sure img is an allowed tag and src is an allowed attribute for img, it still removes the src entirely when I sanitize it.
Please provide a failing unit test in test/test.js so we can be sure we are talking about the same thing.
You can try out your tests with:
npm install
npm test