summernote-cleaner
summernote-cleaner copied to clipboard
badTags tagStripper can suffer catastrophic backtracking
The tagStripper RegExp for badTags can backtrack catastrophically if the content is very large, and if no closing tag exists (such as a meta
tag). https://github.com/DiemenDesign/summernote-cleaner/blob/ee248b92f047bdcfb30007b31a047d78253ae814/summernote-cleaner.js#L217-L218
My quick solution is to check for any closing tag first:
var tagClose = new RegExp('</' + badTag + '[^>\v]*>')
if (output.match(tagClose)) {
tagStripper = new RegExp('<' + badTag + '(.|\r|\n)*</' + badTag + '[^>\v]*>', 'gmi');
output = output.replace(tagStripper, '');
}
However I imagine that there is some modification to the RegExp that can fix the problem.
Can you give an example that would "backtrack catastrophically" and the configuration you are currently using?
Do you think this issue might have something to do with this: https://github.com/summernote/summernote/pull/4472
I don't think it's related to that issue.
Here is an example of catastrophic backtracking, using the tag meta
: https://regex101.com/r/5Xyvyu/2
The content consists of just one meta
tag and one img
tag. The regex reaches the end of the string without finding a closing tag for the meta
, and then crashes when trying to backtrack through the huge img
content.
@alexmdavis sorry didn't see the notification for the reply.
That is the regex, but I'm curious as to the configuration you have as I have seen this code parse far bigger texts than this and with other single tags such as <hr>
(in the html just like that). If the issue occurs for meta I would expect the same for hr, and I'm not confident what the issue is given your description.
It would help myself debug this if I could get the plugin configuration (I can use what you put in the regex as a test for this issue).
Many thanks for you help so far.
@RichardSquires this is one config that reproduces the problem (with meta
in particular).
action: 'paste'
keepHtml: true
badTags: ['applet', 'col', 'colgroup', 'embed', 'noframes', 'noscript', 'script', 'title', 'meta', 'link', 'head']
keepTagContents: ['html', 'body']
badAttributes: ['class', 'data-(.*?)', 'id']
limitChars: 30000
limitDisplay: 'none'
limitStop: false
Should you be using meta within content markup? Keep in mind Summernote is primarily concerned with editing content that is visual, i.e. content that doesn't use elements normally used for page layout, head areas, or script elements. Meta tags shouldn't really be used within editable content areas.
Do you really mean the issue occurs where elements don't use a closing element or ending /
?
Yes, perhaps I have been too specific to one example. It appears to happen with any tag that doesn't have a closing element or /
. In my regex example, the issue persists with hr
(or anything else) instead of meta
.
Well, that makes sense then, and more importantly img
considering it would be more often used than hr
.
@RichardSquires this is one config that reproduces the problem (with
meta
in particular).action: 'paste' keepHtml: true badTags: ['applet', 'col', 'colgroup', 'embed', 'noframes', 'noscript', 'script', 'title', 'meta', 'link', 'head'] keepTagContents: ['html', 'body'] badAttributes: ['class', 'data-(.*?)', 'id'] limitChars: 30000 limitDisplay: 'none' limitStop: false
@alexmdavis Can you shed some light on why keepTagsContents is set to ['html', 'body'] instead of []?