summernote-cleaner icon indicating copy to clipboard operation
summernote-cleaner copied to clipboard

badTags tagStripper can suffer catastrophic backtracking

Open alexmdavis opened this issue 1 year ago • 10 comments

The tagStripper RegExp for badTags can backtrack catastrophically if the content is very large, and if no closing tag exists (such as a meta tag). https://github.com/DiemenDesign/summernote-cleaner/blob/ee248b92f047bdcfb30007b31a047d78253ae814/summernote-cleaner.js#L217-L218

alexmdavis avatar Jun 07 '23 17:06 alexmdavis

My quick solution is to check for any closing tag first:

var tagClose = new RegExp('</' + badTag + '[^>\v]*>')
if (output.match(tagClose)) {
  tagStripper = new RegExp('<' + badTag + '(.|\r|\n)*</' + badTag + '[^>\v]*>', 'gmi');
  output = output.replace(tagStripper, '');
}

However I imagine that there is some modification to the RegExp that can fix the problem.

alexmdavis avatar Jun 07 '23 17:06 alexmdavis

Can you give an example that would "backtrack catastrophically" and the configuration you are currently using?

RichardSquires avatar Jun 11 '23 16:06 RichardSquires

Do you think this issue might have something to do with this: https://github.com/summernote/summernote/pull/4472

DennisSuitters avatar Jun 12 '23 11:06 DennisSuitters

I don't think it's related to that issue.

Here is an example of catastrophic backtracking, using the tag meta: https://regex101.com/r/5Xyvyu/2

The content consists of just one meta tag and one img tag. The regex reaches the end of the string without finding a closing tag for the meta, and then crashes when trying to backtrack through the huge img content.

alexmdavis avatar Jun 12 '23 14:06 alexmdavis

@alexmdavis sorry didn't see the notification for the reply.

That is the regex, but I'm curious as to the configuration you have as I have seen this code parse far bigger texts than this and with other single tags such as <hr> (in the html just like that). If the issue occurs for meta I would expect the same for hr, and I'm not confident what the issue is given your description.

It would help myself debug this if I could get the plugin configuration (I can use what you put in the regex as a test for this issue).

Many thanks for you help so far.

RichardSquires avatar Jun 23 '23 19:06 RichardSquires

@RichardSquires this is one config that reproduces the problem (with meta in particular).

      action: 'paste'
      keepHtml: true
      badTags: ['applet', 'col', 'colgroup', 'embed', 'noframes', 'noscript', 'script', 'title', 'meta', 'link', 'head']
      keepTagContents: ['html', 'body']
      badAttributes: ['class', 'data-(.*?)', 'id']
      limitChars: 30000
      limitDisplay: 'none'
      limitStop: false

alexmdavis avatar Jun 27 '23 17:06 alexmdavis

Should you be using meta within content markup? Keep in mind Summernote is primarily concerned with editing content that is visual, i.e. content that doesn't use elements normally used for page layout, head areas, or script elements. Meta tags shouldn't really be used within editable content areas.

Do you really mean the issue occurs where elements don't use a closing element or ending /?

DennisSuitters avatar Jun 28 '23 02:06 DennisSuitters

Yes, perhaps I have been too specific to one example. It appears to happen with any tag that doesn't have a closing element or /. In my regex example, the issue persists with hr (or anything else) instead of meta.

alexmdavis avatar Jun 28 '23 14:06 alexmdavis

Well, that makes sense then, and more importantly img considering it would be more often used than hr.

DennisSuitters avatar Jun 29 '23 01:06 DennisSuitters

@RichardSquires this is one config that reproduces the problem (with meta in particular).

      action: 'paste'
      keepHtml: true
      badTags: ['applet', 'col', 'colgroup', 'embed', 'noframes', 'noscript', 'script', 'title', 'meta', 'link', 'head']
      keepTagContents: ['html', 'body']
      badAttributes: ['class', 'data-(.*?)', 'id']
      limitChars: 30000
      limitDisplay: 'none'
      limitStop: false

@alexmdavis Can you shed some light on why keepTagsContents is set to ['html', 'body'] instead of []?

RichardSquires avatar Jul 01 '23 17:07 RichardSquires