markdown-to-jsx icon indicating copy to clipboard operation
markdown-to-jsx copied to clipboard

Disable parsing of raw HTML

Open will-hart opened this issue 7 years ago • 10 comments

Is there a way to disable parsing of raw HTML altogether? I know I can override specific tags but I'd like to automatically escape HTML characters without transforming the data stored in my database.

will-hart avatar Oct 26 '18 00:10 will-hart

Not currently, but it's something that could probably be added.

quantizor avatar Oct 26 '18 02:10 quantizor

It would be nice to combine escaping HTML elements with a whitelist of HTML elements that are allowed to be parsed; anything else would be escaped. This would provide additional safety when displaying user-generated input.

fastfedora avatar Mar 05 '19 17:03 fastfedora

Another feature might be a sanitization function that can be run before the attributes are passed to the parsed element. While you can always provide custom components for everything, one function could handle both things like URL sanitization for non-javascript: URLs and content filtering (curse words, etc).

Something along the lines of sanitize(node, rule, element) that would return an updated node. So an option might be:

sanitize: (node, rule, element) => (element == 'a' ? { ...node, target: customSanitizeUrl(node.target) } : node)

Or:

sanitize: (node, rule, element) => ({ ...node, content: bleepCurseWords(node.content) })

Then in the code, where it

   footnoteReference: {
      match: inlineRegex(FOOTNOTE_REFERENCE_R),
      order: PARSE_PRIORITY_HIGH,
      parse(capture /*, parse*/) {
        return {
          content: capture[1],
          target: `#${capture[1]}`,
        };
      },
      react(node, output, state) {
        const sanitizedNode = sanitize(node, 'footnoteReference', 'a');

        return (
          <a key={state.key} href={sanitizeUrl(sanitizedNode.target)}>
            <sup key={state.key}>{sanitizedNode.content}</sup>
          </a>
        );
      },
    },

This isn't the right issue for this, but it's a broader issue of how to handle user-generated content. I like this library, but I'm switching to react-markdown because it has better support for displaying user-generated content. markdown-to-jsx looks like a great library for internal content. To make it safely support user-generated content, I think you need:

  • Disable automatic HTML parsing
  • Whitelist of HTML to allow parsing for (optional)
  • Whitelist of allowable URLs and/or sanitization function to prevent bad links (not just JS links)

Hopefully this comment has been helpful in that.

fastfedora avatar Mar 05 '19 17:03 fastfedora

Possibly relevant package https://github.com/cure53/DOMPurify

rescribet avatar Aug 07 '19 14:08 rescribet

That lib is bigger than markdown-to-jsx itself unfortunately.

Adding some basic config to just disable the HTML parsing rules should be relatively straightforward and it would just end up in the generated markdown as plain text.

quantizor avatar Aug 07 '19 15:08 quantizor

Should this issue should be closed now after #278?

rahulgi avatar Oct 08 '20 01:10 rahulgi

First, please excuse my lack of security knowledge 🙂 . I have a problem that optionally disabling parsing raw HTML right now will also disable my custom components.

const options = {overrides: {MyCustomComponent: MyCustomComponent}};

<MyCustomComponent/> // This no longer works if I disable parsing raw HTML.

But what if I want to disable parsing raw HTML only (ie, like

As stated here https://github.com/probablyup/markdown-to-jsx/pull/307#issue-421120162 I'd have to allow raw HTML but use something like dompurify, which is a bit heavy (larger than this library itself).

My question is, can we allow disabling raw HTML, but allow custom MDX components or somehow make it an option, maybe on disableParsingRawHTML itself?

stephan-noel avatar Mar 27 '21 12:03 stephan-noel

@stephan-noel you can use a custom override just for script tags, for example:

const value = `Hello<div style="color: red;">World</div><script src="evil.com">Bad script</script>`

const MARKDOWN_OPTIONS = {
    overrides:
    {
        // If there is any text inside the script tag then render this, otherwise render nothing.
        script: (props: { children: string }) => props.children,
    },
}

<Markdown options={MARKDOWN_OPTIONS}>
    {value}
</Markdown>

Would just render as: image But the script tag would be missing from the resulting html.

AJamesPhillips avatar Feb 07 '22 11:02 AJamesPhillips

Hmm we should be discarding script tags entirely as they're obviously a malicious vector

quantizor avatar Feb 07 '22 11:02 quantizor

Ah that's interesting. Yes I'm running 7.1.6 and without disableParsingRawHTML: true then the script's src attribute is rendered and is requested by the page.

AJamesPhillips avatar Feb 07 '22 12:02 AJamesPhillips