markdown-to-jsx Disable parsing of raw HTML

Is there a way to disable parsing of raw HTML altogether? I know I can override specific tags but I'd like to automatically escape HTML characters without transforming the data stored in my database.

Oct 26 '18 00:10 will-hart

Not currently, but it's something that could probably be added.

Oct 26 '18 02:10 quantizor

It would be nice to combine escaping HTML elements with a whitelist of HTML elements that are allowed to be parsed; anything else would be escaped. This would provide additional safety when displaying user-generated input.
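As a rough illustration of the allowlist idea, here is a minimal sketch of a pre-processing step that could run on the markdown string at render time, leaving the stored data untouched. Everything in it is an assumption for illustration: `escapeDisallowedTags`, `ALLOWED_TAGS`, and the regexes are hypothetical and not part of markdown-to-jsx, and it assumes the renderer shows `&lt;` and `&gt;` entities as literal characters. Allowlisted tags are kept only when they carry no attributes, and the regex approach is not a complete sanitizer (for example, it ignores unterminated tags).

```typescript
// Hypothetical allowlist; adjust to the tags you actually want parsed.
const ALLOWED_TAGS: ReadonlySet<string> = new Set(["b", "i", "em", "strong", "code"]);

// Matches opening, closing, and self-closing tags such as <div class="x">, </div>, <br/>.
// The lookahead keeps markdown autolinks like <https://example.com> out of the match.
const ANY_TAG = /<\/?([a-zA-Z][a-zA-Z0-9-]*)(?=[\s/>])[^>]*>/g;

// A tag with no attributes at all, e.g. <b>, </b>, <br/>.
const BARE_TAG = /^<\/?[a-zA-Z][a-zA-Z0-9-]*\s*\/?>$/;

function escapeDisallowedTags(
  input: string,
  allowed: ReadonlySet<string> = ALLOWED_TAGS
): string {
  return input.replace(ANY_TAG, (tag: string, name: string) => {
    // Keep a tag only if its name is allowlisted and it has no attributes.
    const keep = allowed.has(name.toLowerCase()) && BARE_TAG.test(tag);
    return keep ? tag : tag.replace(/</g, "&lt;").replace(/>/g, "&gt;");
  });
}
```

The escaped string would then be passed to the renderer in place of the raw value.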

Mar 05 '19 17:03 fastfedora

Another feature might be a sanitization function that can be run before the attributes are passed to the parsed element. While you can always provide custom components for everything, one function could handle both things like URL sanitization for non-javascript: URLs and content filtering (curse words, etc).

Something along the lines of sanitize(node, rule, element) that would return an updated node. So an option might be:

sanitize: (node, rule, element) => (element === 'a' ? { ...node, target: customSanitizeUrl(node.target) } : node)

Or:

sanitize: (node, rule, element) => ({ ...node, content: bleepCurseWords(node.content) })
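To make the two one-liners above concrete, here is a self-contained sketch of what such a hook and its helpers might look like. The `ParsedNode` shape, the `Sanitize` type, the scheme allowlist, and the placeholder word list are all assumptions for illustration; markdown-to-jsx does not expose a `sanitize` option with this signature.

```typescript
// Hypothetical node shape, mirroring the fields used in the proposal above.
interface ParsedNode {
  content?: string | undefined;
  target?: string | undefined;
}

// Hypothetical hook signature: (node, rule name, element name) => updated node.
type Sanitize = (node: ParsedNode, rule: string, element: string) => ParsedNode;

// Assumed allowlist of URL schemes; anything else is rejected.
const SAFE_SCHEMES: ReadonlySet<string> = new Set(["http:", "https:", "mailto:"]);

// Returns the URL unchanged if it is relative or uses an allowed scheme, otherwise undefined.
function customSanitizeUrl(url: string): string | undefined {
  // Browsers ignore whitespace and control characters inside a scheme, so strip them first.
  const compact = url.replace(/[\s\u0000-\u001f]/g, "");
  const scheme = /^([a-zA-Z][a-zA-Z0-9+.-]*):/.exec(compact);
  if (scheme === null) return url; // no scheme: relative URL or fragment such as "#1"
  return SAFE_SCHEMES.has((scheme[1] ?? "").toLowerCase() + ":") ? url : undefined;
}

// Placeholder word list; a real filter would load its own.
const BLEEPED_WORDS = ["darn", "heck"];

// Replaces each listed word (whole words, case-insensitive) with asterisks of equal length.
function bleepCurseWords(text: string): string {
  const pattern = new RegExp(`\\b(${BLEEPED_WORDS.join("|")})\\b`, "gi");
  return text.replace(pattern, (word: string) => "*".repeat(word.length));
}

// One hook handling both content filtering and link sanitization.
const sanitize: Sanitize = (node, _rule, element) => {
  const cleaned: ParsedNode = { ...node };
  if (typeof node.content === "string") {
    cleaned.content = bleepCurseWords(node.content);
  }
  if (element === "a" && typeof node.target === "string") {
    cleaned.target = customSanitizeUrl(node.target);
  }
  return cleaned;
};
```

Returning a new node rather than mutating the parsed one keeps the hook safe to call more than once on the same input.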

Then in the code, each rule's react function would call sanitize before rendering, for example:

footnoteReference: {
  match: inlineRegex(FOOTNOTE_REFERENCE_R),
  order: PARSE_PRIORITY_HIGH,
  parse(capture /*, parse*/) {
    return {
      content: capture[1],
      target: `#${capture[1]}`,
    };
  },
  react(node, output, state) {
    const sanitizedNode = sanitize(node, 'footnoteReference', 'a');

    return (
      <a key={state.key} href={sanitizeUrl(sanitizedNode.target)}>
        <sup key={state.key}>{sanitizedNode.content}</sup>
      </a>
    );
  },
},

This isn't the right issue for this, but it's a broader issue of how to handle user-generated content. I like this library, but I'm switching to react-markdown because it has better support for displaying user-generated content. markdown-to-jsx looks like a great library for internal content. To make it safely support user-generated content, I think you need:

Disable automatic HTML parsing
Whitelist of HTML to allow parsing for (optional)
Whitelist of allowable URLs and/or sanitization function to prevent bad links (not just JS links)

Hopefully this comment has been helpful in that.

Mar 05 '19 17:03 fastfedora

Possibly relevant package https://github.com/cure53/DOMPurify

Aug 07 '19 14:08 rescribet

That lib is bigger than markdown-to-jsx itself unfortunately.

Adding some basic config to just disable the HTML parsing rules should be relatively straightforward and it would just end up in the generated markdown as plain text.

Aug 07 '19 15:08 quantizor

Should this issue be closed now after #278?

Oct 08 '20 01:10 rahulgi

First, please excuse my lack of security knowledge 🙂. My problem is that disabling raw HTML parsing currently also disables my custom components.

const options = {overrides: {MyCustomComponent: MyCustomComponent}};

<MyCustomComponent/> // This no longer works if I disable parsing raw HTML.

But what if I want to disable parsing raw HTML only (ie, like

As stated here https://github.com/probablyup/markdown-to-jsx/pull/307#issue-421120162 I'd have to allow raw HTML but use something like dompurify, which is a bit heavy (larger than this library itself).

My question is, can we allow disabling raw HTML, but allow custom MDX components or somehow make it an option, maybe on disableParsingRawHTML itself?
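For reference, the combination being asked about can be written as a single options object. This is only a sketch: `MyCustomComponent` here is a stand-in stub rather than a real React component, and per the description above, with `disableParsingRawHTML` enabled the `<MyCustomComponent/>` tag is currently not rendered through the override.

```typescript
// Stand-in for a real React component (hypothetical; renders nothing).
const MyCustomComponent = (): null => null;

const options = {
  // Option discussed in this thread: raw HTML is no longer parsed into elements.
  disableParsingRawHTML: true,
  // With the flag above set, this override is reported not to take effect.
  overrides: { MyCustomComponent },
};
```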

Mar 27 '21 12:03 stephan-noel

@stephan-noel you can use a custom override just for script tags, for example:

import Markdown from 'markdown-to-jsx'

const value = `Hello<div style="color: red;">World</div><script src="evil.com">Bad script</script>`

const MARKDOWN_OPTIONS = {
    overrides: {
        // If there is any text inside the script tag then render this, otherwise render nothing.
        // The script element itself (and its src attribute) is never rendered.
        script: (props: { children: string }) => props.children,
    },
}

<Markdown options={MARKDOWN_OPTIONS}>
    {value}
</Markdown>

This would just render the text content (including any text inside the script tag), but the script tag itself would be missing from the resulting HTML.
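A variation on the same idea is to drop a whole list of risky tags, children included, rather than only script. This is a minimal sketch assuming overrides apply to raw HTML tags the way the snippet above suggests; `BLOCKED_TAGS`, `RenderNothing`, and `blockedOverrides` are illustrative names, and the tag list is an example rather than a vetted denylist.

```typescript
// Illustrative list of tag names to discard entirely.
const BLOCKED_TAGS = ["script", "iframe", "object", "embed", "style"] as const;

// A component that ignores its props and renders nothing, so children are dropped too.
const RenderNothing = (): null => null;

// Build an overrides map with one entry per blocked tag.
const blockedOverrides: Record<string, () => null> = {};
for (const tag of BLOCKED_TAGS) {
  blockedOverrides[tag] = RenderNothing;
}

const MARKDOWN_OPTIONS = { overrides: blockedOverrides };
```

A denylist like this only covers the tags it names, so it complements rather than replaces disabling raw HTML or a full sanitizer.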

Feb 07 '22 11:02 AJamesPhillips

Hmm we should be discarding script tags entirely as they're obviously a malicious vector

Feb 07 '22 11:02 quantizor

Ah, that's interesting. Yes, I'm running 7.1.6, and without disableParsingRawHTML: true the script's src attribute is rendered and the page requests it.

Feb 07 '22 12:02 AJamesPhillips
