markdown-to-jsx
Disable parsing of raw HTML
Is there a way to disable parsing of raw HTML altogether? I know I can override specific tags but I'd like to automatically escape HTML characters without transforming the data stored in my database.
Not currently, but it's something that could probably be added.
It would be nice to combine escaping HTML elements with a whitelist of HTML elements that are allowed to be parsed; anything else would be escaped. This would provide additional safety when displaying user-generated input.
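To make the allowlist idea concrete, here is a minimal sketch of a pre-processing step that escapes every HTML tag whose name is not explicitly allowed. `escapeHtml`, `escapeDisallowedTags`, and the allowed set are hypothetical names for illustration; none of this is part of markdown-to-jsx. It only looks at tag names, so attributes on allowed tags (such as `onclick`) pass through untouched and would still need separate handling.

```typescript
// Hypothetical helpers for illustration; not part of markdown-to-jsx.
const HTML_ESCAPES: Record<string, string> = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

// Escape the characters that have special meaning in HTML.
function escapeHtml(text: string): string {
  return text.replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

// Matches an opening, closing, or self-closing tag and captures its name.
const TAG_R = /<\/?([a-zA-Z][a-zA-Z0-9-]*)\b[^<>]*>/g;

// Leave allowed tags as they are and escape every other tag so it shows as plain text.
function escapeDisallowedTags(
  input: string,
  allowedTags: ReadonlySet<string>
): string {
  return input.replace(TAG_R, (tag: string, name: string) =>
    allowedTags.has(name.toLowerCase()) ? tag : escapeHtml(tag)
  );
}
```

The escaped string could then be handed to the Markdown renderer in place of the stored value, which leaves the data in the database unchanged.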
Another feature might be a sanitization function that can be run before the attributes are passed to the parsed element. While you can always provide custom components for everything, one function could handle both things like URL sanitization for non-javascript: URLs and content filtering (curse words, etc).
Something along the lines of sanitize(node, rule, element) that would return an updated node. So an option might be:
sanitize: (node, rule, element) => (element === 'a' ? { ...node, target: customSanitizeUrl(node.target) } : node)
Or:
sanitize: (node, rule, element) => ({ ...node, content: bleepCurseWords(node.content) })
Then in the code, where it
footnoteReference: {
  match: inlineRegex(FOOTNOTE_REFERENCE_R),
  order: PARSE_PRIORITY_HIGH,
  parse(capture /*, parse*/) {
    return {
      content: capture[1],
      target: `#${capture[1]}`,
    };
  },
  react(node, output, state) {
    const sanitizedNode = sanitize(node, 'footnoteReference', 'a');
    return (
      <a key={state.key} href={sanitizeUrl(sanitizedNode.target)}>
        <sup key={state.key}>{sanitizedNode.content}</sup>
      </a>
    );
  },
},
This isn't the right issue for this, but it's a broader issue of how to handle user-generated content. I like this library, but I'm switching to react-markdown because it has better support for displaying user-generated content. markdown-to-jsx looks like a great library for internal content. To make it safely support user-generated content, I think you need:
- Disable automatic HTML parsing
- Whitelist of HTML to allow parsing for (optional)
- Whitelist of allowable URLs and/or sanitization function to prevent bad links (not just JS links)
Hopefully this comment has been helpful in that.
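For the URL allowlist point, a rough sketch of what such a sanitization function might look like. `sanitizeHref`, its parameters, and the default protocol set are assumptions for illustration, not an existing markdown-to-jsx API. It returns `null` for anything it cannot positively accept, so the caller can drop the link.

```typescript
// Hypothetical helper; the name, parameters, and defaults are assumptions.
const DEFAULT_ALLOWED_PROTOCOLS: ReadonlySet<string> = new Set([
  "http:",
  "https:",
  "mailto:",
]);

function sanitizeHref(
  href: string,
  allowedProtocols: ReadonlySet<string> = DEFAULT_ALLOWED_PROTOCOLS,
  allowedHosts?: ReadonlySet<string>
): string | null {
  const trimmed = href.trim();

  // Fragment links and root-relative paths stay on the current site.
  // "//host" and "/\host" are excluded because browsers can treat them as
  // links to another host.
  if (trimmed.startsWith("#") || /^\/(?![\/\\])/.test(trimmed)) {
    return trimmed;
  }

  let url: URL;
  try {
    url = new URL(trimmed);
  } catch {
    return null; // not an absolute URL, so reject it
  }

  // The URL parser lowercases the scheme, so "JaVaScRiPt:" is caught too.
  if (!allowedProtocols.has(url.protocol)) return null;

  // When a host allowlist is given, every absolute URL must match it.
  if (allowedHosts && !allowedHosts.has(url.hostname)) return null;

  return url.href;
}
```

Something like this could be called from the `sanitize(node, rule, element)` hook proposed above, replacing `node.target` or dropping the link when the result is `null`.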
Possibly relevant package https://github.com/cure53/DOMPurify
That lib is bigger than markdown-to-jsx itself unfortunately.
Adding some basic config to just disable the HTML parsing rules should be relatively straightforward and it would just end up in the generated markdown as plain text.
Should this issue be closed now after #278?
First, please excuse my lack of security knowledge 🙂 . I have a problem that optionally disabling parsing raw HTML right now will also disable my custom components.
const options = {overrides: {MyCustomComponent: MyCustomComponent}};
<MyCustomComponent/> // This no longer works if I disable parsing raw HTML.
But what if I want to disable parsing raw HTML only (ie, like
As stated here https://github.com/probablyup/markdown-to-jsx/pull/307#issue-421120162 I'd have to allow raw HTML but use something like dompurify, which is a bit heavy (larger than this library itself).
My question is, can we allow disabling raw HTML, but allow custom MDX components or somehow make it an option, maybe on disableParsingRawHTML itself?
@stephan-noel you can use a custom override just for script tags, for example:
const value = `Hello<div style="color: red;">World</div><script src="evil.com">Bad script</script>`
const MARKDOWN_OPTIONS = {
  overrides: {
    // If there is any text inside the script tag then render this, otherwise render nothing.
    script: (props: { children: string }) => props.children,
  },
}
<Markdown options={MARKDOWN_OPTIONS}>
  {value}
</Markdown>
Would just render as:
But the script tag would be missing from the resulting html.
Hmm we should be discarding script tags entirely as they're obviously a malicious vector
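Until the library discards them itself, a stricter variant of the override above is a component that renders nothing at all, so neither the tag nor its text content ends up in the output. This is only a sketch: `Discard` is a hypothetical name, and it relies on the `overrides` option accepting a component per tag, as in the earlier example.

```typescript
// A component that renders nothing; React treats a null return as "no output".
const Discard = (): null => null;

const MARKDOWN_OPTIONS = {
  overrides: {
    // Drop <script> entirely, including any text inside it.
    script: Discard,
  },
};
```

Passing `MARKDOWN_OPTIONS` to `<Markdown options={...}>` works the same way as in the earlier snippet.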
Ah, that's interesting. Yes, I'm running 7.1.6, and without disableParsingRawHTML: true the script's src attribute is rendered and requested by the page.