draft-convert icon indicating copy to clipboard operation
draft-convert copied to clipboard

Fix double escaping of entities

Open amannn opened this issue 8 years ago • 2 comments

Fixes #47.

The double escaping occurs, because there's a call to encodeBlock at the beginning followed by a call to renderToStaticMarkup if a element is provided that has children.

So basically there are two options to fix this:

  1. Only escape text that is not passed to to a renderToStaticMarkup call.
  2. Unescape the resulting markup from renderToStaticMarkup.

To me the first option would have been a bit cleaner, but surfaced some issues for me:

  • I thought about calling encodeBlock in getElementHTML, but since getElementHTML is only used for entities and inline styles, this would lead to regular blocks not being escaped.
  • It's not possible to escape after creating entities and inline styles, as those functions create HTML tags which shouldn't be encoded.
  • The only way for React to render unescaped content is dangerouslySetInnerHTML, however as we don't have HTML but a React element this is not possible here.

So a proper way to go with the first option might have led to a bigger change to the code base which I wanted to avoid. Therefore I went for the second option, which seems to work pretty well. The unescape helper is pretty small (1.2K minified, non-gzipped) so I hope this is ok for you.

I encountered this bug recently, so I thought I'll provide a fix for it. Let me know what you think! 🙂

amannn avatar Aug 30 '17 09:08 amannn

Thank you for the thoughtful PR and description! I think in general we're going to need to do something like this. My main concern is how this affects the the rest of the HTML output such as attributes. For example, in one of your tests if you added an attribute like this (either intentionally or via malicious user input):

const Link = ({ children }) => <a href="&quot;&gt;&lt;/a&gt;&lt;script&gt;alert(&quot;hi&quot;)&lt;/script&gt;&lt;a style=&quot;">{children}</a>;

this would get unescaped out to be regular HTML characters:

<a href=""></a><script>alert("hi")</script><a style="">Led Zeppelin - Since I&#x27;ve Been Loving You</a>

I'm not quite sure how to get around this outside of re-parsing the HTML to DOM elements and doing the unescaping on each text node. I imagine there'd be a performance hit with that strategy. Does this make sense?

benbriggs avatar Aug 30 '17 20:08 benbriggs

Thanks for the quick reply!

Oh, you're right – that's indeed an issue. Hmm, I think your proposed solution would work, but as you mentioned this could decrease performance a bit and potentially also increase complexity for a rather edge case I guess.

Is there from your perspective a good way of only escaping text that doesn't get passed to a renderToStaticMarkup call (solution 1 from above)?

amannn avatar Aug 31 '17 07:08 amannn