Two proposed changes to blaze-html (xhtml support, don't escape ')
I wrote Jasper an email about these issues earlier. You can consider these concrete proposals. Perhaps both are controversial. The two changes are completely independent:
(1) Don't escape ' (justification in commit).
(2) Added modules for XHtml1.Strict, Transitional, and FrameSet. These use the same tags and attributes as the corresponding Html modules. The only differences are that leaf nodes are self-closed, and a different doctype is used.
These changes would allow me to use blaze-html for pandoc's output.
Hi John,
making blaze-html usable for pandoc is important to me. However, I'm not sure about dropping the escaping of singlequotes. Is it disallowed in XHTML? I'd rather be on the safe side, than weaken the security guarantees given by HTML code generated blaze-html. We used the following information about cross-site scripting as a template for choosing the escaping.
best regards, Simon
PS: If we change the escaping, then I'll also have to adapt blaze-builder accordingly. That's no problem. It's more a note that I have to do it :-)
Simon,
I didn't have to make any changes to blaze-builder to change the escaping: I just made a couple changes in the renderers: https://github.com/jgm/blaze-html/commit/e42848d675bd7cff3d3c4e5c69ed8f013572ffde
It's true that putting unescaped user input inside a single-quoted attribute could get you in trouble, if the user supplied string contained a ' character:
for example, this would be bad in ruby:
"bad"
because if the user passed in "dog's", you'd get:
<a href='dog's'>bad
and who knows how a browser might interpret that.
However, in blaze-html, you always use double quotes around attributes. So I fail to see how this could be a problem for you. The ONLY time ' needs to be escaped is when it's inside a single-quoted attribute. The following is just fine:
bad
That said, the single quote escaping is not as big an issue for me as the XHTML combinators, which I really need for pandoc. It's more of an aesthetic issue. If you choose not to change blaze-html, I'll probably work around it by using my own string-escape function in pandoc. I'd rather not do that, but I could live with it if needed.
Best, John
+++ Simon Meier [Dec 18 11 12:22 ]:
Hi John,
making blaze-html usable for pandoc is important to me. However, I'm not sure about dropping the escaping of singlequotes. Is it disallowed in XHTML? I'd rather be on the safe side, than weaken the security guarantees given by HTML code generated blaze-html. We used the following information about cross-site scripting as a template for choosing the escaping.
best regards, Simon
PS: If we change the escaping, then I'll also have to adapt blaze-builder accordingly. That's no problem. It's more a note that I have to do it :-)
Reply to this email directly or view it on GitHub: https://github.com/jaspervdj/blaze-html/pull/48#issuecomment-3196667
I think John's point is valid, but I'd rather be 100% sure. Perhaps we should throw a mail to the Haskell web-devel list to see if they can think of any security concerns?
+++ Jasper Van der Jeugt [Dec 18 11 13:41 ]:
I think John's point is valid, but I'd rather be 100% sure. Perhaps we should throw a mail to the Haskell web-devel list to see if they can think of any security concerns?
That's a good idea.
@jgm, I've cherry-picked the XHTML patch. This has already been put on Hackage as 0.4.3.0.
I'm planning on integrating the other patch as well, as there seem to be no security concerns. However, there is a small problem: some of the escaping code (more specifically, for the UTF-8 renderer) is indeed located in the blaze-builder library (the Blaze.ByteString.Builder.Html.Utf8 module). First off, I would like the output of the renderers to be consistent as much as possible, so this code would need to change as well.
Now, it would seem like a good idea to move this module from blaze-builder to blaze-html, if possible, since this is more relevant to the latter. However, in that case, we also need to check whether or not there are any other packages depending upon this module. @meiersi, what do you think? Cheers!
@jaspervdj why don't you just use your own modified copy of that function in the new version of blaze-html? This way users of blaze-builder get the existing behavior and you have the freedom to change the escaping. This would result in the least surprises, I think.
In the near future, the second patch to bytestring will also be completed and accepted. This patch will provide a safe means to define such escaping functions directly. The blaze-builder package will become obsolete and replaced with a stub reexporting the bytestring builder functions.
+++ Jasper Van der Jeugt [Dec 20 11 01:08 ]:
@jgm, I've cherry-picked the XHTML patch. This has already been put on Hackage as 0.4.3.0.
Thank you! Let me know when the escaping patch is integrated, and I can simplify my code a bit.
I see that ' is still being escaped. The issue has come up again (http://github.com/jgm/pandoc/issues/629), so I thought I'd check to see whether anything was resolved (this issue is still open, so I guess not). Here is a link to Jasper's email to the haskell web-dev list and the one response he replied: http://www.haskell.org/pipermail/web-devel/2011/002282.html. It seems to me that there are good reasons to stop escaping ' outside of attributes, and no good reasons to continue...
I admit I had forgotten this issue. I'll fix it next week (leaving on a small trip tomorrow).
Great! No urgency on this end; I'm working around it for now.
+++ Jasper Van der Jeugt [Sep 28 12 07:51 ]:
I admit I had forgotten this issue. I'll fix it next week (leaving on a small trip tomorrow).
-- Reply to this email directly or [1]view it on GitHub. [J6T91GIPIyhU-8ti4GCGP7AlC2fiocPKodp06RQqyLyJVKPwspQRCLkbEJ5_dJrC.gif]
References
- https://github.com/jaspervdj/blaze-html/pull/48#issuecomment-8978777
BTW: I'm working on porting blaze-markup over to the new bytestring builder. I suggest we address this issue after this port, as it allows to also handle the fast UTF-8 path properly; i.e., using primitive encodings to implement the attribute and the standard HTML escaping.