vocadb
Better HTML sanitization for Markdown content
By spec, Markdown allows arbitrary HTML. Currently we HTML-encode all text before it is passed to the Markdown parser. This prevents the most obvious XSS attacks, but not all of them. For example, depending on the parser, link syntax such as [text](javascript:...) may still produce a link with a javascript: URL. It would be better to sanitize the generated HTML against a whitelist of allowed tags and attributes. We obviously can't just HTML-encode all of the HTML generated by the Markdown parser (or strip all HTML tags), because then using Markdown would be pointless to begin with.
The HtmlSanitizer library could possibly be used for this. There's also Microsoft's Web Protection Library, but I've heard it's not very good. The CsQuery HTML parsing library could also be used.
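To illustrate the whitelist approach, here is a minimal sketch of sanitizing the HTML *after* Markdown rendering. The libraries mentioned above are .NET libraries, and this sketch does not use or imitate their APIs. It uses Python's standard html.parser only to show the idea. The tag/attribute whitelist and the allowed URL schemes are illustrative assumptions, not a vetted policy. The sketch also does not repair unbalanced nesting, so a real library is still the better choice.

```python
from html import escape
from html.parser import HTMLParser

# Illustrative whitelist (an assumption): tags a Markdown renderer typically emits.
ALLOWED_TAGS = {
    "p", "em", "strong", "a", "code", "pre", "blockquote",
    "ul", "ol", "li", "br", "hr", "h1", "h2", "h3", "h4", "h5", "h6",
}
# Attributes allowed per tag; every other attribute (onclick, style, ...) is dropped.
ALLOWED_ATTRS = {"a": {"href", "title"}}
# href values must start with one of these; javascript: and data: URLs are dropped.
SAFE_SCHEMES = ("http://", "https://", "mailto:")
VOID_TAGS = {"br", "hr"}           # elements that never get a closing tag
DROP_CONTENT = {"script", "style"}  # elements whose text content is discarded too


class WhitelistSanitizer(HTMLParser):
    def __init__(self):
        # convert_charrefs=True: entities in text and attributes arrive already decoded,
        # so "&#106;avascript:" is seen as "javascript:" and rejected below.
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # unknown tag: drop the tag itself, keep its text content
        kept = []
        for name, value in attrs:
            if value is None or name not in ALLOWED_ATTRS.get(tag, set()):
                continue
            if name == "href" and not value.strip().lower().startswith(SAFE_SCHEMES):
                continue
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or tag not in ALLOWED_TAGS or tag in VOID_TAGS:
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            # Re-encode text so decoded entities like "&lt;" cannot turn into markup.
            self.out.append(escape(data, quote=False))

    # Comments, doctypes and processing instructions are not overridden,
    # so HTMLParser's default no-op handlers silently drop them.


def sanitize(html: str) -> str:
    parser = WhitelistSanitizer()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "".join(parser.out)


print(sanitize('<p onclick="x()">Hi <script>alert(1)</script>'
               '<a href="javascript:alert(1)">link</a></p>'))
```

The design choice is to parse the generated HTML and rebuild it from allowed pieces, rather than to search for dangerous patterns. Anything the whitelist does not mention is removed by default, which is the property the issue asks for.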