
Document lack of sanitization of HTML output

Open MaddyGuthridge opened this issue 1 year ago • 3 comments

Like many other Markdown processors, Python-Markdown does not sanitize its output, meaning that malicious code can be embedded within markdown documents.

# Some markdown document

<script>alert("Evil laughter")</script>

If this isn't made clear to users, there is a risk that they will unintentionally create opportunities for XSS attacks. It would be worthwhile documenting the lack of sanitization, and perhaps recommending an HTML sanitization library, such as bleach.
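To make the behavior concrete, here is a minimal sketch that feeds the document above through `markdown.markdown()`. It assumes the `markdown` package is installed and uses default settings with no extensions; the variable names are only illustrative.

```python
import markdown

# The example document from above: a heading followed by raw HTML.
source = '# Some markdown document\n\n<script>alert("Evil laughter")</script>\n'

# Python-Markdown converts the heading and leaves the raw HTML in place.
unsafe_html = markdown.markdown(source)
print(unsafe_html)
```

If that string is inserted into a page as-is, the browser will execute the script.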

MaddyGuthridge avatar Sep 08 '24 12:09 MaddyGuthridge

Hmm, this used to be mentioned in our documentation. Not sure when or why it was removed. But, yes, I agree, we should be documenting this. Although, an argument has been made by some in the past that since no Markdown parser sanitizes its output, there is no need to document this, as users should have no expectation of sanitization anyway. Personally, I recognize that not all users know or understand that, and so we should be expressly stating as much.

By the way, we used to recommend bleach as a solution. We stopped making that recommendation as the bleach project has been deprecated. That still appears to be the case.

waylan avatar Sep 10 '24 16:09 waylan

nh3 seems like a good and actively maintained alternative to Bleach: https://pypi.org/project/nh3/ (messense/nh3 on GitHub)

Sources for the recommendation:

  • https://adamj.eu/tech/2023/12/13/django-sanitize-incoming-html-nh3/
  • https://realpython.com/podcasts/rpp/187/#t=3086

dbader avatar Apr 25 '25 20:04 dbader

nh3 is great, but as it consists of bindings to a Rust package, that means it is not a pure Python package and not universally installable---and that gives me pause. I should also mention that we actually used to recommend yourcelf/bleach-allowlist, which is a set of config options for Bleach which makes sense for HTML generated from Markdown. A similar package for nh3 might be more useful to users than the raw package. 🤷‍♂

waylan avatar Apr 26 '25 22:04 waylan