Best practices/recommendations on safe HTML exports
My motivation is that I have a project that will accept and display arbitrary user-uploaded Jupyter notebooks as HTML.
I've struggled to find information on best practices for safely rendering and displaying notebooks as HTML. It would be great if the nbconvert documentation explained this topic more thoroughly.
I've seen that the HTML exporter has a sanitize-html / should_sanitize_html option. My understanding from reading the code is that cell contents are run through the clean_html filter. Some questions:
- How should I understand this filter's level of safety in a broader context?
- This doesn't appear to be customizable, at least not in an obvious way (I guess some of the allowlists could be monkeypatched?). Is it intended not to be customized? When I tried it, paragraph and header tags seemed to be stripped, which breaks fairly basic Markdown formatting in notebooks.
- Are there other basic vulnerabilities to watch out for that using the sanitize option doesn't address?
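To make the allowlist question concrete, here is a toy sketch using only Python's standard-library `html.parser`. It is not nbconvert's `clean_html` (which, as I understand it, wraps bleach with its default tag list), and the names `AllowlistSanitizer`, `sanitize`, and the tag sets are hypothetical. It shows why an allowlist without `p` and `h1` through `h6` flattens rendered Markdown, and what widening the allowlist would look like. It is an illustration only, not a substitute for a maintained sanitizer.

```python
from html import escape
from html.parser import HTMLParser

# Tags whose *content* is dropped entirely, not just the tags themselves.
DROP_CONTENT = {"script", "style"}
# Void elements never get a closing tag.
VOID_TAGS = {"br", "hr", "img"}
# Attributes holding URLs, and the only schemes allowed in them.
URL_ATTRS = {"href", "src"}
SAFE_SCHEMES = ("http:", "https:", "mailto:")


class AllowlistSanitizer(HTMLParser):
    """Toy sanitizer: keeps allowed tags/attributes, escapes all text."""

    def __init__(self, allowed_tags, allowed_attrs):
        super().__init__(convert_charrefs=True)
        self.allowed_tags = set(allowed_tags)
        self.allowed_attrs = allowed_attrs  # dict: tag -> set of attribute names
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in self.allowed_tags:
            return  # drop the tag itself but keep its text content
        keep = self.allowed_attrs.get(tag, set())
        parts = [tag]
        for name, value in attrs:
            if name not in keep or value is None:
                continue  # e.g. onclick is never in the allowlist
            low = value.strip().lower()
            if name in URL_ATTRS and ":" in low and not low.startswith(SAFE_SCHEMES):
                continue  # reject javascript:, data:, etc.
            parts.append(f'{name}="{escape(value, quote=True)}"')
        self.out.append("<" + " ".join(parts) + ">")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or tag not in self.allowed_tags or tag in VOID_TAGS:
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def sanitize(html, allowed_tags, allowed_attrs=None):
    parser = AllowlistSanitizer(allowed_tags, allowed_attrs or {})
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "".join(parser.out)


rendered = ('<h1>Results</h1><p>See <a href="javascript:alert(1)" '
            'onclick="x()">this</a>.</p><script>alert(1)</script>')
narrow = {"a", "code", "em", "strong"}  # hypothetical: no p or h* tags
wide = narrow | {"p", "h1", "h2", "h3", "h4", "h5", "h6"}
print(sanitize(rendered, narrow, {"a": {"href"}}))
print(sanitize(rendered, wide, {"a": {"href"}}))
```

With the narrow list, the heading and paragraph collapse into plain text (`ResultsSee <a>this</a>.`); with the wider list the structure survives (`<h1>Results</h1><p>See <a>this</a>.</p>`). In both cases the script, the event handler, and the `javascript:` URL are removed. Adding block-level tags like `p` and `h1` to an allowlist does not by itself grant script execution; the risk lives mostly in attributes and URLs.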
One obvious model for rendering user-uploaded notebooks is GitHub. I understand that GitHub does some kind of cleaning or places restrictions on the rendering, but I haven't been able to find details or code describing what that actually involves. If it is documented anywhere, that would also be a helpful thing to link to from the nbconvert documentation.
(Associated topic on the Jupyter Discourse)