nh3
nh3.clean() drops html, head and body tags even when they are added to ALLOWED_TAGS
While using the nh3 library, we came across a use case where a field is expected to contain HTML, but we need to remove any content that could cause an XSS attack. Calling nh3.clean() directly on the input text doesn't give the expected result: a lot of useful markup is stripped, which ends up modifying the HTML template that was submitted.
import nh3
text = '''
<!DOCTYPE html>
<html>
<head>
<title>HTML Tutorial</title>
</head>
<body>
<h1>This is a heading</h1>
<p>This is a paragraph.</p>
</body>
</html>
'''
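# Allow the document-level tags in addition to the defaults.
# Note: this mutates the module-level ALLOWED_TAGS set.
nh3.ALLOWED_TAGS.add('title')
nh3.ALLOWED_TAGS.add('head')
nh3.ALLOWED_TAGS.add('html')
nh3.ALLOWED_TAGS.add('div')
nh3.ALLOWED_TAGS.add('body')
print(nh3.clean(text, tags=nh3.ALLOWED_TAGS, strip_comments=False))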
Output:
<title>HTML Tutorial</title>
<h1>This is a heading</h1>
<p>This is a paragraph.</p>
We don't want the html, head or body tags to be stripped. Is there a limitation in the nh3 library that prevents these tags from being kept?
Blocked on https://github.com/rust-ammonia/ammonia/issues/183
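Until the upstream issue is resolved, one possible workaround is to sanitize the contents of the title and body separately and then write the document skeleton yourself, so the html, head and body tags come from trusted code rather than from user input. The sketch below is only an illustration: `sanitize_document` and `_SectionSplitter` are hypothetical names, not part of nh3, and it assumes (without confirmation from this thread) that the tags are dropped because the input is parsed as a fragment. It uses only the standard library and takes the sanitizer as a callable.

```python
from html.parser import HTMLParser
from typing import Callable


class _SectionSplitter(HTMLParser):
    """Hypothetical helper: collects the raw markup inside <title> and <body>."""

    def __init__(self) -> None:
        # Leave character references untouched so the sanitizer sees the original text.
        super().__init__(convert_charrefs=False)
        self.section = None  # "title", "body" or None
        self.parts = {"title": [], "body": []}

    def _emit(self, raw: str) -> None:
        # Markup outside <title>/<body> is discarded.
        if self.section is not None:
            self.parts[self.section].append(raw)

    def handle_starttag(self, tag, attrs):
        if tag in self.parts and self.section is None:
            self.section = tag  # enter the section; the tag itself is not copied
        else:
            self._emit(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self._emit(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == self.section:
            self.section = None  # leave the section
        else:
            self._emit(f"</{tag}>")

    def handle_data(self, data):
        self._emit(data)

    def handle_entityref(self, name):
        self._emit(f"&{name};")

    def handle_charref(self, name):
        self._emit(f"&#{name};")

    def handle_comment(self, data):
        self._emit(f"<!--{data}-->")


def sanitize_document(text: str, sanitize: Callable[[str], str]) -> str:
    """Sanitize title and body contents, then rebuild a fixed document skeleton."""
    splitter = _SectionSplitter()
    splitter.feed(text)
    splitter.close()
    title = sanitize("".join(splitter.parts["title"]))
    body = sanitize("".join(splitter.parts["body"]))
    # The skeleton is written by this code, never copied from the input.
    return (
        "<!DOCTYPE html>\n<html>\n<head>\n"
        f"<title>{title}</title>\n"
        "</head>\n"
        f"<body>{body}</body>\n"
        "</html>\n"
    )
```

With nh3 installed, this could be called as `sanitize_document(text, nh3.clean)`. Anything in the head other than the title (meta tags, styles, scripts) and any attributes on the html or body tags are dropped by this sketch, so it only fits templates where that is acceptable.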