nh3
nh3 copied to clipboard
Feature goals compared to Bleach (/ full ammonia API)?
Discussions are not enabled so opening it here, sorry 'bout it.
With the recent deprecation of bleach (mostly on grounds of html5lib being unmaintained), unless someone has the time to e.g. rebuild the html5lib API on top of an existing html5 parser and the maintainer of bleach decides to use that, ammonia/nh3 seems well positioned as a migration target (there's already one package which has done that visible from the linked Bleach PR).
One issue there is that nh3 currently provides rather limited tuning knobs compared to Ammonia and Bleach (not sure how the two relate as I have not looked yet), but the readme doesn't really say what your eventual goals would be on that front as maintainer. If you do aim to favor such support & migration, maybe an issue or even project (kanban) about full Ammonia support and / or Bleach features parity (if not API compatibility) could be a consideration?
An other possible issue (though more internal) is for exposing customisations which allow arbitrary callables (attribute_filter seems to be the only one currently): nh3 currently releases the GIL during cleanup, which wouldn't allow calling Python functions, and thus exposing a generic attribute_filter, I don't know whether Ammonia has parallelism built-in or how much you care about parallel cleaning (though I figure having two paths and only keeping the GIL if callbacks were actually provided would always be an option if a somewhat more annoying one).
Neither feature parity with Bleach nor full ammonia API is the goal for nh3 at the moment, but I'm happy to add more knobs for projects that want to migrate from bleach to nh3.
Others will likely opine but FWIW looking at our current use of bleach, Ammonia seems to support most of what we need (though some of it by hand-rolling through the ultra generic attibute_filter which may not be so useful), however nh3 does not currently expose it
attribute_filterobviously, this we would need to filter / cleanup@style(like Bleach'sCSSSanitizer, something which Ammonia does not currently seem to support directly (rust-ammonia/ammonia#179)- also the same for a finer / more flexible handling of
data:urls which is not currently in ammonia (rust-ammonia/ammonia#154) clean_content_tagswhich allows removing the tag and all of its content (the default whitelist removes the tag itself and "unwraps" its content I believe)
We also customise the serialization compared to the bleach default (which is just html5lib's as far as I can tell), but html5ever doesn't seem to have tuning knobs there (or at least not any which is relevant to what we configured) so there's definitely nothing you could do.
FWIW for the first two items nh3 might be able to provide bespoke whitelists from which it'd compose the relevant attribute_filter internally, but that may or may not be desirable (and for the first it would add a dependency on something like cssparser to implement a rust-level sanitizer).
Added attribute_filter in #11.
Added clean_content_tags in https://github.com/messense/nh3/pull/12
:heart:
Bleach offers the ability to pass callbacks which can modify/augment/remove attributes from each element. Is there a chance to get something similar here? One of our use-cases is to add target="_blank" to some anchor elements
Bleach offers the ability to pass
callbackswhich can modify/augment/remove attributes from each element. Is there a chance to get something similar here?
@camflan Added tag_attribute_values for this in v0.2.11.
See also https://github.com/messense/nh3/commit/a2ec808886a966b0f508e3c25254548ce219b935
Bleach offers the ability to pass
callbackswhich can modify/augment/remove attributes from each element. Is there a chance to get something similar here?@camflan Added
tag_attribute_valuesfor this in v0.2.11.
It's not entirely clear how to add target="_blank" to a only if href value is https://github.com for example.
And is possible to rename element, for example h1 -> h2 ?
It's not entirely clear how to add
target="_blank"toaonly ifhrefvalue ishttps://github.comfor example.And is possible to rename element, for example
h1->h2?
I don't think you can do either of those via Ammonia (and thus nh3), especially for the second request it's not a general-purpose HTML-rewriting device.
You can see all the operations Ammonia supports at ammonia::Builder, and nh3 handles a subset of those.
Your first request would I think be rust-ammonia/ammonia#163.
Bleach offers the ability to pass
callbackswhich can modify/augment/remove attributes from each element. Is there a chance to get something similar here?@camflan Added
tag_attribute_valuesfor this in v0.2.11.See also a2ec808
Would it be possible to allow a * entry to this map just like is possible in the attributes arg?
from docs:
attributes (dict[str, set[str]], optional) – Sets the HTML attributes that are allowed on specific tags, * key means the attributes are allowed on any tag.
I'm trying to sanitize data from CKEditor5 and multiple tags can for example contain style="text-align:right;". I cannot however allow all values of style, as that would make it vulnerable to xss
Would it be possible to allow a
*entry to this map just like is possible in theattributesarg? from docs:
nh3 is a thin layer over ammonia so it's limited to what ammonia provides.
For attributes, messense dispatches to generic_attributes or tag_attributes based on the * key rather than replicate the two parameters.
But there's no such generic_attribute_values, messense would have to create a custom attribute_filter which may then have to be reconciled with a user-provided attribute_filter, plus unexpectedly changing the concurrency caracteristics of the call, and possibly having different performance caracteristics. Maybe not something to hide from the user.
Unless you can get Ammonia to add a generic_attribute_values, I'd suggest just using attribute_filter directly.
For
attributes, messense dispatches togeneric_attributesortag_attributesbased on the*key rather than replicate the two parameters.
Ah, right i missed that in the rust docs.
unexpectedly changing the concurrency caracteristics of the call, and possibly having different performance caracteristics. Maybe not something to hide from the user.
Luckily performance is not an issue for me so i'll do it myself
I'd suggest just using
attribute_filterdirectly.
That is fair enough. I assume I'd implement that by checking if attribute == "style" and value within a custom whitelist?
Does this filter apply before or after cleaning?
Thank you for the quick reply
Edit: I've resorted to just implementing a loop to update the tag_attribute_values with a tag whitelist
for tag in tag_whitelist:
tag_attribute_values.update([
(tag, tag_attribute_values['*']),
])
One thing from bleach that I'm missing (or maybe I'm just missing it in the docs) is the strip attribute from Bleach. It removes tags which are not in allowed list. This is not the same as clean_content_tags since that is an explicit list of the tags to remove, I'm looking for something that would just remove all tags that are not allowed.
One thing from bleach that I'm missing (or maybe I'm just missing it in the docs) is the
stripattribute from Bleach. It removes tags which are not in allowed list. This is not the same asclean_content_tagssince that is an explicit list of the tags to remove, I'm looking for something that would just remove all tags that are not allowed.
Isn't that just tags? Unlike bleach, ammonia always just removes the blacklisted tags, it doesn't escape them:
>>> bleach.clean('<div><foo>xxx</foo></div>', tags={'div'})
'<div><foo>xxx</foo></div>'
>>> bleach.clean('<div><foo>xxx</foo></div>', tags={'div'}, strip=True)
'<div>xxx</div>'
>>> nh3.clean('<div><foo>xxx</foo></div>', tags={'div'})
'<div>xxx</div>'
clean_content_tags is used to remove not just the tag but the tag's contents as well, which is why it's opt-in, and additional to tags (you can't clean_content_tags a whitelisted tag).
Unlike bleach, ammonia always just removes the blacklisted tags, it doesn't escape them:
That makes it unusable for usecases where you want to tread the string as plaintext where people may be writing stuff that looks like HTML (ie it has < and >) but isn't HTML... Being able to choose between stripping and escaping would be very much appreciated.
Being able to choose between stripping and escaping would be very much appreciated.
See https://github.com/rust-ammonia/ammonia/issues/145
Is there an equivalent for bleach.linkify()?
Is there an equivalent for
bleach.linkify()?
No.