nh3 icon indicating copy to clipboard operation
nh3 copied to clipboard

Feature goals compared to Bleach (/ full ammonia API)?

Open xmo-odoo opened this issue 2 years ago • 18 comments

Discussions are not enabled so opening it here, sorry 'bout it.

With the recent deprecation of bleach (mostly on grounds of html5lib being unmaintained), unless someone has the time to e.g. rebuild the html5lib API on top of an existing html5 parser and the maintainer of bleach decides to use that, ammonia/nh3 seems well positioned as a migration target (there's already one package which has done that visible from the linked Bleach PR).

One issue there is that nh3 currently provides rather limited tuning knobs compared to Ammonia and Bleach (not sure how the two relate as I have not looked yet), but the readme doesn't really say what your eventual goals would be on that front as maintainer. If you do aim to favor such support & migration, maybe an issue or even project (kanban) about full Ammonia support and / or Bleach features parity (if not API compatibility) could be a consideration?

An other possible issue (though more internal) is for exposing customisations which allow arbitrary callables (attribute_filter seems to be the only one currently): nh3 currently releases the GIL during cleanup, which wouldn't allow calling Python functions, and thus exposing a generic attribute_filter, I don't know whether Ammonia has parallelism built-in or how much you care about parallel cleaning (though I figure having two paths and only keeping the GIL if callbacks were actually provided would always be an option if a somewhat more annoying one).

xmo-odoo avatar Jan 27 '23 12:01 xmo-odoo

Neither feature parity with Bleach nor full ammonia API is the goal for nh3 at the moment, but I'm happy to add more knobs for projects that want to migrate from bleach to nh3.

messense avatar Jan 27 '23 13:01 messense

Others will likely opine but FWIW looking at our current use of bleach, Ammonia seems to support most of what we need (though some of it by hand-rolling through the ultra generic attibute_filter which may not be so useful), however nh3 does not currently expose it

  • attribute_filter obviously, this we would need to filter / cleanup @style (like Bleach's CSSSanitizer, something which Ammonia does not currently seem to support directly (rust-ammonia/ammonia#179)
  • also the same for a finer / more flexible handling of data: urls which is not currently in ammonia (rust-ammonia/ammonia#154)
  • clean_content_tags which allows removing the tag and all of its content (the default whitelist removes the tag itself and "unwraps" its content I believe)

We also customise the serialization compared to the bleach default (which is just html5lib's as far as I can tell), but html5ever doesn't seem to have tuning knobs there (or at least not any which is relevant to what we configured) so there's definitely nothing you could do.

FWIW for the first two items nh3 might be able to provide bespoke whitelists from which it'd compose the relevant attribute_filter internally, but that may or may not be desirable (and for the first it would add a dependency on something like cssparser to implement a rust-level sanitizer).

xmo-odoo avatar Jan 27 '23 14:01 xmo-odoo

Added attribute_filter in #11.

messense avatar Jan 27 '23 15:01 messense

Added clean_content_tags in https://github.com/messense/nh3/pull/12

messense avatar Jan 28 '23 02:01 messense

:heart:

xmo-odoo avatar Jan 30 '23 06:01 xmo-odoo

Bleach offers the ability to pass callbacks which can modify/augment/remove attributes from each element. Is there a chance to get something similar here? One of our use-cases is to add target="_blank" to some anchor elements

camflan avatar Apr 19 '23 20:04 camflan

Bleach offers the ability to pass callbacks which can modify/augment/remove attributes from each element. Is there a chance to get something similar here?

@camflan Added tag_attribute_values for this in v0.2.11.

See also https://github.com/messense/nh3/commit/a2ec808886a966b0f508e3c25254548ce219b935

messense avatar Apr 22 '23 02:04 messense

Bleach offers the ability to pass callbacks which can modify/augment/remove attributes from each element. Is there a chance to get something similar here?

@camflan Added tag_attribute_values for this in v0.2.11.

It's not entirely clear how to add target="_blank" to a only if href value is https://github.com for example.

And is possible to rename element, for example h1 -> h2 ?

MikeVL avatar Jun 07 '23 20:06 MikeVL

It's not entirely clear how to add target="_blank" to a only if href value is https://github.com for example.

And is possible to rename element, for example h1 -> h2 ?

I don't think you can do either of those via Ammonia (and thus nh3), especially for the second request it's not a general-purpose HTML-rewriting device.

You can see all the operations Ammonia supports at ammonia::Builder, and nh3 handles a subset of those.

Your first request would I think be rust-ammonia/ammonia#163.

xmo-odoo avatar Jun 08 '23 05:06 xmo-odoo

Bleach offers the ability to pass callbacks which can modify/augment/remove attributes from each element. Is there a chance to get something similar here?

@camflan Added tag_attribute_values for this in v0.2.11.

See also a2ec808

Would it be possible to allow a * entry to this map just like is possible in the attributes arg? from docs:

attributes (dict[str, set[str]], optional) – Sets the HTML attributes that are allowed on specific tags, * key means the attributes are allowed on any tag.

I'm trying to sanitize data from CKEditor5 and multiple tags can for example contain style="text-align:right;". I cannot however allow all values of style, as that would make it vulnerable to xss

jasperfirecai2 avatar Jun 14 '23 12:06 jasperfirecai2

Would it be possible to allow a * entry to this map just like is possible in the attributes arg? from docs:

nh3 is a thin layer over ammonia so it's limited to what ammonia provides.

For attributes, messense dispatches to generic_attributes or tag_attributes based on the * key rather than replicate the two parameters.

But there's no such generic_attribute_values, messense would have to create a custom attribute_filter which may then have to be reconciled with a user-provided attribute_filter, plus unexpectedly changing the concurrency caracteristics of the call, and possibly having different performance caracteristics. Maybe not something to hide from the user.

Unless you can get Ammonia to add a generic_attribute_values, I'd suggest just using attribute_filter directly.

xmo-odoo avatar Jun 14 '23 12:06 xmo-odoo

For attributes, messense dispatches to generic_attributes or tag_attributes based on the * key rather than replicate the two parameters.

Ah, right i missed that in the rust docs.

unexpectedly changing the concurrency caracteristics of the call, and possibly having different performance caracteristics. Maybe not something to hide from the user.

Luckily performance is not an issue for me so i'll do it myself

I'd suggest just using attribute_filter directly.

That is fair enough. I assume I'd implement that by checking if attribute == "style" and value within a custom whitelist?

Does this filter apply before or after cleaning?

Thank you for the quick reply

Edit: I've resorted to just implementing a loop to update the tag_attribute_values with a tag whitelist

    for tag in tag_whitelist:
        tag_attribute_values.update([
            (tag, tag_attribute_values['*']),
        ])

jasperfirecai2 avatar Jun 14 '23 12:06 jasperfirecai2

One thing from bleach that I'm missing (or maybe I'm just missing it in the docs) is the strip attribute from Bleach. It removes tags which are not in allowed list. This is not the same as clean_content_tags since that is an explicit list of the tags to remove, I'm looking for something that would just remove all tags that are not allowed.

frwickst avatar Aug 14 '23 07:08 frwickst

One thing from bleach that I'm missing (or maybe I'm just missing it in the docs) is the strip attribute from Bleach. It removes tags which are not in allowed list. This is not the same as clean_content_tags since that is an explicit list of the tags to remove, I'm looking for something that would just remove all tags that are not allowed.

Isn't that just tags? Unlike bleach, ammonia always just removes the blacklisted tags, it doesn't escape them:

>>> bleach.clean('<div><foo>xxx</foo></div>', tags={'div'})
'<div>&lt;foo&gt;xxx&lt;/foo&gt;</div>'
>>> bleach.clean('<div><foo>xxx</foo></div>', tags={'div'}, strip=True)
'<div>xxx</div>'
>>> nh3.clean('<div><foo>xxx</foo></div>', tags={'div'})
'<div>xxx</div>'

clean_content_tags is used to remove not just the tag but the tag's contents as well, which is why it's opt-in, and additional to tags (you can't clean_content_tags a whitelisted tag).

xmo-odoo avatar Aug 14 '23 08:08 xmo-odoo

Unlike bleach, ammonia always just removes the blacklisted tags, it doesn't escape them:

That makes it unusable for usecases where you want to tread the string as plaintext where people may be writing stuff that looks like HTML (ie it has < and >) but isn't HTML... Being able to choose between stripping and escaping would be very much appreciated.

ThiefMaster avatar Aug 14 '23 08:08 ThiefMaster

Being able to choose between stripping and escaping would be very much appreciated.

See https://github.com/rust-ammonia/ammonia/issues/145

messense avatar Aug 14 '23 08:08 messense

Is there an equivalent for bleach.linkify()?

mattiaverga avatar Oct 09 '23 13:10 mattiaverga

Is there an equivalent for bleach.linkify()?

No.

xmo-odoo avatar Oct 10 '23 05:10 xmo-odoo