processwire-requests
Allow HTML Purifier to be configurable
Short description of the enhancement
Related forum topic: https://processwire.com/talk/topic/19852-is-html-purifier-configurable-in-pw/
I need to allow for specific data-* attributes in a CKEditor field but regardless of ACF settings these are stripped out of the field value by HTML Purifier. I would like to keep HTML Purifier enabled, and I am able to configure HTML Purifier to allow the data attribute by editing MarkupHTMLPurifier::init():
public function init() {
    $this->settings->set('Cache.SerializerPath', $this->getCachePath());
    $this->settings->set('Attr.AllowedRel', array('nofollow'));
    $this->settings->set('HTML.DefinitionID', 'html5-definitions');
    $this->settings->set('HTML.DefinitionRev', 1);
    if($def = $this->settings->maybeGetRawHTMLDefinition()) {
        $def->addElement('figure', 'Block', 'Optional: (figcaption, Flow) | (Flow, figcaption) | Flow', 'Common');
        $def->addElement('figcaption', 'Inline', 'Flow', 'Common');
        // Added line below to allow data-ext attribute on 'a' elements
        $def->addAttribute('a', 'data-ext', 'Text');
    }
}
But I'd like to be able to set the HTML Purifier configuration in a way that doesn't require directly changing core code.
Seeing as data-* attributes are frequently used/needed these days, could MarkupHTMLPurifier be updated in some way (maybe a new hookable method that is passed the $def object?) that allows for custom configuration like this?
And also, perhaps there is some way that MarkupHTMLPurifier can be changed to allow all data-* attributes on all elements by default as I don't see why data attributes should be considered especially "impure".
P.S. Because changes to the HTML Purifier configuration only take effect when the HTML Purifier cache is cleared, perhaps a helper method could be added to MarkupHTMLPurifier to simplify the cache clearing? E.g. MarkupHTMLPurifier::clearCache()
We really need this - there is no one-size-fits-all for this stuff.
In my case I want to enable AutoFormat.RemoveEmpty
Seems like it should be a really easy enhancement to have a config setting textarea that we can load up with setting / value pairs (probably colon separated) and have the module use these.
What an easy PR this would be :)
I would love to see this configurable in the Admin or via hook, too. ATM allowed content configured via ACF is being stripped out by the purifier.
@Toutouwai I've added the things you requested:
Seeing as data-* attributes are frequently used/needed these days, could MarkupHTMLPurifier be updated in some way (maybe a new hookable method that is passed the $def object?) that allows for custom configuration like this?
There is now an initConfig() hookable method that is passed the $def object, and lets you modify it. There's also a getDef() method in case you want to retrieve it separately.
And also, perhaps there is some way that MarkupHTMLPurifier can be changed to allow all data-* attributes on all elements by default as I don't see why data attributes should be considered especially "impure".
I'm not sure how to do that, though I also think the default configuration should be safe for anonymous HTML, which means not allowing "all" of anything. It's not so much the data attributes as what's in them, since pretty much anything goes.
P.S. Because changes to the HTML Purifier configuration only take effect when the HTML Purifier cache is cleared, perhaps a helper method could be added to MarkupHTMLPurifier to simplify the cache clearing? E.g. MarkupHTMLPurifier::clearCache()
I've added a clearCache() method as well.
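For illustration, the original data-ext request could then be met from a hook rather than a core edit. This is only a sketch based on the description above: it assumes the hook lives in /site/ready.php and that $def arrives as the second hook argument (index 1), and it reuses the addAttribute() call from the opening post.

```php
// /site/ready.php (assumed location for the hook)
// Allow a data-ext attribute on <a> elements without editing core code.
$wire->addHookAfter('MarkupHTMLPurifier::initConfig', function($event) {
    // Assumption: the definition object is passed as the second argument
    $def = $event->arguments(1);
    // Same call as in the modified init() from the opening post
    $def->addAttribute('a', 'data-ext', 'Text');
});
```

Since the configuration is cached, a one-off call such as $modules->get('MarkupHTMLPurifier')->clearCache(); would presumably be needed after adding or changing the hook, rather than on every request.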
Thanks @ryancramerdesign!
Not sure if/when this happens automatically, but could the MarkupHTMLPurifier documentation be updated to show these new methods?
In particular, I think people are likely to be confused by the config caching so it would be good to have a note explaining that the cache will need to be cleared with clearCache() each time the config is changed within an initConfig() hook.
Great addition, @ryancramerdesign! Is there an easy way to allow configurations to be made in the admin, as @adrianbj suggested? Then we could make them just as easily, and in the same place, as the ACF configurations.
Not sure if this should go in its own request, but would it be possible to update purifier definitions to 'modern' HTML5?
In my use case I needed users to be allowed to add <iframe>s with the allowfullscreen attribute, but it kept being filtered out. I still needed the purifier to keep things as safe as possible, so I've played around with https://github.com/xemlock/htmlpurifier-html5 for a bit, and it works fine as far as I can tell.
I can now simply add $purifier->set("HTML.IframeAllowFullscreen", "true"); to the configuration, and get nice clean HTML5.
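As a rough standalone sketch of that library outside ProcessWire: the following assumes xemlock/htmlpurifier-html5 is installed via Composer, and the YouTube embed pattern and sample markup are purely illustrative. HTML.SafeIframe and URI.SafeIframeRegexp are standard HTML Purifier directives; HTML.IframeAllowFullscreen comes from the HTML5 package and, as far as I understand it, only takes effect when safe iframes are enabled.

```php
<?php
// Assumption: installed with `composer require xemlock/htmlpurifier-html5`
require __DIR__ . '/vendor/autoload.php';

// HTML5-aware configuration provided by the xemlock package
$config = HTMLPurifier_HTML5Config::createDefault();

// Permit iframes, but only from an allow-listed source (illustrative pattern)
$config->set('HTML.SafeIframe', true);
$config->set('URI.SafeIframeRegexp', '%^https://www\.youtube\.com/embed/%');

// Keep the allowfullscreen attribute on permitted iframes
$config->set('HTML.IframeAllowFullscreen', true);

$purifier = new HTMLPurifier($config);

// Hypothetical input: one allow-listed iframe and one script that should be stripped
$dirty = '<iframe src="https://www.youtube.com/embed/xyz" allowfullscreen></iframe>'
       . '<script>alert(1)</script>';

echo $purifier->purify($dirty);
```

Restricting the iframe source with a pattern keeps the configuration closer to the "as safe as possible" goal than enabling HTML.Trusted would.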
@ryancramerdesign @Toutouwai Where do I have to put the hook? In site/admin.php? I tried to add a simple section element to the allowed elements of the purifier, but it gets stripped every time I save the page with CKEditor. I added the following code to site/admin.php:
$wire->addHook('MarkupHTMLPurifier::initSettings', function ($event) {
    $def = $event->arguments(1);
    $def->addElement('section');
});
$purifier = $modules->get('MarkupHTMLPurifier');
$purifier->clearCache();
bd($purifier->getDef());
require($config->paths->adminTemplates . 'controller.php');
Is that correct? When do I have to clear the cache? Before setting the hook or afterwards? I tried clearing the cache beforehand, but to no avail. My section elements always get stripped out.
@ryancramerdesign, there is a mistake in the PhpDoc comments for the initConfig method. The example hook there shows MarkupHTMLPurifier::initSettings but it should be MarkupHTMLPurifier::initConfig.
Also, the online documentation for this method looks like it isn't being generated correctly: https://processwire.com/api/ref/markup-h-t-m-l-purifier/
@jmartsch, as per my comment to Ryan, the hookable method is initConfig and not initSettings.
The addElement() method requires at least 4 arguments but you are only supplying one. You should refer to the HTMLPurifier documentation and source code (PhpDoc comments for HTMLPurifier_HTMLModule::addElement) but an example hook is below. You'll need to clear the HTMLPurifier config cache once every time you make a change to the HTMLPurifier config. You could use the Tracy console to do this:
$purifier = new MarkupHTMLPurifier();
$purifier->clearCache();
Example hook in /site/ready.php although it would also work in /site/templates/admin.php
$wire->addHookAfter('MarkupHTMLPurifier::initConfig', function(HookEvent $event) {
    $def = $event->arguments(1);
    $def->addElement('section', 'Flow', 'Flow', 'Common');
});
While you are testing this make sure you have ACF turned off in your CKEditor field settings because that will also strip out <section> elements. You could configure it to allow <section> but that's a separate thing from what is being discussed here.
Thank you very much for your insights Robin
@ryancramerdesign could you please correct the documentation?