HTMLPurifier for SVG Files?
This is probably more of a configuration issue than something that necessitates a code-change, but I felt it would be worth mentioning.
SVG images are XML documents and can contain embedded JavaScript code, e.g. https://hackerone.com/reports/148853
Has anyone had any luck using HTMLPurifier with SVG files? Is any significant code change needed to make it work?
You'll need to program each element as described in http://htmlpurifier.org/docs/enduser-customize.html
Some of it will be straightforward, but some of it might be quite difficult. SVG is quite a large spec.
Yeah, I'll probably tackle this soon. Would you prefer:
- If I just wrote the configuration object as a separate dependency, or
- If I submitted a pull request with it?
In principle I would take a pull request, but I need to emphasize how much work it will be to implement this to the standards of the project. So if you are just looking to get something working for yourself, you should probably ship it as a separate dependency.
Just wanted to check whether you had any luck with this, @paragonie-scott?
Hello, there seems to be a problem here: the HTML5 standard (which is quite old by now and is what most pages use) explicitly allows embedding SVG inline, i.e. the <svg> tag (and all its children) directly inside the HTML5 document.
So when we run HTMLPurifier on this kind of document, it deletes all the SVG we inserted inline!
How can we allow SVG? It has been perfectly legitimate to do this for nearly 15 years (it's not a marginal case at all; it's widely used).
And yes, I've read this passage:
There are also a number of other XML languages out there that can be embedded in HTML documents: two of the most popular are MathML and SVG, and I frequently get requests to implement these. But they are expansive, comprehensive specifications, and it would take far too long to implement them correctly (most systems I've seen go as far as whitelisting tags and no further; come on, what about nesting!)
Are we to understand from that warning that HTMLPurifier cannot be used in real life on pages generated after 2015 (since any modern page can contain SVG)?
Is there a way to completely ignore svg tags and all their children (and find another way to secure that part), so that HTMLPurifier can still be used for the rest?
I mean, you could always regex the svg blocks out and run them through something else lol
CSS code is extracted and sent to CSSTidy, maybe it makes sense to do something similar for SVG? An "SVG sub-parser" or so?
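For what it's worth, the two suggestions above (regex the svg blocks out, or hand them to some kind of sub-parser) could look roughly like the sketch below. This is only an illustration, not something HTMLPurifier provides: `purifyWithSvg`, `$purifyHtml` and `$sanitizeSvg` are made-up names, and the regex is deliberately naive (it does not handle nested <svg> elements or a `>` inside an attribute value). The SVG sanitizer itself is left to the caller.

```php
<?php
/**
 * Sketch: pull inline <svg> blocks out of an HTML string, clean the two
 * parts separately, then stitch them back together.
 *
 * $purifyHtml and $sanitizeSvg are hypothetical callables supplied by the
 * caller (string in, string out); neither is part of HTMLPurifier.
 */
function purifyWithSvg(string $html, callable $purifyHtml, callable $sanitizeSvg): string
{
    $blocks = array();
    // Random placeholder prefix so the input cannot guess or forge it.
    $token = 'SVGBLOCK' . bin2hex(random_bytes(8)) . 'X';

    // Non-greedy match of each <svg ...>...</svg> block (nested <svg> is NOT handled).
    $stripped = preg_replace_callback(
        '~<svg\b[^>]*>.*?</svg\s*>~is',
        function (array $m) use (&$blocks, $token) {
            $blocks[] = $m[0];
            // The trailing "E" keeps placeholder 1 from matching inside placeholder 10.
            return $token . (count($blocks) - 1) . 'E';
        },
        $html
    );
    if ($stripped === null) {
        throw new RuntimeException('Failed to extract SVG blocks');
    }

    // The HTML cleaner only ever sees plain-text placeholders, never <svg>.
    $clean = $purifyHtml($stripped);

    // Put each separately sanitized SVG block back in place of its placeholder.
    foreach ($blocks as $i => $svg) {
        $clean = str_replace($token . $i . 'E', $sanitizeSvg($svg), $clean);
    }
    return $clean;
}
```

With an existing `$purifier` instance, the HTML side could be passed as `array($purifier, 'purify')`. Note that whatever `$sanitizeSvg` returns ends up in the output unfiltered, so all of the SVG security rests on that callable.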
Since SVG is a large spec, it can take some time to solve for your use case, so I hope this helps with setup.
It's also easy to extend if required.
```php
// Assumes HTMLPurifier is already loaded (e.g. through your autoloader).
$config = HTMLPurifier_Config::createDefault();
// Fetch the raw (editable) HTML definition so custom elements can be added.
$def = $config->getHTMLDefinition(true);
// Root <svg> element; the excludes entry below forbids nesting <svg> inside <svg>.
$svg = $def->addElement('svg', 'Block', 'Flow', 'Common', array('version' => 'CDATA', 'id' => 'CDATA', 'xmlns' => 'CDATA', 'width' => 'CDATA', 'height' => 'CDATA', 'xmlns:xlink' => 'CDATA', 'x' => 'CDATA', 'y' => 'CDATA', 'viewBox' => 'CDATA', 'enable-background' => 'CDATA', 'xml:space' => 'CDATA'));
$svg->excludes = array('svg' => true);
// A few basic shape and grouping elements; add more elements and attributes as needed.
$path = $def->addElement('path', 'Block', 'Flow', 'Common', array('fill' => 'CDATA', 'd' => 'CDATA'));
$g = $def->addElement('g', 'Block', 'Flow', 'Common', array('fill' => 'CDATA', 'stroke' => 'CDATA', 'stroke-width' => 'CDATA'));
$polyline = $def->addElement('polyline', 'Block', 'Flow', 'Common', array('points' => 'CDATA'));
$rect = $def->addElement('rect', 'Block', 'Flow', 'Common', array('x' => 'CDATA', 'y' => 'CDATA', 'width' => 'CDATA', 'height' => 'CDATA'));
$purifier = new HTMLPurifier($config);
```