htmlpurifier
htmlpurifier copied to clipboard
Mathml
Hi,
My name is Xavier Ripoll, I'm a developer at MathType and colleague of @manuelwiris, who contacted you a couple of months ago asking about a MathML module.
As he stated, we are working on a MathML module of our own for HTML Purifier. Currently, the module has been completed and is ready for release. We share the concerns you raised regarding security so we did an exhaustive job implementing the specification. To back our work, we also created three suites of tests.
One of them is included within the mathml branch (the one intended to merge) and consists of the Mozilla MathML Torture Test.
The two others are the W3C MathML Test Suite and 140+ custom MathML snippets. These are included only in the testing branch because of their size. If you prefer, though, they can be included in the main branch as well.
All three suites of tests are passing in our local environment with all PHP versions. Nonetheless, in older PHP versions (<=5.5) the execution fails in Travis due to an out-of-memory exception we believe to be caused by the size of the schema.ser file. We'd appreciate any feedback from your side about what might be causing the error.
You can check the results of the mathml (https://travis-ci.org/wiris/htmlpurifier/builds/453971161) and testing (https://travis-ci.org/wiris/htmlpurifier/builds/453986321) branches on Travis.
Given all of this, we think HTML Purifier should benefit from the inclusion of the MathML module we propose.
Thank you, Xavier
P.S. We have also detected some other issues with the HTML Purifier package, is the GitHub tracker the appropriate place to report these?
P.S. We have also detected some other issues with the HTML Purifier package, is the GitHub tracker the appropriate place to report these?
Yes please.
I haven't looked at the patch yet, but we'll have to work out some solution to the Travis OOM problem. Maybe as simple as just bumping up the max memory size in PHP config.
This is a really good start. However, there is too heavy use of CDATA, especially on fields which are known to be XSS-able if you have arbitrary string data. The next step is to go through the CDATA fields and audit them. Use existing attribute definitions as much as possible, and otherwise clamp down on possible values.
I went and updated the most important fields that were previously CDATA, some new AttrDefs were added in order to do so.
OK. This may take some time to review because I need to audit every remaining occurrence of CDATA to confirm that there isn't security risk.
Hi @ezyang,
Hope you are well.
My name is Manuel, Plugin Manager at WIRIS.
I'm, writing to see if there are some prevision to include this update in a new version of HTML Purifier and to see if we can help in any way to accelerate the integration process.
Best,
Hi there, i am going through multiple sanitizer libs to use it with CKeditor MathType / ChemType and i found this thread, can anyone please tell me how to sanitize MathType tags properly? It's getting little bit confusing.
Just in case, if anyone is looking for sanitizer for MathMl or ChemType / MathType you can use this https://github.com/rohitcoder/html-sanitizer/ this is improved and upgraded version of https://github.com/tgalopin/html-sanitizer/