htmlpurifier icon indicating copy to clipboard operation
htmlpurifier copied to clipboard

Minor whitespace differences in PHP7.3

Open PatrickRose opened this issue 4 years ago • 4 comments

We're currently updating our app to PHP 7.3, and we're finding some unit test failures that don't occur on PHP 7.1. It's reproducible just using this script

<?php

require_once 'vendor/autoload.php';

$sHTML = '<p>Paragraph 1</p><textarea></textarea> <p>Paragraph 2</p>';

$oPurifier = new HTMLPurifier(); // version 4.11.0

echo $oPurifier->purify($sHTML);

On PHP 7.1, we get <p>Paragraph 1</p><p>Paragraph 2</p>. On PHP 7.3 we get <p>Paragraph 1</p> <p>Paragraph 2</p>.

There seems to be various cases where whitespace is added where it wasn't before:

'">><marquee><img src=x onerror=confirm(1)></marquee>"></plaintext\></|\><plaintext/onmouseover=prompt(1)>
<script>prompt(1)</script>@gmail.com<isindex formaction=javascript:alert(/XSS/) type=submit>'-->"></script>
<script>alert(document.cookie)</script>">
<img/id="confirm&lpar;1)"/alt="/"src="/"onerror=eval(id)>'">
<img src="http://www.shellypalmer.com/wp-content/images/2015/07/hacked-compressor.jpg">

On PHP7.3 that adds a newline before the @gmail which isn't there in PHP 7.1

<HEAD><META HTTP-EQUIV="CONTENT-TYPE" CONTENT="text/html; charset=UTF-8"> </HEAD>+ADw-SCRIPT+AD4-alert(\'XSS\');+ADw-/SCRIPT+AD4- adds a space before the beginning of the output (though the HEAD etc is stripped)

I don't think this is an issue because whitespace doesn't normally mean anything, but I don't know whether it would signify some other weirdness that people need to be aware of elsewhere?

PatrickRose avatar Nov 04 '19 10:11 PatrickRose

This is probably related to the version of libxml that is being bundled with PHP. If someone wants to read the libxml changelog that would be very helpful.

ezyang avatar Nov 04 '19 15:11 ezyang

I can confirm this issue.

Before <br /> <br /> became <br><br>, now it becomes <br> <br> , thus with an additional space after the tag. (Thus actually correct.)

Unfortunately libxml2 is quite a mess, the official site is not updated at all and there is no updated changelog so one would need to check the difference between the latest versions.

For us the previous version was 2.9.4, now we are on 2.9.9. There are quite a few changes here ...

mbrodala avatar Feb 12 '20 09:02 mbrodala

Well, I would certainly agree the new behavior is "more correct". So hopefully there is nothing else to worry about; just update your unit tests and carry on...

ezyang avatar Feb 22 '20 17:02 ezyang

Sure it's not a dealbreaker but still unexpected. But as it seems nothing this package can do about.

mbrodala avatar Feb 24 '20 07:02 mbrodala