htmlpurifier
Cache memory leak when using PHP 7.4.x
I am currently investigating a memory leak with PHP 7.4.x. I upgraded PHP from 7.2.26 to 7.4.10 and then htmlpurifier from 4.10 to 4.13, since 4.10 is not PHP 7.4 compatible. Since then I have had a huge issue with memory leaks: in my application, a couple dozen calls can leak ~512 MB. I am still investigating the root cause, but I hope somebody has an idea what might be happening. I will try to strip my application down to the minimum code required to reproduce the issue.
Things I found out so far:
- PHP 7.4.x is affected; I tested 7.4.0 and 7.4.10. 7.3.22 and 7.2.26 do not have this issue.
- HTML Purifier 4.12 and 4.13 have this issue. 4.10 is not PHP 7.4 compatible, and 4.11 either had this issue or was not 7.4 compatible, but I did test it.
- With Cache.DefinitionImpl set to null, no memory leak occurs.
- The cache directory does not grow.
- Running a PHP memory profiler shows that the unserialize statement causes the most memory allocations. I do a lot of serialization and deserialization elsewhere and have no issues there.
- It always happens at the same location.
Fatal error: Allowed memory size of 536870912 bytes exhausted (tried to allocate 20480 bytes) in /home/someproject/libs/htmlpurifier-4.12.0-lite/library/HTMLPurifier/DefinitionCache/Serializer.php on line 73
...
45.9637 516876080 16. HTMLPurifier->purify(string(7), ???) /home/someproject/Classes/Util/HTMLSanitizer.php:164
45.9637 516879240 17. HTMLPurifier_Generator->__construct(class HTMLPurifier_HTML5Config, class HTMLPurifier_Context) /home/someproject/libs/htmlpurifier-4.12.0-lite/library/HTMLPurifier.php:158
45.9637 516879240 18. HTMLPurifier_HTML5Config->getHTMLDefinition(???, ???) /home/someproject/libs/htmlpurifier-4.12.0-lite/library/HTMLPurifier/Generator.php:74
45.9637 516879240 19. HTMLPurifier_HTML5Config->getDefinition(string(4), false, false) /home/someproject/libs/htmlpurifier-4.12.0-lite/library/HTMLPurifier/Config.php:415
45.9637 516879240 20. HTMLPurifier_HTML5Config->getDefinition(string(4), true, true) /home/someproject/libs/htmlpurifier-html5-master/library/HTMLPurifier/HTML5Config.php:86
45.9637 516879240 21. HTMLPurifier_DefinitionCache_Decorator_Cleanup->get(class HTMLPurifier_HTML5Config) /home/someproject/libs/htmlpurifier-4.12.0-lite/library/HTMLPurifier/Config.php:579
45.9637 516879240 22. HTMLPurifier_DefinitionCache_Decorator_Cleanup->get(class HTMLPurifier_HTML5Config) /home/someproject/libs/htmlpurifier-4.12.0-lite/library/HTMLPurifier/DefinitionCache/Decorator/Cleanup.php:70
45.9637 516879240 23. HTMLPurifier_DefinitionCache_Serializer->get(class HTMLPurifier_HTML5Config) /home/someproject/libs/htmlpurifier-4.12.0-lite/library/HTMLPurifier/DefinitionCache/Decorator.php:81
45.9640 517026992 24. unserialize(string(132328)) /home/someproject/libs/htmlpurifier-4.12.0-lite/library/HTMLPurifier/DefinitionCache/Serializer.php:73
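One way to check whether unserialize on its own leaks, independent of HTML Purifier, is to unserialize the same payload repeatedly and compare memory_get_usage() before and after. The following is only a minimal sketch: the helper name unserializeGrowth and the synthetic payload are made up for illustration. A real test would use the contents of one of the .ser files from the cache directory, which needs the HTML Purifier classes loaded.

```php
<?php
/**
 * Unserialize $payload $iterations times, discarding each result, and
 * return how many bytes memory_get_usage() grew over the whole run.
 * Hypothetical helper, not part of HTML Purifier.
 */
function unserializeGrowth(string $payload, int $iterations): int
{
    // Warm-up run so one-time allocations are not counted as growth.
    $tmp = unserialize($payload);
    unset($tmp);
    gc_collect_cycles();

    $before = memory_get_usage();
    for ($i = 0; $i < $iterations; $i++) {
        $tmp = unserialize($payload);
        unset($tmp); // drop the only reference right away
    }
    gc_collect_cycles();

    return memory_get_usage() - $before;
}

// Synthetic stand-in for a cached definition (assumption: the real payload
// would be the contents of a .ser file under Cache.SerializerPath).
$payload = serialize(array_fill(0, 10000, ['tag' => 'p', 'attr' => ['class' => true]]));

printf("Growth after 100 runs: %d bytes\n", unserializeGrowth($payload, 100));
```

If the reported growth stays near zero on both PHP 7.3 and 7.4, that would suggest the leak depends on what is being unserialized or on who keeps a reference to the result, not on unserialize in general.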
config:
"AutoFormat.AutoParagraph" => true,
"AutoFormat.Linkify" => true,
"AutoFormat.RemoveEmpty" => true,
"AutoFormat.RemoveSpansWithoutAttributes" => true,
"Core.RemoveProcessingInstructions" => true,
"URI.AllowedSchemes" => array (
'http' => true,
'https' => true,
'mailto' => true
),
"URI.DefaultScheme" => "https",
"Output.TidyFormat" => true,
"HTML.ForbiddenAttributes" => array("class", "@data-community-tooltip"),
"HTML.ForbiddenElements" => [ "iframe", "form", "button", "input", "body", "html", "frameset", "head", "meta", "script", "style" ],
"Attr.ForbiddenClasses" => array("bb_ul", "bb_tag"),
"Core.CollectErrors" => true,
"Cache.SerializerPath" => "/some/path"
More info will follow.
I have the same issue on 7.4. We send all emails through HTML Purifier, and the process normally stops at 2500 emails (by then using a few hundred MB of memory). With $config->set('Cache.DefinitionImpl', null); memory consumption stays low (14 MB), but it is not as fast.
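To compare the two settings on the same inputs, it may help to sample memory after every call instead of only looking at the final figure. Below is a rough sketch under stated assumptions: memoryAfterEachCall is a hypothetical helper, and the task shown is a stand-in using strip_tags so the snippet runs without HTML Purifier installed.

```php
<?php
/**
 * Call $task once per input and record memory_get_usage() after each call.
 * Hypothetical helper for illustration only.
 *
 * @return int[] memory usage in bytes after each call
 */
function memoryAfterEachCall(callable $task, array $inputs): array
{
    $samples = [];
    foreach ($inputs as $input) {
        $task($input);
        gc_collect_cycles(); // rule out uncollected reference cycles
        $samples[] = memory_get_usage();
    }
    // Note: $samples itself grows slightly, which adds a little noise.
    return $samples;
}

// Stand-in task. In a real test this would wrap the purifier, e.g.
// function ($html) use ($purifier) { return $purifier->purify($html); }
$task = function (string $html): string {
    return strip_tags($html);
};

$samples = memoryAfterEachCall($task, array_fill(0, 50, '<p>Hello <b>world</b></p>'));
printf("first: %d bytes, last: %d bytes\n", $samples[0], end($samples));
```

A steady climb from the first to the last sample with the cache enabled, and a flat line with Cache.DefinitionImpl set to null, would match the behaviour described in this thread.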
I'm in the process of switching to 7.4, and I was worried about your issue regarding memory consumption/leak. So I'm testing, and I cannot reproduce your problem.
I have 84 .eml files of HTML emails, 953 MB in total. The biggest file is 52 MB, the smallest 3 KB
4 test cases in total:
- PHP 7.1, HTMLPurifier 4.9.3
- PHP 7.1, HTMLPurifier 4.13.0
- PHP 7.4, HTMLPurifier 4.9.3
- PHP 7.4, HTMLPurifier 4.13.0
As you can see below, all 4 have roughly the same memory usage. PHP 7.4 uses 1% more memory, but is 10% faster. (The deprecation notices in case 3 are expected, since HTMLPurifier 4.9.3 is not compatible with PHP 7.4.)
$ /usr/bin/php7.1 htmlpurify1.php old
PHP Version: 7.1.33-34+ubuntu18.04.1+deb.sury.org+1
HTMLPurifier Version: 4.9.3
Memory Usage: 195.29 MB
Memory Real Usage: 213.36 MB
Seconds: 40.261646032333
$ /usr/bin/php7.1 htmlpurify1.php new
PHP Version: 7.1.33-34+ubuntu18.04.1+deb.sury.org+1
HTMLPurifier Version: 4.13.0
Memory Usage: 195.35 MB
Memory Real Usage: 213.36 MB
Seconds: 41.45220208168
$ /usr/bin/php7.4 htmlpurify1.php old
Deprecated: Array and string offset access syntax with curly braces is deprecated in /htmlpurifier-4.9.3/library/HTMLPurifier/Encoder.php on line 162
Deprecated: Array and string offset access syntax with curly braces is deprecated in /htmlpurifier-4.9.3/library/HTMLPurifier/ChildDef/Custom.php on line 48
Deprecated: Array and string offset access syntax with curly braces is deprecated in /htmlpurifier-4.9.3/library/HTMLPurifier/TagTransform/Font.php on line 78
Deprecated: Array and string offset access syntax with curly braces is deprecated in /htmlpurifier-4.9.3/library/HTMLPurifier/TagTransform/Font.php on line 78
Deprecated: __autoload() is deprecated, use spl_autoload_register() instead in /htmlpurifier-4.9.3/library/HTMLPurifier.autoload.php on line 17
PHP Version: 7.4.21
HTMLPurifier Version: 4.9.3
Memory Usage: 196.16 MB
Memory Real Usage: 215.45 MB
Seconds: 35.772937059402
$ /usr/bin/php7.4 htmlpurify1.php new
PHP Version: 7.4.21
HTMLPurifier Version: 4.13.0
Memory Usage: 196.22 MB
Memory Real Usage: 215.45 MB
Seconds: 36.35814499855
If I just purify the largest 52 MB file, I get these numbers:
$ /usr/bin/php7.4 htmlpurify1.php new
PHP Version: 7.4.21
HTMLPurifier Version: 4.13.0
Memory Usage: 184.86 MB
Memory Real Usage: 186.09 MB
Seconds: 0.97783088684082
Purifying 953 MB instead of 52 MB increases memory usage a bit, but not by much.
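For anyone who wants to produce comparable numbers, here is a sketch of how a report in this shape could be generated. The thread does not show htmlpurify1.php, so this is an assumption: buildReport is a hypothetical helper, and it guesses that the two memory figures correspond to memory_get_peak_usage() without and with the real-usage flag.

```php
<?php
/**
 * Build a report with the three measurement lines used in the output above.
 * Assumption: peak usage is what was reported; the original script is not shown.
 */
function buildReport(float $startedAt): string
{
    $mb = 1024 * 1024;
    return sprintf(
        "Memory Usage: %.2f MB\nMemory Real Usage: %.2f MB\nSeconds: %.4f\n",
        memory_get_peak_usage(false) / $mb, // memory used by emalloc()
        memory_get_peak_usage(true) / $mb,  // memory allocated from the system
        microtime(true) - $startedAt
    );
}

$startedAt = microtime(true);
// ... purify the test files here ...
echo 'PHP Version: ' . PHP_VERSION . "\n";
echo buildReport($startedAt);
```

Peak usage is dominated by the largest single input, which would be consistent with the 52 MB file alone accounting for most of the memory reported for the full 953 MB run.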
If I disable the cache with $config->set('Cache.DefinitionImpl', null); it does not change anything: memory consumption and runtime stay the same. If I enable the cache, it generates .ser files, but at least in my case it does not bring any performance improvement.
@eazrael @jahrralf Can you provide test files so I can reproduce your performance problems?
Sorry - I cannot provide test data.