
Reduce memory requirements

Open lkwdwrd opened this issue 9 years ago • 6 comments

Right now the parser has very large memory requirements. I had to increase my local install's available memory to run the parsing for core: 2GB on the VM and 1GB for PHP. This has some headroom, but it regularly takes around 400MB to parse core initially and 320MB on a subsequent update run. The import script can run hundreds of thousands of queries, and, simply put, PHP isn't meant to run for long periods of time, so it leaks.

I would love to see some work done to audit where the memory leaks are and shore up the worst ones. I did a bit of exploring and determined that parsing should never be done with the SAVEQUERIES constant set to true. That should have been obvious, but I have it on for all my dev sites and didn't consider that saving every query eats up memory during the CLI call. Turning this off alone halves the memory requirement (it was ~800MB before turning it off); see the sketch after this comment for a guard against it.

Nothing else I have explored has helped much, and some of it even increased the requirements. It'll take some digging because memory tools in PHP are rudimentary at best.

lkwdwrd avatar Dec 29 '15 23:12 lkwdwrd
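A minimal sketch of the SAVEQUERIES problem described above, assuming the parser runs inside a loaded WordPress environment. SAVEQUERIES and `$wpdb->queries` are real WordPress features; the `wp_parser_flush_query_log()` helper name is hypothetical and just illustrates emptying the query log between batches when the constant can't be turned off.

```php
<?php
/**
 * Guard against SAVEQUERIES inflating memory during a long CLI run.
 *
 * With SAVEQUERIES enabled, $wpdb keeps every query (plus a backtrace)
 * in $wpdb->queries for the life of the process, which is fatal for an
 * import that runs hundreds of thousands of queries.
 */

// Warn early if the constant is enabled for this process.
if ( defined( 'SAVEQUERIES' ) && SAVEQUERIES ) {
	fwrite( STDERR, "Warning: SAVEQUERIES is enabled; query logging will consume large amounts of memory.\n" );
}

// Hypothetical periodic cleanup: if SAVEQUERIES cannot be turned off,
// at least empty the accumulated query log between batches.
function wp_parser_flush_query_log() {
	global $wpdb;

	if ( defined( 'SAVEQUERIES' ) && SAVEQUERIES ) {
		$wpdb->queries = array();
	}
}
```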

From looking into this earlier I think it comes down to having a representation of all parsed content in a giant object that is then passed to a function that imports it into WordPress.

I think the only meaningful way to reduce memory footprint is to stop doing that.

atimmer avatar Dec 30 '15 09:12 atimmer

It definitely starts that way. Then usage doubles or more during the import itself.

lkwdwrd avatar Dec 30 '15 13:12 lkwdwrd

What if we parsed and stored the data for each file separately? That way we'd only need to ever have one file in memory at a time. We could then import the files one at a time too. Also, it would perhaps make it easier for us to allow partial parsing in the future, only reparsing the files that have changed since the last time, for example.

So, instead of one giant JSON file, we'd have a bunch of smaller ones in a directory.

JDGrimes avatar Dec 30 '15 14:12 JDGrimes
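One way the per-file idea above could look, as a rough sketch only: `wp_parser_parse_file()` is a hypothetical single-file parse function (the real parser exposes its own API for this), while `trailingslashit()` and `wp_json_encode()` are standard WordPress helpers. Each source file's parsed data is written to its own small JSON file, so only one file's representation is ever in memory.

```php
<?php
/**
 * Sketch: parse and export one source file at a time, writing a small
 * JSON file per source file instead of one giant export.
 */
function wp_parser_export_per_file( array $files, $output_dir ) {
	if ( ! is_dir( $output_dir ) ) {
		mkdir( $output_dir, 0755, true );
	}

	foreach ( $files as $file ) {
		// Hypothetical single-file parse; the real parser's function differs.
		$data = wp_parser_parse_file( $file );

		file_put_contents(
			trailingslashit( $output_dir ) . md5( $file ) . '.json',
			wp_json_encode( $data )
		);

		// Only one file's parsed representation is ever held in memory.
		unset( $data );
	}
}
```

This would also make incremental reparsing straightforward: compare each source file's modification time against its JSON file and skip unchanged ones.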

Or we could keep one file but stream to/from it instead of pulling the whole thing into memory at once (sort of like is being done with the WordPress Importer plugin).

JDGrimes avatar Dec 30 '15 14:12 JDGrimes
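A sketch of the streaming approach, using newline-delimited JSON (one record per line) so the importer never has to `json_decode()` the whole export at once. This is just one way to stream; it is not necessarily how the WordPress Importer plugin does it, and `wp_parser_import_record()` in the usage note is hypothetical.

```php
<?php
/**
 * Sketch: stream an export as newline-delimited JSON so records are
 * written and decoded one at a time.
 */

// Writing: append one JSON-encoded record per line.
function wp_parser_stream_write( $handle, array $record ) {
	fwrite( $handle, wp_json_encode( $record ) . "\n" );
}

// Reading: yield decoded records one at a time via a generator.
function wp_parser_stream_read( $path ) {
	$handle = fopen( $path, 'r' );

	while ( false !== ( $line = fgets( $handle ) ) ) {
		$record = json_decode( $line, true );
		if ( null !== $record ) {
			yield $record;
		}
	}

	fclose( $handle );
}

// Usage: import without ever holding the full export in memory.
// foreach ( wp_parser_stream_read( 'export.ndjson' ) as $record ) {
//     wp_parser_import_record( $record ); // Hypothetical per-record import.
// }
```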

Great ideas! This could definitely go hand in hand with making the import process a lot more modular as well. Right now it's kind of an importing god object with a massive god function inside it.

Doing the import in chunks would also let us do some cleanup between files, even in the create workflow (a sketch of that cleanup follows this comment). We'll still need to work out why we're leaking so much memory on the WP side of things; I don't think that's anything the importer is doing, but it's a good start.

lkwdwrd avatar Dec 30 '15 20:12 lkwdwrd
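A sketch of the between-file cleanup mentioned above, assuming a chunked importer. The calls shown (`wp_cache_flush()`, `wp_defer_term_counting()`, `wp_defer_comment_counting()`, `$wpdb->queries`) are real WordPress APIs commonly used in long-running imports; the surrounding function and where it gets called are assumptions.

```php
<?php
/**
 * Sketch: housekeeping a chunked importer could run between files to
 * keep a long-lived PHP process from accumulating memory.
 */
function wp_parser_import_cleanup() {
	global $wpdb;

	// Drop the logged queries in case SAVEQUERIES is on.
	$wpdb->queries = array();

	// Empty the in-memory object cache so cached posts/terms from the
	// previous file don't pile up for the whole run.
	wp_cache_flush();
}

// Around the whole import, deferring term/comment counting avoids an
// expensive recount on every insert; counts are recalculated once when
// deferral is switched back off.
wp_defer_term_counting( true );
wp_defer_comment_counting( true );

// ... import each file, calling wp_parser_import_cleanup() in between ...

wp_defer_term_counting( false );
wp_defer_comment_counting( false );
```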

https://blackfire.io/ might be helpful here.

JDGrimes avatar Jan 30 '16 17:01 JDGrimes