edifact
How to avoid Out of memory
I'm trying to parse a 30 MB (single-line) Pricat file. I took one of the examples and went from there, but PHP runs out of memory very quickly. Any clue how to parse the file in smaller chunks without splitting it by hand?
```php
<?php
ini_set('memory_limit', '2048M');
ini_set('max_execution_time', 3000);
header('Content-Type: text/html; charset=utf-8');

require 'vendor/autoload.php';

use EDI\Parser;

$file = 'bn_new.96a';

$c = new Parser();
$c->load($file);
echo $c->errors();
echo "\n";
echo $c->get();
```
This is my code so far. Any help is appreciated.
Are there multiple messages (so multiple UNH), or is it a single message (one UNH) with a lot of repetitions?
It's one UNH but a plethora of LIN elements. It's basically a big catalogue file.
Some optimisation is possible:

- https://github.com/php-edifact/edifact/blob/master/src/EDI/Parser.php#L238 — replace the loop with `while ($line = array_shift($file)) {`
- https://github.com/php-edifact/edifact/blob/master/src/EDI/Parser.php#L102 — also use `array_shift()` here and collect the results directly in `$this->parsedfile`
- change `public function parse($file2)` to `public function parse(&$file2)`
- add:

```php
public function loadString(&$string)
{
    $this->rawSegments = $this->unwrap($string);
    return $this->parse($this->rawSegments);
}
```
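A minimal sketch (not the actual Parser code) of why these changes can help: `array_shift()` removes each line from the source array as it is consumed, so the source shrinks while the result grows instead of two full copies staying alive, and passing by reference avoids an extra copy when the array is modified inside the function. Note that `array_shift()` reindexes the array on every call, which can be slow on very large arrays.

```php
<?php
// Hypothetical illustration of the array_shift pattern: consume an array
// of segments while freeing each entry as we go.
function parseSegments(array &$segments): array
{
    $parsed = [];
    // array_shift() removes the first element, so $segments shrinks on
    // every iteration instead of keeping a full second copy of the data.
    while (($line = array_shift($segments)) !== null) {
        $parsed[] = explode('+', $line); // stand-in for real segment parsing
    }
    return $parsed;
}

$segments = ['LIN+1++4000862141404:EN', 'LIN+2++4000862141405:EN'];
$parsed = parseSegments($segments);
// $segments is now empty; the data lives only in $parsed.
```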
@uldisn Thanks for your input. I've implemented the first and third bullet points, but I'm lost on the second and last ones.

```php
public function loadString(&$string)
{
    $this->rawSegments = $this->unwrap($string);
    return $this->parse($this->rawSegments);
}
```

Is this a new function in Parser.php? Or do I need to edit another file?
> https://github.com/php-edifact/edifact/blob/master/src/EDI/Parser.php#L102 implement also array_shift() and result collect direct in $this->parsedfile

That's the line:

```php
foreach ($file2 as $x => &$line) {
```

Simply replacing it with

```php
while ($line = array_shift($file)) {
```

doesn't seem right. Perhaps

```php
while ($line = array_shift($file2)) {
```

but then `$x` is gone. Any hints?
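If the index really is needed, a manual counter can stand in for the `$x` key from the `foreach` (a sketch against dummy data, not validated against the actual Parser):

```php
<?php
// Hypothetical sketch: replacing `foreach ($file2 as $x => &$line)` with a
// while/array_shift loop while keeping track of the position $x.
$file2 = ['UNB+UNOC:3', 'UNH+1+PRICAT:D:96A:UN', 'LIN+1++4000862141404:EN'];

$x = -1;
while (($line = array_shift($file2)) !== null) {
    $x++; // $x now tracks the original position, like the foreach key did
    // ... original loop body using $x and $line ...
}
// After the loop, $x holds the index of the last processed line.
```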
Did I write something wrong?
Sorry! I don't have time to validate it, but the main idea is to use `while ($r = array_shift($a))` instead of `foreach ($a as $r)`, and to avoid duplicating the data across several variables.
@teflontoni could you share this file privately?
@sabas here you go: bn_new.zip. It's just an article catalogue, so I think it's OK to share it right here.
@uldisn Okay, I'll give it a try, thanks for your response.
I tried increasing the allocated RAM to 512 MB last night and it started processing, but I haven't yet timed how long it takes...
@teflontoni I pushed a commit which "solves" the out of memory problem with your file. I increased the allocated RAM to 1 GB and it terminated after less than 1 hour (still a bit too much I think :D).
@uldisn I used some of your ideas, but I noticed a potential problem with your while loop: https://stackoverflow.com/questions/49212963/why-array-shift-on-an-array-with-an-empty-string-breaks-the-loop
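The pitfall described in the linked question: `while ($line = array_shift($a))` stops as soon as a shifted value is falsy (an empty string, `'0'`, `0`, ...), not only when the array is empty. Since `array_shift()` returns `null` only on an empty array, an explicit null check avoids this. A small demonstration with dummy segments:

```php
<?php
$segments = ['UNB+UNOC:3', '', 'UNZ+1'];

// Buggy: the empty string is falsy, so the loop stops early
// and 'UNZ+1' is never processed.
$a = $segments;
$buggy = [];
while ($line = array_shift($a)) {
    $buggy[] = $line;
}
// count($buggy) === 1

// Fixed: array_shift() returns null only when the array is empty,
// so an explicit null comparison consumes every element.
$b = $segments;
$fixed = [];
while (($line = array_shift($b)) !== null) {
    $fixed[] = $line;
}
// count($fixed) === 3
```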
... With 2 GB of RAM allocated, the parser now completes almost immediately...
A thing worth noting is that all lines of a file are held in memory twice in the Parser: once in the `rawSegments` property and once in the `parsedfile` one. It might be a good idea to unset the `rawSegments` property when `Parser::parse()` is executed, to save memory. If you need the `rawSegments`, you could retrieve them right after executing one of the load methods. Of course, then `Parser::loadString()` would need to stop running the `parse()` method, and you'd need to run it explicitly yourself.
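A sketch of that suggestion (the class and helper names here are invented for illustration, not the library's actual API): parsing consumes the raw segments in place, so only one copy of the data stays resident, and loading no longer triggers parsing automatically.

```php
<?php
// Hypothetical minimal parser illustrating the suggestion: drop the raw
// segments as they are parsed, roughly halving peak retained memory.
class MiniParser
{
    private array $rawSegments = [];
    private array $parsedfile = [];

    public function loadString(string &$string): void
    {
        $this->rawSegments = explode("'", $string); // stand-in for unwrap()
        // parse() is deliberately NOT called here, so callers can still
        // read the raw segments before they are consumed.
    }

    public function getRawSegments(): array
    {
        return $this->rawSegments;
    }

    public function parse(): void
    {
        // array_shift() empties rawSegments as parsedfile fills up,
        // so the data is never held twice at full size.
        while (($line = array_shift($this->rawSegments)) !== null) {
            $this->parsedfile[] = explode('+', $line);
        }
    }

    public function get(): array
    {
        return $this->parsedfile;
    }
}

$edi = "UNH+1+PRICAT:D:96A:UN'LIN+1++4000862141404:EN";
$p = new MiniParser();
$p->loadString($edi);
$p->parse(); // after this, getRawSegments() returns an empty array
```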
@gaxweb we could do that, marking it as a breaking change and tagging a new release. I'm open to it!