How to avoid Out of memory

Open teflontoni opened this issue 6 years ago • 10 comments

I'm trying to parse a 30 MB Pricat file that consists of a single line. I took one of the examples and went from there, but PHP runs out of memory very quickly. Any clue how to parse the file in "smaller" chunks without splitting it by hand?

`<?php
ini_set('memory_limit', '2048M');
ini_set('max_execution_time', 3000);
header('Content-Type: text/html; charset=utf-8');
require 'vendor/autoload.php';
use EDI\Parser;
$file = 'bn_new.96a';
$c = new Parser();
$c->load($file);
echo $c->errors();
echo "\n";
echo $c->get();
?>`

This is my code thus "far". Any help is appreciated.

teflontoni avatar Jan 31 '18 10:01 teflontoni
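One way to avoid holding a single-line file in memory all at once is to stream it and cut it at the segment terminator. The sketch below is not part of php-edifact: `readSegments()` is a hypothetical helper, and it assumes the default EDIFACT segment terminator `'` and release character `?` (a UNA header could redefine both, which this sketch does not handle).

```php
<?php
/**
 * Hypothetical helper (not a php-edifact API): yields one raw segment at a
 * time while reading the file in small chunks.
 */
function readSegments(string $path, string $terminator = "'", string $release = "?", int $chunkSize = 8192): Generator
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $buffer = '';
    $escaped = false; // true when the previous character was the release character
    try {
        while (!feof($handle)) {
            $chunk = fread($handle, $chunkSize);
            if ($chunk === false) {
                break;
            }
            $length = strlen($chunk);
            for ($i = 0; $i < $length; $i++) {
                $char = $chunk[$i];
                if ($escaped) {
                    // Character after a release character is literal data.
                    $buffer .= $char;
                    $escaped = false;
                } elseif ($char === $release) {
                    $buffer .= $char;
                    $escaped = true;
                } elseif ($char === $terminator) {
                    yield $buffer;
                    $buffer = '';
                } else {
                    $buffer .= $char;
                }
            }
        }
        // Emit a trailing segment that lacks a terminator, if any.
        if (trim($buffer) !== '') {
            yield $buffer;
        }
    } finally {
        fclose($handle);
    }
}
```

Iterating with `foreach (readSegments('bn_new.96a') as $segment)` keeps only one chunk and one segment in memory at a time. How those segments would then be handed to the Parser is left open here.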

Are there multiple messages (so multiple UNH) or is it a single message (one UNH) with a lot of repetitions?

sabas avatar Jan 31 '18 10:01 sabas

It's one UNH but a plethora of LIN elements. It's basically a big catalogue file.

teflontoni avatar Jan 31 '18 10:01 teflontoni

Some optimisations are possible:

  • https://github.com/php-edifact/edifact/blob/master/src/EDI/Parser.php#L238 replace with while($line = array_shift($file)) {
  • https://github.com/php-edifact/edifact/blob/master/src/EDI/Parser.php#L102 also use array_shift() here and collect the result directly in $this->parsedfile
  • Change "public function parse($file2)" to "public function parse(&$file2)"
  • public function loadString(&$string) { $this->rawSegments = $this->unwrap($string); return $this->parse($this->rawSegments); }

uldisn avatar Jan 31 '18 11:01 uldisn
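The idea behind these bullets can be sketched roughly as follows. This is not the library's code: `parseSegments()` is a hypothetical stand-in for the parser loop, and splitting on `+` is only a placeholder for the real segment parsing. PHP arrays are copy-on-write, so calling array_shift() on a by-value parameter would first copy the whole array; taking it by reference avoids that copy.

```php
<?php
/**
 * Hypothetical stand-in for the parser loop (not php-edifact code).
 * The array is taken by reference, so shifting elements off it shrinks
 * the caller's array instead of a private copy.
 */
function parseSegments(array &$rawSegments): array
{
    $parsed = [];
    // Loop on the array itself rather than on the shifted value.
    while ($rawSegments !== []) {
        $segment = array_shift($rawSegments); // remove the segment from the source
        $parsed[] = explode('+', $segment);   // placeholder for real segment parsing
    }
    return $parsed;
}
```

Each segment exists either in the source array or in the result, never in both. One caveat: array_shift() re-indexes the array on every call, so on very large arrays it may be worth measuring against reversing the array once and using array_pop().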

@uldisn Thx for your input.

I've got the first and third bullet point implemented.

But I am lost at the second and the last bullet points.

public function loadString(&$string) { $this->rawSegments = $this->unwrap($string); return $this->parse($this->rawSegments); }

Is this a new function in Parser.php? Or do I need to edit another file?

https://github.com/php-edifact/edifact/blob/master/src/EDI/Parser.php#L102 implement also array_shift() and result collect direct in $this->parsedfile

That's the line:

foreach ($file2 as $x => &$line) {

Do I simply replace it with

while($line = array_shift($file)) {

??? That doesn't seem right. Perhaps while($line = array_shift($file2)) { instead? But then $x is gone.

Any hints?

teflontoni avatar Jan 31 '18 11:01 teflontoni
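On the missing `$x`: one option, sketched here under the assumption that `$x` is only needed as a running position, is to keep a counter alongside the shift. The sample data and the `$seen` array are made up for illustration.

```php
<?php
// Made-up sample data standing in for the parser's segment array.
$file2 = ['UNH+1+PRICAT:D:96A:UN', 'LIN+1', 'LIN+2'];
$seen = [];

// $x plays the role of the key that foreach ($file2 as $x => &$line) provided.
for ($x = 0; $file2 !== []; $x++) {
    $line = array_shift($file2); // consume the segment so it can be freed
    $seen[$x] = $line;           // placeholder for the real per-line work
}
```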

Did I write something wrong?

teflontoni avatar Feb 22 '18 16:02 teflontoni

Sorry! I don't have time to validate it, but the main idea is to use while($r = array_shift($a)) instead of foreach ($a as $r), and to avoid duplicating data in several variables.

uldisn avatar Feb 22 '18 18:02 uldisn

@teflontoni could you share this file privately?

sabas avatar Feb 22 '18 20:02 sabas

@sabas here you go: bn_new.zip it's just an article catalogue, so I think it's ok to share it right here.

@uldisn Okay, I'll give it a try, thanks for your response.

teflontoni avatar Feb 27 '18 06:02 teflontoni

Last night I tried increasing the allocated RAM to 512 MB and it started processing, but I haven't yet timed how long it takes...

sabas avatar Feb 28 '18 14:02 sabas

@teflontoni I pushed a commit which "solves" the out of memory problem with your file. I increased the allocated RAM to 1 GB and it terminated after less than 1 hour (still a bit too much I think :D).

@uldisn I used some of your ideas, but I noticed a potential problem with your while loop: https://stackoverflow.com/questions/49212963/why-array-shift-on-an-array-with-an-empty-string-breaks-the-loop

sabas avatar Mar 10 '18 20:03 sabas
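The problem with that loop can be reproduced in a few lines. The sample values are made up; the point is that `''` and `'0'` are falsy in PHP, so a loop conditioned on the shifted value ends early, while comparing against `null` (which array_shift() returns once the array is empty) does not. The strict form assumes the array never contains `null` itself, which should hold for an array of segment strings.

```php
<?php
// Made-up sample: an empty string and '0' sit in the middle of the data.
$segments = ['UNH+1', '', '0', 'UNT+2'];

// Naive form: the condition is the shifted value, so '' ends the loop.
$a = $segments;
$naive = [];
while ($line = array_shift($a)) {
    $naive[] = $line;
}

// Stricter form: stop only when array_shift() reports an empty array (null).
$b = $segments;
$strict = [];
while (($line = array_shift($b)) !== null) {
    $strict[] = $line;
}

// $naive holds only 'UNH+1'; $strict holds all four values.
```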

... With 2 GB of RAM allocated, the Parser completes almost immediately now....

sabas avatar Aug 06 '23 17:08 sabas

A thing worth noting is that all lines of a file are held in memory twice in the Parser: once in the rawSegments property and once in the parsedfile one. It might be a good idea to unset the rawSegments property when Parser::parse() is executed to save memory. If you need the rawSegments, you could retrieve them right after executing one of the load methods. Of course, then Parser::loadString() would need to stop running the parse() method, and you'd need to run it explicitly yourself.

gaxweb avatar Aug 16 '23 08:08 gaxweb
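A rough sketch of that proposal, using a hypothetical `SketchParser` class rather than the real `EDI\Parser` (the method names mirror the comment above, but the bodies are simplified placeholders that ignore release characters, and the code assumes PHP 7.4+):

```php
<?php
// Hypothetical class for illustration only; not the php-edifact Parser.
final class SketchParser
{
    private array $rawSegments = [];
    private array $parsedFile = [];

    /** Loads and splits the message, but does not parse it. */
    public function loadString(string $message): void
    {
        // Crude split on the default terminator; real unwrapping is more involved.
        $parts = array_filter(explode("'", $message), fn ($s) => trim($s) !== '');
        $this->rawSegments = array_values($parts);
    }

    /** Only meaningful between loadString() and parse(). */
    public function getRawSegments(): array
    {
        return $this->rawSegments;
    }

    /** Parses explicitly, then drops the raw copy to save memory. */
    public function parse(): array
    {
        foreach ($this->rawSegments as $segment) {
            $this->parsedFile[] = explode('+', $segment); // placeholder parsing
        }
        $this->rawSegments = []; // the raw segments are no longer held
        return $this->parsedFile;
    }
}
```

Callers that need the raw segments would read them right after `loadString()` and before `parse()`, which is why this would be a breaking change for code that expects loading to parse as well.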

@gaxweb we could do that by marking it as a breaking change and tagging a new release. I'm open to it!

sabas avatar Aug 17 '23 12:08 sabas