
Out of memory

Open · hershkoy opened this issue on Jul 07 '21 · 16 comments

Very weird error, happening only in the browser, not in the CLI, using v0.8.23:

[Wed Jul 07 14:25:39.585044 2021] [:error] [pid 2736:tid 2684] [client 172.18.15.20:57062] PHP Fatal error:  Out of memory (allocated 58982400) (tried to allocate 34821771 bytes) in .....\\vendor\\shuchkin\\simplexlsx\\src\\SimpleXLSX.php on line 537

I checked the PHP memory_limit; it is 128M for both PHP CLI and PHP in the browser.

The computer has 16 GB of RAM, of which about 25% is used. I downloaded the XLSX, ran it on my computer, and it worked even in the browser. The XLSX file is 2.5 MB and has about 45,000 rows and only one sheet.
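One classic cause of "works in the CLI but not in the browser" is that the two SAPIs load different php.ini files, so the effective memory_limit can differ even when both configs appear to say 128M. A quick way to check (plain PHP, nothing specific to SimpleXLSX):

```php
<?php
// Print which php.ini this SAPI actually loaded and its memory_limit.
// Run once via `php check.php` (CLI) and once through the web server:
// the two SAPIs frequently read different ini files.
echo 'SAPI:         ', PHP_SAPI, PHP_EOL;
echo 'Loaded ini:   ', (php_ini_loaded_file() ?: '(none)'), PHP_EOL;
echo 'memory_limit: ', ini_get('memory_limit'), PHP_EOL;
```

If the two runs report different loaded ini files, the browser-side limit is the one to raise.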

As far as I can understand, this is the line where it happens:

```php
$entry_xml = preg_replace( '/xmlns[^=]*="[^"]*"/i', '', $entry_xml ); // remove namespaces
```

I printed the variable to a file just before the crash, and it produced a 17 MB file, all on one line, so the preg_replace above is operating on that huge string when it crashes.
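For what it's worth, preg_replace() cannot edit the subject in place: it builds the result as a new string, so at the moment of the call the process holds the 17 MB input, the output, and PCRE's working buffers simultaneously. A minimal sketch of that transient peak (plain PHP; exact numbers vary by PHP version):

```php
<?php
// Build a ~17 MB subject with many namespace-like attributes, then
// watch peak memory climb during preg_replace: the engine allocates
// a full second string for the result while the input is still live.
$chunk   = '<row xmlns:x="urn:example" v="1"/>';
$subject = str_repeat($chunk, 500000);          // ~17 MB, one long line

$before = memory_get_peak_usage(true);
$clean  = preg_replace('/xmlns[^=]*="[^"]*"/i', '', $subject);
$after  = memory_get_peak_usage(true);

printf("subject: %.1f MB, extra peak during replace: %.1f MB\n",
    strlen($subject) / 1048576,
    ($after - $before) / 1048576);
```

With a 128M limit, input plus output plus the rest of the parsed workbook can be enough to tip it over.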

I don't know why it works on my computer but not on the other one, and why on that machine it works in the CLI but not in the browser.

Unfortunately the XLSX file contains sensitive info, so I can't share it, and right now I don't have a reproducible file to share.

hershkoy avatar Jul 07 '21 15:07 hershkoy

I've tested a "long string" vs. an "array" version of the XML-cleanup code; there is a 3x memory overhead in the array version.

shuchkin avatar Jul 07 '21 23:07 shuchkin

For what I needed to do, I ended up asking the client to use a CSV file instead of XLSX, which solves my immediate problem, so I don't need this fixed. I just wonder what the cause is. On my computer, where it didn't crash, I watched Task Manager while the code was running and barely saw an increase in RAM usage. In addition, both my computer and the server have 16 GB of RAM, but my computer runs at 90%+ RAM usage most of the time, while the server is at 25%. I suspect the issue is a bug in preg_replace, maybe something to do with the PHP version (7.2 on my computer, 5.6 on the server).

I don't know if you want to debug it further. We can close this issue if you wish.

hershkoy avatar Jul 08 '21 06:07 hershkoy

Same problem here: Fatal error: Out of memory (allocated 183738368) (tried to allocate 36137192 bytes) in [hidden]/vendor/shuchkin/simplexlsx/src/SimpleXLSX.php on line 538. The file is in XLSM format and is around 20 MB (4–5 sheets of data). PHP 7.4, running with a 2 GB memory limit.

nikolayganovski1 avatar Oct 16 '21 06:10 nikolayganovski1

Check your memory_limit; it is probably not actually 2 GB there.

shuchkin avatar Oct 16 '21 11:10 shuchkin

@shuchkin it is correct; memory_limit in phpinfo() shows 2G.

nikolayganovski1 avatar Oct 16 '21 11:10 nikolayganovski1

Okay, after more testing I've found that it crashes at 39,000 rows, and when I cut 5,000 rows, leaving 34,000, it works fine. Is there some way to chunk-parse the rows or something?

nikolayganovski1 avatar Oct 16 '21 14:10 nikolayganovski1

I think the garbage collector is not invoked after preg_replace.

Try this version: https://github.com/shuchkin/simplexlsx/blob/master/src/SimpleXLSX.php

shuchkin avatar Oct 16 '21 16:10 shuchkin

Nope, still the same errors:

Fatal error: Out of memory (allocated 110964736) (tried to allocate 20480 bytes) in [hidden]/src/SimpleXLSX.php on line 650
Fatal error: Out of memory (allocated 110964736) (tried to allocate 20480 bytes) in [hidden]/wp-includes/functions.php on line 4068

The second error is WordPress trying to fill up the $errors[] variable. It looks like this row is where it breaks:

```php
$rows[ $curR ][ $curC ] = $this->value( $c );
```

Also, I printed the peak memory usage before my loop (`foreach ($xlsx->rows(0) as $value) {`):

not real: 205.25686645508 MiB
real: 187.19140625 MiB

nikolayganovski1 avatar Oct 16 '21 17:10 nikolayganovski1

@shuchkin any ideas?

nikolayganovski1 avatar Oct 20 '21 14:10 nikolayganovski1

Try 1.0.12:

```php
foreach ( $xlsx->readRows() as $r ) {
    // ...
}
```

readRows() & readRowsEx() return a Generator, to read huge files in iteration mode.
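The reason a Generator helps: rows(0) materializes every row into one big array before you can touch it, while readRows() yields one row at a time, so memory stays roughly flat regardless of row count. A library-independent sketch of the same effect (the rowsAsArray/rowsAsGenerator helpers here are illustrative, not SimpleXLSX API):

```php
<?php
// Materialize 100k fake "rows" as one array vs. stream them from a
// generator: the array holds every row at once, the generator only one.
function rowsAsArray(int $n): array {
    $rows = [];
    for ($i = 0; $i < $n; $i++) {
        $rows[] = ['A' . $i, 'B' . $i, 'C' . $i];
    }
    return $rows;
}

function rowsAsGenerator(int $n): Generator {
    for ($i = 0; $i < $n; $i++) {
        yield ['A' . $i, 'B' . $i, 'C' . $i];
    }
}

$n = 100000;

$base      = memory_get_usage();
$all       = rowsAsArray($n);
$arrayCost = memory_get_usage() - $base;
unset($all);

$base  = memory_get_usage();
$count = 0;
foreach (rowsAsGenerator($n) as $row) {
    $count++;                           // process one row, then discard it
}
$genCost = memory_get_usage() - $base;

printf("array: %.1f MB, generator: %.1f MB, rows seen: %d\n",
    $arrayCost / 1048576, $genCost / 1048576, $count);
```

The trade-off is that with a Generator you can only make one forward pass over the rows.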

shuchkin avatar Feb 21 '22 00:02 shuchkin

Apologies for chiming in late, but I'm experiencing the same issue and have for a while now. From what I've researched, the issue is how Excel determines the final row of the spreadsheet. Excel seems to add additional blank rows without logic or reason.

If you open the failing spreadsheet and select the whole worksheet (Ctrl+A or Cmd+A), you'll notice that the selection continues way past any data; watch the scroll bar for reference as you scroll. If you open the file in BBEdit or similar, you can see thousands of empty rows in the content.

When you open the spreadsheet for processing, the code throws a fatal error at:

Line 608, in getEntryData:

```php
$entry['data'] = gzinflate($entry['data']);
```

Using the attached spreadsheet as an example, strlen($entry['data']) hits 18,174,506! So when you try to inflate the content, the server runs out of memory (128 MB). FYI, the spreadsheet is clearly not 18.2 MB worth of actual data, given how little is in it.
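That size is plausible: thousands of near-identical empty `<row>` elements are extremely repetitive XML, so they compress to almost nothing inside the .xlsx (a ZIP archive) but expand back to tens of megabytes when gzinflate() runs. A small demonstration of the effect with plain deflate (the row XML here is a made-up stand-in for sheet1.xml content):

```php
<?php
// Simulate a sheet where thousands of "empty" rows were saved:
// repetitive XML deflates to a few dozen KB inside the archive but
// inflates back to ~18 MB, which must be held in memory all at once.
$emptyRow = '<row r="1" spans="1:3"><c r="A1"/><c r="B1"/><c r="C1"/></row>';
$sheetXml = str_repeat($emptyRow, 300000);      // ~18 MB uncompressed

$deflated = gzdeflate($sheetXml, 9);            // as stored in the ZIP
$inflated = gzinflate($deflated);

printf("compressed: %.1f KB, inflated: %.1f MB, ratio: %.0fx\n",
    strlen($deflated) / 1024,
    strlen($inflated) / 1048576,
    strlen($inflated) / strlen($deflated));
```

So a tiny-looking .xlsx on disk says little about how much memory inflating its sheet XML will take.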

The workaround I've used, which works 100% of the time, is to manually highlight the data cells and copy them into a brand-new spreadsheet.

The only way I can think of fixing this in code is to pre-parse the spreadsheet, reading a binary stream of bytes and removing the empty rows before loading the file into memory...?

See: https://stackoverflow.com/questions/11265914/how-can-i-extract-or-uncompress-gzip-file-using-php

example-out-of-memory.xlsx

fordero21 avatar Mar 11 '22 13:03 fordero21

Hi! I have a similar problem on the server: "Allowed memory size of 134217728 bytes exhausted (tried to allocate 67510970 bytes) at vendor/shuchkin/simplexlsx/src/SimpleXLSX.php:608". Locally everything is fine, but local memory is not limited.

Vladimir1247 avatar Jul 13 '22 18:07 Vladimir1247

memory_limit

shuchkin avatar Jul 14 '22 04:07 shuchkin

It's a VDS/VPS; 128 MB is the entire memory. [upd] I was able to get an additional 128 MB of memory; as a result, another insufficient-memory problem came up.

Vladimir1247 avatar Jul 14 '22 06:07 Vladimir1247

Try commenting out this line, but the results can be unpredictable.

shuchkin avatar Jul 14 '22 09:07 shuchkin

Of course, I already use readRows(), because the files can be very large (up to ~40 MB); all the results above are with it. :/ [upd] Is it possible not to parse the whole workbook, but to get the number of sheets and retrieve them separately?

Vladimir1247 avatar Jul 14 '22 09:07 Vladimir1247