fast-excel
Is it normal for reading 730k rows to take 35+ minutes?
I am using it as follows:
$excel = (new FastExcel)->import($data["file"]);
It has been running for 35+ minutes (it didn't finish, I closed it) and uses 8 GB of RAM.
Is this time normal? Or is there anything I can do to optimize it?
Unfortunately, it could be considered normal. TBH, this lib is heavily optimized for exporting millions of rows (https://dev.to/rap2hpoutre/export-10m-rows-in-xlsx-with-laravel-without-memory-issues-6bk), but importing still has a lot of room for improvement! Currently, each row is read from the file sequentially, then returned in a variable that you can use (the big collection). If you want to preserve memory, you can avoid returning the variable and write your entries directly to the database:
(new FastExcel)->import('file.xlsx', function ($line) {
    return User::create([
        'name' => $line['Name'],
        'email' => $line['Email']
    ]);
});
Still, it will not reduce the import time, and you might run into memory issues anyway.
I guess we could find a way to import big files while preserving memory, but I have no idea right now. Maybe you could use Spout (https://opensource.box.com/spout/) directly, which is the lib used under the hood to make FastExcel work.
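For reference, here is a minimal sketch of reading a file row by row with Spout directly, assuming box/spout 3.x (the version FastExcel relies on under the hood at the time of writing); the file name and the row handling are placeholders:

use Box\Spout\Reader\Common\Creator\ReaderEntityFactory;

$reader = ReaderEntityFactory::createXLSXReader();
$reader->open('file.xlsx');

foreach ($reader->getSheetIterator() as $sheet) {
    foreach ($sheet->getRowIterator() as $row) {
        // toArray() returns the cell values of the current row
        $cells = $row->toArray();
        // handle/insert $cells here instead of collecting everything in memory
    }
}

$reader->close();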
After a lot of research I've found the parameter $maxNumStringsPerTempFile in Spout\Reader\XLSX\Helper\SharedStringsCaching\FileBasedStrategy.php, which defaults to 10k strings per temp file. Reading my xlsx file (8 sheets of 23k+ rows each) gets slower after the magic row number 1663 on any worksheet: reading speed drops from 0.0006 seconds at row 1662 to 0.025+ seconds after row 1663, so my generator spent 800 seconds reading the first sheet row by row. Changing this parameter to 200,000+ strings per temp file resolved the issue, and the first sheet is now read in 20 seconds. Maybe this should be exposed as a configurable parameter in FastExcel? Or at least noted in the README?
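For anyone who wants to check whether they are hitting the same slowdown, here is a minimal sketch that logs rows which suddenly take much longer to read, using the FastExcel callback shown above (the file name and the 10 ms threshold are arbitrary placeholders):

$previous = microtime(true);
$rowNumber = 0;

(new FastExcel)->import('file.xlsx', function ($line) use (&$previous, &$rowNumber) {
    $rowNumber++;
    $now = microtime(true);

    // Rows that suddenly take 10 ms+ hint at the shared-strings temp-file churn described above
    if ($now - $previous > 0.01) {
        error_log("row $rowNumber took " . round($now - $previous, 4) . 's');
    }

    $previous = $now;
});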
@rap2hpoutre Do you need help with this? @Zigman13 maybe you can share the snippet you used?
Also, an example file would be useful!
Importing record by record the way @rap2hpoutre shows is not very optimal; it consumes a lot of time and resources. To improve this, try inserting 5 to 7 thousand records at a time. This library does not provide anything to insert massively using chunks; ideally it would offer something like this:
(new FastExcel())->chunk(7000)->import($file_name, function ($chunk_data) {
    // $chunk_data = 7000 records
});

// Or:

(new FastExcel())->importChunk($file_name, 7000, function ($chunk_data) {
    // $chunk_data = 7000 records
});
Since this function does not exist, you have to build it yourself; for example, something like this:
dispatch(function () {
    // Bulk-insert the buffered rows and reset the buffer
    $insertData = function (&$chunk_data) {
        if ($chunk_data !== []) {
            TestExports::insert($chunk_data);
            $chunk_data = [];
        }
    };

    $file_name = base_path('test_data.xlsx');
    $count = 0;
    $chunk_data = [];

    (new FastExcel())->import($file_name, function ($data) use (&$count, &$chunk_data, $insertData) {
        $chunk_data[] = $data;

        // Flush every 7000 rows instead of running one query per row
        if (++$count % 7000 == 0) {
            $insertData($chunk_data);
        }
    });

    // Insert whatever is left after the last full chunk
    $insertData($chunk_data);
});
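Note that TestExports::insert() goes through the query builder, so Eloquent model events, mutators and the created_at/updated_at timestamps are skipped; that is part of why bulk insertion is so much faster than calling create() once per row.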
This piece of code bulk-inserts the data in chunks of 7,000 records, and it is dispatched as a job so the user is not left staring at a loading page.
Instead of using dispatch() I would create a dedicated Job class to hold all of this code; what I've shown here is just a quick example of importing with little time and few resources.
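A minimal sketch of that Job-class variant, assuming Laravel's standard queued-job boilerplate; the ImportExcelJob name and its namespace imports are hypothetical, while the TestExports model and the 7000 chunk size are taken from the example above:

namespace App\Jobs;

use App\Models\TestExports; // adjust to wherever your model actually lives
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;
use Rap2hpoutre\FastExcel\FastExcel;

class ImportExcelJob implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    private $fileName;
    private $chunkSize;

    public function __construct(string $fileName, int $chunkSize = 7000)
    {
        $this->fileName = $fileName;
        $this->chunkSize = $chunkSize;
    }

    public function handle()
    {
        $chunk = [];

        (new FastExcel())->import($this->fileName, function ($row) use (&$chunk) {
            $chunk[] = $row;

            // Flush every $chunkSize rows to keep memory flat
            if (count($chunk) >= $this->chunkSize) {
                TestExports::insert($chunk);
                $chunk = [];
            }
        });

        // Insert the remaining rows
        if ($chunk !== []) {
            TestExports::insert($chunk);
        }
    }
}

// Dispatch it from a controller or command:
// ImportExcelJob::dispatch(base_path('test_data.xlsx'));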
I generated an Excel file with 315,141 records and the import took 82 s and 97.27 MB of memory. Considering the Excel file itself only weighs 9 MB, I still find the consumption high, but it is much better than the 8 GB you mention.
We must also take into account that on top of this time you have to add validation and manipulation of the data; in this example I imported a perfectly clean Excel file, so I skipped those two steps, but performing them may increase the time.
EDIT: I ran another test, this time importing 945,423 records, almost the maximum an Excel sheet allows (1,048,576 rows). The results: 246 s and 291.91 MB.
Considering that exporting these same rows took 206 s and 624.45 KB, I still think importing consumes too much; if someone knows a way to reduce the RAM usage, I am open to suggestions.