
Is it normal that reading 730k rows takes 35+ minutes?

tcagkansokmen opened this issue 5 years ago · 5 comments

I am using it like this:

$excel = (new FastExcel)->import($data["file"]);

It has been running for 35+ minutes (it didn't finish, so I killed it) with 8 GB of RAM.

Is this time normal? Is there anything I can do to optimize it?

tcagkansokmen avatar Apr 08 '20 16:04 tcagkansokmen

Unfortunately, it could be considered normal. TBH, this lib is quite optimized for exporting millions of rows (https://dev.to/rap2hpoutre/export-10m-rows-in-xlsx-with-laravel-without-memory-issues-6bk), but importing still has a lot of room for improvement! Currently, each row is read from the file sequentially and then returned in a variable you can use (the big collection). If you want to preserve memory, you can avoid returning the whole collection and write your entries directly to the database:

(new FastExcel)->import('file.xlsx', function ($line) {
    return User::create([
        'name' => $line['Name'],
        'email' => $line['Email']
    ]);
});

Still, it will not reduce the import time, and you may run into memory issues anyway.

I guess we could find a way to import big files while preserving memory, but I have no idea how right now. Maybe you could use Spout (https://opensource.box.com/spout/) directly, which is the lib used under the hood to make FastExcel work.
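For reference, reading the file with Spout's streaming reader directly could look like the minimal sketch below (assuming box/spout 3.x; the namespaces differ in 2.x). It keeps only one row in memory at a time:

use Box\Spout\Reader\Common\Creator\ReaderEntityFactory;

// Stream the workbook row by row instead of building one big collection.
$reader = ReaderEntityFactory::createXLSXReader();
$reader->open('file.xlsx');

foreach ($reader->getSheetIterator() as $sheet) {
    foreach ($sheet->getRowIterator() as $row) {
        $values = $row->toArray(); // plain array of cell values for this row
        // Process $values here, e.g. buffer them for a batched DB insert.
    }
}

$reader->close();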

rap2hpoutre avatar Apr 22 '20 20:04 rap2hpoutre

After a lot of research I found the parameter $maxNumStringsPerTempFile in Spout\Reader\XLSX\Helper\SharedStringsCaching\FileBasedStrategy.php, whose default limits each temp file to 10k shared strings. When reading an xlsx file with 8 sheets of 23k+ rows each, reading gets slower after the magic row number 1663 on any worksheet: my read speed dropped from 0.0006 seconds per row at row 1662 to 0.025+ seconds per row after row 1663, so my generator took 800 seconds to read the first sheet row by row. Raising this parameter to 200,000+ strings per temp file resolved the issue, and the first sheet is now read in 20 seconds. Maybe this should be added as a configurable parameter to FastExcel? Or at least a note about it in the README.
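If you want to check whether the same slowdown affects your own file, a rough timing sketch reusing the callback-based import shown earlier (the 1,000-row block size is arbitrary) could look like this:

use Rap2hpoutre\FastExcel\FastExcel;

// Print how long each block of 1,000 rows takes, to spot the point where
// shared-string lookups start to slow down on a given worksheet.
$count = 0;
$blockStart = microtime(true);

(new FastExcel)->import('file.xlsx', function ($line) use (&$count, &$blockStart) {
    if (++$count % 1000 === 0) {
        printf("rows %d-%d took %.3fs\n", $count - 999, $count, microtime(true) - $blockStart);
        $blockStart = microtime(true);
    }
    return null; // discard the row so the returned collection stays small
});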

Zigman13 avatar May 06 '20 20:05 Zigman13

@rap2hpoutre Do you need help with this? @Zigman13 maybe you can share the snippet you used?

xwiz avatar Jan 27 '21 09:01 xwiz

Also, an example file would be useful!


elminson avatar Feb 01 '21 18:02 elminson


The import approach from @rap2hpoutre is not very optimal: importing record by record consumes a lot of time and resources. To improve this, try inserting 5 to 7 thousand records at a time. This library does not provide anything to insert massively using chunks; ideally it would offer something like this:

(new FastExcel())->chunk(7000)->import($file_name, function ($chunk_data) {
    //$chunk_data = 7000 records
});
//Or
(new FastExcel())->importChunk($file_name, 7000, function ($chunk_data) {
    //$chunk_data = 7000 records
});

Since this function does not exist, you have to build it yourself, for example something like this:

dispatch(function () {
    // Flush the accumulated rows to the database and reset the buffer.
    $insertData = function (&$chunk_data) {
        TestExports::insert($chunk_data);
        $chunk_data = [];
    };

    $file_name = base_path('test_data.xlsx');
    $count = 0;
    $chunk_data = [];

    (new FastExcel())->import($file_name, function ($data) use (&$count, &$chunk_data, $insertData) {
        $chunk_data[] = $data;
        if (++$count % 7000 == 0) {
            $insertData($chunk_data);
        }
    });

    // Insert the remaining rows of the last, partial chunk.
    $insertData($chunk_data);
});

This piece of code inserts data massively in chunks of 7,000 records and runs the whole thing as a queued job, so the user is not left staring at a loading page.

Instead of using dispatch with a closure, I would create a Job class to hold all of this code; what I have shown is just a quick example of importing a file in little time and with few resources.
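A minimal sketch of such a job class (the class name and the App\Models namespace for TestExports are assumptions for illustration; the skeleton can be generated with php artisan make:job):

namespace App\Jobs;

use App\Models\TestExports;
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;
use Rap2hpoutre\FastExcel\FastExcel;

class ImportTestExportsJob implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    public $filePath;
    public $chunkSize;

    public function __construct(string $filePath, int $chunkSize = 7000)
    {
        $this->filePath = $filePath;
        $this->chunkSize = $chunkSize;
    }

    public function handle()
    {
        $chunkSize = $this->chunkSize;
        $chunk = [];

        (new FastExcel())->import($this->filePath, function ($row) use (&$chunk, $chunkSize) {
            $chunk[] = $row;
            if (count($chunk) >= $chunkSize) {
                TestExports::insert($chunk); // bulk insert one full chunk
                $chunk = [];
            }
        });

        // Insert whatever is left over after the last full chunk.
        if (!empty($chunk)) {
            TestExports::insert($chunk);
        }
    }
}

It would then be dispatched with ImportTestExportsJob::dispatch(base_path('test_data.xlsx')).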

I generated an Excel file with 315,141 records, and the import took 82 s and 97.27 MB of memory. Considering that the Excel file itself only weighs 9 MB, I still find the memory usage high, but it is much better than the 8 GB you mention.

We must also take into account that on top of this import time you would add validation and manipulation of the data; in this example I imported a perfectly clean Excel file, so I skipped those 2 steps, but performing them would increase the time.

EDIT: I ran another test, this time importing 945,423 records, almost the maximum an Excel sheet allows (1,048,576 rows). The results: 246 s and 291.91 MB.

Considering that exporting these same rows took 206 s and 624.45 KB, I still think importing consumes too much; if someone knows a way to reduce the RAM usage, I am open to suggestions.

mcolominas avatar Oct 19 '21 17:10 mcolominas