fast-excel
[Enhancement] Import big data using chunks
Following up on what I commented in #162, this package is missing an option to import using chunks.
Unfortunately, this could be considered normal. TBH, this lib is highly optimized for exporting millions of rows (https://dev.to/rap2hpoutre/export-10m-rows-in-xlsx-with-laravel-without-memory-issues-6bk), but importing still has a lot of room for improvement! Currently, each row is read from the file sequentially, then collected into a variable you can use (the big collection). If you want to preserve memory, you can avoid keeping that collection and write your entries directly to the database instead:
(new FastExcel)->import('file.xlsx', function ($line) {
    return User::create([
        'name' => $line['Name'],
        'email' => $line['Email'],
    ]);
});
Still, it will not reduce the import time, and you might run into memory issues anyway. I guess we could find a way to import big files while preserving memory, but I have no idea right now. Maybe you could use Spout (https://opensource.box.com/spout/), which is the lib that is used under the hood to make FastExcel work.
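For reference, reading the file directly with Spout 3 looks roughly like this (a minimal sketch of the Spout reader API, not something FastExcel exposes; the file name is illustrative, and only the current row is held in memory at a time):

use Box\Spout\Reader\Common\Creator\ReaderEntityFactory;

$reader = ReaderEntityFactory::createXLSXReader();
$reader->open('file.xlsx'); // illustrative path

foreach ($reader->getSheetIterator() as $sheet) {
    foreach ($sheet->getRowIterator() as $row) {
        $cells = $row->toArray(); // handle the row here, e.g. buffer it for a bulk insert
    }
}

$reader->close();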
The way @rap2hpoutre imports is not very optimal: importing record by record consumes a lot of time and resources. To improve this, try inserting 5 to 7 thousand records at a time. This library does not provide anything to insert in bulk using chunks; the ideal would be to do something like this:
(new FastExcel())->chunk(7000)->import($file_name, function ($chunk_data) {
    // $chunk_data = 7000 records
});

// Or
(new FastExcel())->importChunk($file_name, 7000, function ($chunk_data) {
    // $chunk_data = 7000 records
});
Since this function does not exist, you have to build it yourself, for example something like this:
dispatch(function () {
    $insertData = function (&$chunk_data) {
        TestExports::insert($chunk_data);
        $chunk_data = [];
    };

    $file_name = base_path('test_data.xlsx');
    $count = 0;
    $chunk_data = [];

    (new FastExcel())->import($file_name, function ($data) use (&$count, &$chunk_data, $insertData) {
        $chunk_data[] = $data;
        if (++$count % 7000 == 0) {
            $insertData($chunk_data);
        }
    });

    $insertData($chunk_data);
});
This piece of code inserts the data in bulk, in chunks of 7,000 records, and dispatches the work as a job so the user is not left waiting on the page.
Instead of using dispatch with a closure, I would create a dedicated Job class to hold all of this code. As I said, this is just a quick example of importing a resource with little time and few resources; a rough sketch of such a Job class follows.
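For illustration, a minimal sketch of what that Job class could look like, assuming the same TestExports model and 7,000-row chunk size as the example above (the class name ImportTestExportsJob is made up):

use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;
use Rap2hpoutre\FastExcel\FastExcel;

class ImportTestExportsJob implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    private $path;
    private $chunkSize;

    public function __construct(string $path, int $chunkSize = 7000)
    {
        $this->path = $path;
        $this->chunkSize = $chunkSize;
    }

    public function handle()
    {
        $chunk = [];

        (new FastExcel())->import($this->path, function ($row) use (&$chunk) {
            $chunk[] = $row;

            // Flush to the database once the buffer reaches the chunk size.
            if (count($chunk) >= $this->chunkSize) {
                TestExports::insert($chunk);
                $chunk = [];
            }
        });

        // Insert whatever is left after the last full chunk.
        if (! empty($chunk)) {
            TestExports::insert($chunk);
        }
    }
}

// Usage:
ImportTestExportsJob::dispatch(base_path('test_data.xlsx'));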
I generated an Excel file with 315,141 rows and the import took 82 s using 97.27 MB of RAM. Considering that the Excel file itself only weighs 9 MB, I still think it consumes a lot, but it is much better than the 8 GB you mention.
We must also take into account that on top of those 76 seconds we would have to add validation and manipulation of the data; in this example I imported a perfectly clean Excel file, so I skipped those two steps, but performing them would increase the time.
The idea would be to implement a function that receives 4 parameters (a rough implementation sketch follows the parameter descriptions below):
public function importUsingChunk($path, int $chunk_size, callable $callback_rows, ?callable $callback_map = null){}
//$path: Path of the file to import.
//$chunk_size: Chunk size.
//$callback_rows($chunk_rows): Receives an array containing up to X rows of the current chunk; this is where the data would be inserted into the database.
//$callback_map($row): Receives a single row, to be used to validate and/or modify that row's values. I added this callback to improve performance and avoid having to loop again over the X rows of each chunk.
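Such a method does not exist in FastExcel; purely as a sketch, here is how it could be built on top of the existing import() if it were added to the FastExcel class (this only buffers rows into chunks, the underlying reading behaviour is unchanged):

public function importUsingChunk($path, int $chunk_size, callable $callback_rows, ?callable $callback_map = null)
{
    $buffer = [];

    $this->import($path, function ($row) use (&$buffer, $chunk_size, $callback_rows, $callback_map) {
        // Optionally validate and/or transform each row before buffering it.
        if ($callback_map !== null) {
            $row = $callback_map($row);
        }

        $buffer[] = $row;

        // Hand a full chunk to the caller and reset the buffer.
        if (count($buffer) >= $chunk_size) {
            $callback_rows($buffer);
            $buffer = [];
        }
    });

    // Flush the last partial chunk, if any.
    if (! empty($buffer)) {
        $callback_rows($buffer);
    }

    return $this;
}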
I'd love this feature. Happy to PR if others want?
Is there a reason why you aren't just chunking and upserting/inserting after gathering the results? I prefer using upsert because it works like updateOrCreate when the column has a unique constraint in the database schema.
Like for instance:
$results = FastExcel::import(Storage::disk("public")->path($filename), function ($line) {
    return [
        'sku' => $line["Part Number"],
    ];
});

collect($results)
    ->chunk(10000)
    ->each(function ($chunk) {
        Product::upsert($chunk->toArray(), 'sku');
    });
If the spreadsheet is huge, this causes memory issues pretty quickly. I guess I could just raise the memory limit, but since the file is read line by line, it should be easy to just yield each chunk from a generator instead :)
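As a sketch of that generator idea, assuming Spout 3 is used directly (it is what FastExcel reads files with); the readInChunks() helper and the sku column mapping are made up for the example:

use Box\Spout\Reader\Common\Creator\ReaderEntityFactory;

// Yields arrays of up to $size rows; only one chunk is in memory at a time.
function readInChunks(string $path, int $size): \Generator
{
    $reader = ReaderEntityFactory::createXLSXReader();
    $reader->open($path);

    $chunk = [];
    foreach ($reader->getSheetIterator() as $sheet) {
        $headerSkipped = false;
        foreach ($sheet->getRowIterator() as $row) {
            if (! $headerSkipped) {
                $headerSkipped = true; // skip the header row
                continue;
            }
            $chunk[] = $row->toArray();
            if (count($chunk) >= $size) {
                yield $chunk;
                $chunk = [];
            }
        }
    }
    if (! empty($chunk)) {
        yield $chunk;
    }

    $reader->close();
}

// Usage: upsert each chunk as it is produced.
foreach (readInChunks(Storage::disk('public')->path($filename), 10000) as $rows) {
    // Map positional cells to the columns your table expects.
    $records = array_map(fn ($cells) => ['sku' => $cells[0]], $rows);
    Product::upsert($records, 'sku');
}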
My company imports hundreds of CSV files every day. Currently I'm using goodby/csv, but I would like to switch to fast-excel. Unfortunately, fast-excel always blows past the RAM limit (over 2 GB), whereas with goodby/csv I get by just fine on only 64 MB of RAM.
I really hope fast-excel can do better at importing.
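For comparison, goodby/csv keeps memory flat because it pushes one row at a time to an observer; a minimal sketch of that pattern with a manual 5,000-row buffer (table, column mapping and path are illustrative):

use Goodby\CSV\Import\Standard\Interpreter;
use Goodby\CSV\Import\Standard\Lexer;
use Goodby\CSV\Import\Standard\LexerConfig;

$lexer = new Lexer(new LexerConfig());
$interpreter = new Interpreter();

$batch = [];
$interpreter->addObserver(function (array $row) use (&$batch) {
    $batch[] = ['sku' => $row[0]]; // column mapping is illustrative
    if (count($batch) >= 5000) {
        Product::insert($batch);
        $batch = [];
    }
});

$lexer->parse(storage_path('app/products.csv'), $interpreter);

if (! empty($batch)) {
    Product::insert($batch); // flush the remainder
}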