Optimizing export
When exporting large data you need to call flush() inside the StreamedResponse; this clears the output buffer so the export doesn't use extra memory. Could this be added as a feature of the bundle?
My solution:
```yaml
# config/services.yml
services:
    app.export.writer.csv:
        class: App\Writer\FlushWriterDecorator
        decorates: sonata.exporter.writer.csv

    app.export.writer.json:
        class: App\Writer\FlushWriterDecorator
        decorates: sonata.exporter.writer.json

    app.export.writer.xml:
        class: App\Writer\FlushWriterDecorator
        decorates: sonata.exporter.writer.xml

    app.export.writer.xls:
        class: App\Writer\FlushWriterDecorator
        decorates: sonata.exporter.writer.xls
```
```php
<?php

declare(strict_types=1);

namespace App\Writer;

use Sonata\Exporter\Writer\TypedWriterInterface;

class FlushWriterDecorator implements TypedWriterInterface
{
    private const FLUSH_EVERY = 1000;

    private int $position;
    private TypedWriterInterface $writer;

    public function __construct(TypedWriterInterface $writer)
    {
        $this->writer = $writer;
    }

    public function open(): void
    {
        $this->writer->open();
        $this->position = 0;
    }

    public function write(array $data): void
    {
        $this->writer->write($data);

        // Flush the output buffer every FLUSH_EVERY rows so the streamed
        // response is pushed to the client instead of piling up in memory.
        $this->position++;
        if (0 === ($this->position % self::FLUSH_EVERY)) {
            flush();
        }
    }

    public function close(): void
    {
        $this->writer->close();
    }

    public function getDefaultMimeType(): string
    {
        return $this->writer->getDefaultMimeType();
    }

    public function getFormat(): string
    {
        return $this->writer->getFormat();
    }
}
```
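For context, a minimal sketch of how the decorated writer could be wired into a StreamedResponse so that flush() actually reaches the client. The `$source` variable is an assumption (any exporter source iterator); the rest uses the exporter's `Handler::create()` API:

```php
// Sketch only: streaming an export through the decorator.
// $source is assumed to be a Sonata\Exporter source iterator.
use Sonata\Exporter\Handler;
use Sonata\Exporter\Writer\CsvWriter;
use Symfony\Component\HttpFoundation\StreamedResponse;

$writer = new FlushWriterDecorator(new CsvWriter('php://output'));

$response = new StreamedResponse(static function () use ($source, $writer): void {
    // Each flush() inside the decorator pushes the buffered rows to the client.
    Handler::create($source, $writer)->export();
});
$response->headers->set('Content-Type', $writer->getDefaultMimeType());
```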
Hi, this feature could be great. Wanna start a PR? :)
Yes! I have two questions:

- Should the decorator be optional?
- Should the value of FLUSH_EVERY be passed as a parameter for more flexibility? If null (the default), the decorator wouldn't be used.

> Should the decorator be optional?

I don't know 😅

> Should the value of FLUSH_EVERY be passed as a parameter for more flexibility? If null (the default), the decorator wouldn't be used.

This value could come from a sonata exporter config parameter indeed. I'm not sure if we should keep null as the default value or if we can arbitrarily choose a number.
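Hypothetically, that could look like the config below. The `app.export.flush_every` parameter and the `$flushEvery` constructor argument are illustrative only; they don't exist in the decorator shown above:

```yaml
# Hypothetical sketch: a container parameter controls the flush interval,
# assuming FlushWriterDecorator gained a $flushEvery constructor argument.
parameters:
    app.export.flush_every: 1000

services:
    app.export.writer.csv:
        class: App\Writer\FlushWriterDecorator
        decorates: sonata.exporter.writer.csv
        arguments:
            $writer: '@.inner'
            $flushEvery: '%app.export.flush_every%'
```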
There is another problem: fetching large datasets from Doctrine. I solved it by using many paginated queries. Should this also be solved in this bundle?
In my test, combining both optimizations used no more than 60 MB of memory, and the exported file size was 650 MB.
This is an improved version of DoctrineORMQuerySourceIterator:
```php
<?php

declare(strict_types=1);

namespace App\Export\Source;

use Doctrine\ORM\Query;
use Sonata\Exporter\Source\DoctrineORMQuerySourceIterator;

final class PaginatedDoctrineORMQuerySourceIterator extends DoctrineORMQuerySourceIterator
{
    private const PAGE_SIZE = 1000;

    private int $page = 0;

    public function __construct(Query $query, array $fields, string $dateTimeFormat = 'r')
    {
        parent::__construct($query, $fields, $dateTimeFormat);

        $this->query->setMaxResults(self::PAGE_SIZE);
        $this->query->setFirstResult(0);
    }

    public function next(): void
    {
        $this->iterator->next();

        if (!$this->iterator->valid()) {
            // The current page is exhausted: move the offset to the next page,
            // detach all hydrated entities to release memory, and rebuild the
            // inner iterator by rewinding.
            $this->page++;
            $this->query->setFirstResult($this->page * self::PAGE_SIZE);
            $this->query->getEntityManager()->clear();
            $this->iterator = null;
            $this->rewind();
        }
    }
}
```
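A hypothetical usage example; the entity, the DQL, and the field map (export column label to property path) are illustrative, and `$em` is assumed to be an EntityManagerInterface:

```php
// Hypothetical usage of the paginated source iterator.
$query = $em->createQuery('SELECT r FROM App\Entity\Record r ORDER BY r.id ASC');

$source = new PaginatedDoctrineORMQuerySourceIterator($query, [
    'Id'   => 'id',
    'Name' => 'name',
]);

foreach ($source as $row) {
    // each $row is an array of the extracted fields
}
```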
The screenshot showed low memory usage while exporting big data. I applied both optimizations; the downloaded file size was 637.4 MB.
You need to take into account the fact that the query may already have a first result and a max result.
For example, if I want all the results from 3 to 2500, you'll have to fetch 3 to 1000, then 1000 to 2000, then 2000 to 2500.
Your version doesn't work if there are 3000 results in my example, because you'll export all of them even though I want to stop at 2500.
In the constructor:

    originalFirstResult = $query->getFirstResult();
    originalMaxResults = $query->getMaxResults();
    $query->setMaxResults(min($originalMaxResults, $originalFirstResult + PAGE_SIZE));

Then the check would be: if the iterator is not valid and originalMaxResults > currentMaxResults (where currentMaxResults = $query->getMaxResults()):

    $query->setMaxResults(min($originalMaxResults, $currentMaxResults + PAGE_SIZE));
    $query->setFirstResult($currentFirstResult + PAGE_SIZE);

With this formula you don't need the page property, nor an originalFirstResult property; just the originalMaxResults property.
I'm on my phone; if it's not clear enough, I'll improve my message in two days.
Edit: did you remove your message, or am I crazy? 😂
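A rough sketch of one way to honour a pre-existing first result and max results. This is an interpretation of the idea above, not a tested implementation: it assumes `$query` and `$iterator` are accessible in the parent class as in the earlier snippet, and since Doctrine's maxResults is a row count rather than an absolute end position, it tracks the remaining row budget instead of the literal min() formula:

```php
<?php

declare(strict_types=1);

namespace App\Export\Source;

use Doctrine\ORM\Query;
use Sonata\Exporter\Source\DoctrineORMQuerySourceIterator;

final class BoundedPaginatedSourceIterator extends DoctrineORMQuerySourceIterator
{
    private const PAGE_SIZE = 1000;

    /** Rows still allowed by the original setMaxResults(); null means unlimited. */
    private ?int $remaining;

    public function __construct(Query $query, array $fields, string $dateTimeFormat = 'r')
    {
        parent::__construct($query, $fields, $dateTimeFormat);

        // Keep whatever first result the query already had; only cap the page size.
        $this->remaining = $query->getMaxResults();
        $query->setMaxResults(
            null === $this->remaining ? self::PAGE_SIZE : min($this->remaining, self::PAGE_SIZE)
        );
    }

    public function next(): void
    {
        $this->iterator->next();

        if ($this->iterator->valid()) {
            return;
        }

        $consumed = $this->query->getMaxResults();

        if (null !== $this->remaining && ($this->remaining -= $consumed) <= 0) {
            return; // the original limit is reached: stop paginating
        }

        // Advance the window by exactly the rows just consumed.
        $this->query->setFirstResult($this->query->getFirstResult() + $consumed);
        $this->query->setMaxResults(
            null === $this->remaining ? self::PAGE_SIZE : min($this->remaining, self::PAGE_SIZE)
        );
        $this->query->getEntityManager()->clear();
        $this->iterator = null;
        $this->rewind();
    }
}
```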
If we use pagination, then we can't be sure that the next query will return the next chunk of data. For example, if you sort in descending order by creation date (from new to old), then at some point several duplicate records may end up in the export. Also, the database does not guarantee that rows will be selected in the same order across queries using limits and offsets. This can be solved by sorting by ID in ascending order, and in the next request taking records strictly greater than the last seen value (or vice versa).
I think we could just leave a note in the README saying that paginated export requires unchanging, ordered data.
Oh yes indeed. Maybe @greg0ire can help with this kind of problem. But anyway, the export currently doesn't work when there is too much data, so any improvement is good to take.
If you start a PR, the reviews can lead us to the right way ;)
You can fix such issues with cursor-based pagination: https://use-the-index-luke.com/no-offset
So:

1. Validate: the initial query must not have an offset or a limit.
2. Validate: the initial query must have an ORDER BY.
3. Add a limit (PAGE_SIZE) to the query.
4. Extract the seek field from the ORDER BY.
5. Fetch the first page.
6. Do the export.

Then, if count(results) == PAGE_SIZE:

7. Add the seek clause; if the query already contains a WHERE clause, we need to wrap it in parentheses and AND the seek clause onto it. (Skip this step on later iterations.)
8. Extract the last seek value from the previous results.
9. Fetch the next results.
10. Go to 6.

A sketch of the seek query follows below.
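To make the steps concrete, a minimal sketch of keyset ("seek") pagination with Doctrine's QueryBuilder. It is an illustration only: `App\Entity\Record`, the `getId()` accessor, and using `r.id` as the seek field are assumptions; in a real implementation the seek field would be extracted from the query's ORDER BY (step 4), and `$em` is assumed to be an EntityManagerInterface:

```php
// Minimal keyset-pagination sketch; entity and seek field are hypothetical.
$pageSize = 1000;
$lastId = null;

do {
    $qb = $em->createQueryBuilder()
        ->select('r')
        ->from(\App\Entity\Record::class, 'r')
        ->orderBy('r.id', 'ASC')
        ->setMaxResults($pageSize);

    if (null !== $lastId) {
        // The seek clause replaces OFFSET: only rows strictly after the
        // last seen id are fetched, so every page costs the same.
        $qb->andWhere('r.id > :lastId')->setParameter('lastId', $lastId);
    }

    $results = $qb->getQuery()->getResult();

    foreach ($results as $record) {
        // ... hand the row to the writer ...
        $lastId = $record->getId();
    }

    $em->clear(); // detach hydrated entities so memory stays flat
} while (\count($results) === $pageSize);
```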
Why 1? Can't you use the max results and the offset provided?
Why 2? Can't you add an ORDER BY if needed?

> Why 1? Can't you use the max results and the offset provided?

The extra WHERE added in step 7 breaks the offset & limit.

> Why 2? Can't you add an ORDER BY if needed?

We must order the results for this method to work correctly. Yes, we can add an ORDER BY if needed, but the problem is: on which field? For example, an id column may not exist. (Or should the field name be provided in the constructor?)
@kirya-dev do you want to start a PR?
I think it would be easier to discuss this with code and some tests.
@kirya-dev Hi, I tried to use your solution with the latest version of the exporter, but the export still crashes on memory (even with 500 MB of memory and only 15k records). Could you please point me in the right direction? Thanks in advance.
Hi! Please make sure that the decorators are enabled and that you are using the custom source iterator.
https://www.php.net/manual/en/function.flush.php can also help.
Do you still plan to make a PR to optimize the export, @kirya-dev?
I'm getting a Maximum execution time of ... seconds exceeded error when exporting via Excel, so I'm looking for a way to fix this.
Hello! Solving this problem is no simple task. A good idea was suggested: using a database cursor. Doctrine doesn't support this functionality officially, but I found a package that fills the gap: https://github.com/paysera/lib-pagination. What do you think about it?
> https://github.com/paysera/lib-pagination What do you think about it?

We won't add a dependency on a low-maintenance library, but:

> You can fix such issues with cursor-based pagination: https://use-the-index-luke.com/no-offset

Cursor-based pagination was the advice of @greg0ire, so we could try something similar.
Implementations can differ across database platforms; we would have to implement this feature for every popular one. It's a big piece of work.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.