TypesenseBundle

"OutOfMemoryError" when importing a large number of entities

Open N2oo opened this issue 1 year ago • 2 comments

Hi,

To reproduce:

Run symfony typesense:import --max-per-page=10000 to import a large number of records from your database into Typesense. The crash happened with close to 15000 entities in memory.

Reason:

The process throws a fatal error, OutOfMemoryError, from Doctrine's classes because the entity manager is never cleared.

The fix: you must detach the objects from Doctrine by clearing the entity manager with $this->em->clear(). Here is the resource that made me think of it: https://www.doctrine-project.org/projects/doctrine-orm/en/2.14/reference/batch-processing.html
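To illustrate why clearing matters, here is a minimal sketch in plain PHP. FakeEntityManager is a hypothetical stand-in for an ORM identity map, not Doctrine's actual implementation; it only shows that an object stays alive for as long as the manager still references it, even after the calling code has dropped its own variable.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for an ORM identity map; NOT Doctrine's real code.
final class FakeEntityManager
{
    /** @var array<int, object> */
    private array $identityMap = [];

    public function load(int $id): object
    {
        // Every loaded object is remembered, like a managed entity.
        return $this->identityMap[$id] ??= new stdClass();
    }

    public function clear(): void
    {
        // Dropping the map releases the last reference to each object.
        $this->identityMap = [];
    }
}

$em = new FakeEntityManager();
$entity = $em->load(1);
$ref = WeakReference::create($entity); // observes the object without keeping it alive

unset($entity);                            // the caller no longer holds the object
$aliveBeforeClear = $ref->get() !== null;  // true: the map still references it

$em->clear();
$aliveAfterClear = $ref->get() !== null;   // false: nothing references it any more

var_dump($aliveBeforeClear, $aliveAfterClear);
```

In an import loop the same thing happens at scale: every page adds its entities to the map, so memory keeps growing until the manager is cleared.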

Suggestion:

Here is the change I made in the ImportCommand class:

private function populateIndex(InputInterface $input, OutputInterface $output, string $index)
   {
       /*...*/

       for ($i = $firstPage; $i <= $lastPage; ++$i) {
           $q = $this->em->createQuery('select e from '.$class.' e')
               ->setFirstResult(($i - 1) * $maxPerPage)
               ->setMaxResults($maxPerPage)
           ;

           if ($io->isDebug()) {
               $io->text('<info>Running request : </info>'.$q->getSQL());
           }

           $entities = $q->toIterable();

           $data = [];
           foreach ($entities as $entity) {
               $data[] = $this->transformer->convert($entity);
           }

           $io->text('Import <info>['.$collectionName.'] '.$class.'</info> Page '.$i.' of '.$lastPage.' ('.count($data).' items)');

           $result = $this->documentManager->import($collectionName, $data, $action);

           if ($this->printErrors($io, $result)) {
               $this->isError = true;

               throw new \Exception('Error happened during the import of the collection : '.$collectionName.' (you can see them with the option -v)');
           }

           $populated += count($data);
           // Detach this page's entities so they can be garbage-collected before the next page
           $this->em->clear();
       }
       $this->em->clear(); // final clear once all pages have been processed

       $io->newLine();
       return $populated;
   }

I've made a fork with this change and everything seems OK: I didn't notice any significant performance issues.

Hope this helps. Have a good day.
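One way to check whether a change like this keeps memory flat is to record memory_get_usage() after each page. The sketch below is an assumption-laden illustration in plain PHP: memoryPerPage is a hypothetical helper, and stdClass objects stand in for Doctrine entities. The $keep flag simulates a loop that never releases its objects.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: builds $pages batches of $perPage objects and returns
 * the memory_get_usage() reading taken after each batch.
 * With $keep = true the batches are retained, as in a loop that never clears.
 *
 * @return int[]
 */
function memoryPerPage(int $pages, int $perPage, bool $keep): array
{
    $held = [];
    $readings = [];
    for ($page = 1; $page <= $pages; ++$page) {
        $batch = [];
        for ($i = 0; $i < $perPage; ++$i) {
            $o = new stdClass();
            $o->payload = str_repeat('x', 100); // some data per object
            $batch[] = $o;
        }
        if ($keep) {
            $held[] = $batch; // simulate objects that are never released
        }
        unset($batch, $o);    // drop the loop's own references
        gc_collect_cycles();
        $readings[] = memory_get_usage();
    }

    return $readings;
}

$kept = memoryPerPage(5, 2000, true);
$released = memoryPerPage(5, 2000, false);

printf("growth when kept: %d bytes\n", end($kept) - $kept[0]);
printf("growth when released: %d bytes\n", end($released) - $released[0]);
```

If the readings keep climbing from page to page in the real command, something is still holding references to the entities.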

Originally posted by @N2oo in https://github.com/acseo/TypesenseBundle/issues/74#issuecomment-1479544670

N2oo, Dec 12 '23 20:12

With my suggestion, it crashes only in the prod environment.

Looking forward to a solution.

N2oo, Dec 12 '23 20:12

Hello @N2oo

Can you try with https://github.com/kiora-tech/TypesenseBundle? I added this optimisation in https://github.com/acseo/TypesenseBundle/pull/96

james2001, Dec 13 '23 14:12