
Time out

smoosies-dev opened this issue 1 year ago · 4 comments

I have hundreds of thousands of rows to index and I hit a timeout when indexing with this bundle. Would it be possible to send the documents in batches to avoid this?

smoosies-dev avatar Oct 04 '23 08:10 smoosies-dev

Hello @smoosies-dev, the typesense:import command normally works as a batch import.

Did you try the max-per-page option (default value: 100) to increase the number of documents indexed per iteration?
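For example, assuming the standard Symfony console entry point, that would be invoked as:

php bin/console typesense:import --max-per-page=1000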

npotier avatar Oct 04 '23 14:10 npotier

I wanted to index 5,800,000 rows and hit a timeout on every try; I had not passed any particular option, since none is mentioned in the bundle documentation. So in the execute() method of ImportCommand.php I added a small batch split:

protected function execute(InputInterface $input, OutputInterface $output): int
{
    $io = new SymfonyStyle($input, $output);
    if (!in_array($input->getOption('action'), self::ACTIONS, true)) {
        $io->error('Action option only takes the values : "create", "upsert" or "update"');

        return 1;
    }
    $action = $input->getOption('action');

    // Disable SQL logging so Doctrine does not buffer every query in memory.
    $this->em->getConnection()->getConfiguration()->setSQLLogger(null);

    $execStart = microtime(true);
    $populated = 0;
    $io->newLine();

    $collectionDefinitions = $this->collectionManager->getCollectionDefinitions();
    foreach ($collectionDefinitions as $collectionDefinition) {
        $collectionName = $collectionDefinition['typesense_name'];
        $class          = $collectionDefinition['entity'];

        $q = $this->em->createQuery('select e from '.$class.' e');
        $page = 1;
        $batchSize = 50000;

        // Page through the table in fixed-size slices instead of loading
        // every row at once, importing each slice into Typesense separately.
        while (true) {
            $q->setFirstResult(($page - 1) * $batchSize)->setMaxResults($batchSize);
            $entities = $q->toIterable();

            $nbEntities = 0;
            $data = [];
            foreach ($entities as $entity) {
                $nbEntities++;
                $data[] = $this->transformer->convert($entity);
            }

            // An empty slice means we have paged past the last row.
            if ($nbEntities === 0) {
                break;
            }
            $populated += $nbEntities;

            $result = $this->documentManager->import($collectionName, $data, $action);
            if ($this->printErrors($io, $result)) {
                $this->isError = true;
                $io->error('Error happened during the import of the collection : '.$collectionName.' (you can see them with the option -v)');

                return 2;
            }
            $io->text('Import <info>['.$collectionName.'] '.$class.', page='.$page.'</info>');

            // Detach the entities hydrated for this slice so the EntityManager
            // does not keep millions of managed objects in memory.
            $this->em->clear();

            $page++;
        }

        $io->text('Import <info>['.$collectionName.'] '.$class.'</info>');
        $io->newLine();
    }

    $io->newLine();
    if (!$this->isError) {
        $io->success(sprintf(
            '%s element%s populated in %s seconds',
            $populated,
            $populated > 1 ? 's' : '',
            // Elapsed time, rounded to two decimal places.
            round(microtime(true) - $execStart, 2)
        ));
    }

    return 0;
}
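A possible refinement, sketched below as an untested variant: offset-based paging with setFirstResult gets slower as the offset grows, and can skip or duplicate rows if the table changes mid-import. Paging on the primary key instead keeps every slice cheap. The e.id field and getId() accessor are assumptions about the entity:

$lastId = 0;
do {
    // Keyset pagination: fetch the next slice strictly after the last seen id.
    $entities = $this->em
        ->createQuery('SELECT e FROM '.$class.' e WHERE e.id > :lastId ORDER BY e.id ASC')
        ->setParameter('lastId', $lastId)
        ->setMaxResults($batchSize)
        ->getResult();

    $data = [];
    foreach ($entities as $entity) {
        $lastId = $entity->getId();
        $data[] = $this->transformer->convert($entity);
    }

    if ([] !== $data) {
        $this->documentManager->import($collectionName, $data, $action);
        // Detach this slice so memory stays flat across millions of rows.
        $this->em->clear();
    }
} while ([] !== $data);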

smoosies-dev avatar Oct 04 '23 14:10 smoosies-dev

I'm working on an optimisation.


I stopped using entities and used arrays instead:

->toIterable(hydrationMode: AbstractQuery::HYDRATE_ARRAY);
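For context, a minimal sketch (untested) of how that slots into the batch loop above; it assumes the transformer can convert associative arrays as well as entities:

use Doctrine\ORM\AbstractQuery;

// Hydrate plain arrays instead of managed entities: Doctrine then skips
// entity instantiation and change tracking, so nothing accumulates in the
// EntityManager between batches.
$entities = $q->setFirstResult(($page - 1) * $batchSize)
    ->setMaxResults($batchSize)
    ->toIterable(hydrationMode: AbstractQuery::HYDRATE_ARRAY);

foreach ($entities as $row) {
    // $row is now an associative array keyed by field name.
    $data[] = $this->transformer->convert($row);
}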

james2001 avatar Oct 06 '23 20:10 james2001

ok thanks

smoosies-dev avatar Oct 12 '23 14:10 smoosies-dev