TypesenseBundle
Timeout
I have hundreds of thousands of rows to index and I get a timeout when indexing with this bundle. Would it be possible to send the data in batches to avoid this?
Hello @smoosies-dev, the typesense:import command normally works as a batch import.
Did you try the max-per-page option (the default value is 100) to increase the number of documents indexed per iteration?
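For example (assuming the option is exposed on the command exactly as named above, which is worth double-checking against your version of the bundle):

    php bin/console typesense:import --max-per-page=500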
I wanted to index 5,800,000 rows and I got a timeout on every try, and I did not pass any specific parameter since I did not see one in the bundle documentation. So, in ImportCommand.php, I added a small batch split to the execute() method for the create action:
protected function execute(InputInterface $input, OutputInterface $output): int
{
    $io = new SymfonyStyle($input, $output);

    if (!in_array($input->getOption('action'), self::ACTIONS, true)) {
        $io->error('Action option only takes the values : "create", "upsert" or "update"');

        return 1;
    }
    $action = $input->getOption('action');

    $this->em->getConnection()->getConfiguration()->setSQLLogger(null);

    $execStart = microtime(true);
    $populated = 0;

    $io->newLine();

    $collectionDefinitions = $this->collectionManager->getCollectionDefinitions();
    foreach ($collectionDefinitions as $collectionDefinition) {
        $collectionName = $collectionDefinition['typesense_name'];
        $class = $collectionDefinition['entity'];

        $q = $this->em->createQuery('select e from '.$class.' e');

        // Import the collection in pages of $batchSize rows instead of loading everything at once
        $page = 1;
        $batchSize = 50000;

        while (true) {
            $q->setFirstResult(($page - 1) * $batchSize)->setMaxResults($batchSize);
            $entities = $q->toIterable();

            $nbEntities = 0;
            $data = [];
            foreach ($entities as $entity) {
                $nbEntities++;
                $data[] = $this->transformer->convert($entity);
            }

            // No rows left for this collection: stop paging
            if (0 === $nbEntities) {
                break;
            }

            $populated += $nbEntities;

            $result = $this->documentManager->import($collectionName, $data, $action);
            if ($this->printErrors($io, $result)) {
                $this->isError = true;
                $io->error('Error happened during the import of the collection : '.$collectionName.' (you can see them with the option -v)');

                return 2;
            }

            $io->text('---------------------------------Import <info>['.$collectionName.'] '.$class.', page='.$page.'</info>');
            $page++;
        }

        $io->text('Import <info>['.$collectionName.'] '.$class.'</info>');
        $io->newLine();
    }

    $io->newLine();

    if (!$this->isError) {
        $io->success(sprintf(
            '%s element%s populated in %s seconds',
            $populated,
            $populated > 1 ? 's' : '',
            round(microtime(true) - $execStart, PHP_ROUND_HALF_DOWN)
        ));
    }

    return 0;
}
I'm working on an optimisation: I stopped hydrating entities and used arrays instead, via

    ->toIterable(hydrationMode: AbstractQuery::HYDRATE_ARRAY);
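For reference, a minimal sketch of how the array hydration fits into the batched loop above. Whether $this->transformer->convert() accepts an array row instead of an entity object is an assumption here; if it does not, the document fields have to be mapped by hand at that point.

    use Doctrine\ORM\AbstractQuery;

    // Inside the while (true) loop of execute(), replacing the entity iteration:
    $q->setFirstResult(($page - 1) * $batchSize)->setMaxResults($batchSize);

    // Hydrate plain arrays instead of managed entities to skip the UnitOfWork overhead
    $rows = $q->toIterable(hydrationMode: AbstractQuery::HYDRATE_ARRAY);

    $nbEntities = 0;
    $data = [];
    foreach ($rows as $row) {
        $nbEntities++;
        // Assumption: the transformer can convert an array row as well as an entity
        $data[] = $this->transformer->convert($row);
    }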
ok thanks