[TASK] Optimise scheduler performance
Context: We have a customer with 70 tenants (sites) in TYPO3 v13. Each tenant has its own ReIndex task and IndexQueueWorker task. The scheduler module is very slow, both in the task overview and in edit mode.
I have created the whole thing as a task for now, also to encourage discussion.
1.) Add a new index
CREATE INDEX idx_root_changed_indexed ON tx_solr_indexqueue_item (root, changed, indexed);
2.) vendor/apache-solr-for-typo3/solr/Classes/Task/IndexQueueWorkerTask.php
The expensive part here is the repeated lookup of the site object. There is no persistent cache for it, and building 70 sites is expensive. TYPO3's SiteFinder is used as the basis, but a lot of information is added on top, which is fine in principle.
However, since we mainly need failures and progress from the IndexQueue, all we really want to use are the ready-made SQL queries in QueueStatisticsRepository, which are very fast.
After making this change, the backend response time dropped from 7 seconds to 200 ms. We only used existing building blocks, just without the detour via the site object.
Disclaimer: We are aware that some specific information is required from the site.
Example for getProgress()
$this->queueStatisticsRepository = GeneralUtility::makeInstance(QueueStatisticsRepository::class);
$stats = $this->queueStatisticsRepository
    ->findOneByRootPidAndOptionalIndexingConfigurationName(
        (int) $this->rootPageId,
        ''
    );
return $stats->getSuccessPercentage();
3.) vendor/apache-solr-for-typo3/solr/Classes/Backend/SiteSelectorField.php
I'm still trying to figure out what the right approach is here. In the scheduler's edit mode, all sites are prepared for the select box. There is a runtime cache, but it only lives for the current request, so it doesn't help us here.
The first idea is to add a cache of our own, or to use SiteFinder directly without any detours. In this case we are not interested in the additional information. All we need is the ID and the label.
Disclaimer: We are aware that there is more to it than that, especially when it comes to cache invalidation.
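A rough sketch of the SiteFinder-only variant, assuming the select box really only needs root page ID and label. `buildSiteSelectorOptions()` is a hypothetical helper name, not part of EXT:solr. It expects the array returned by TYPO3's `SiteFinder::getAllSites()` and uses the site identifier as the label, which is an assumption (showing the page title would need an extra lookup). Solr-specific settings and cache invalidation are ignored here.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of EXT:solr): reduce site objects to the
 * "root page ID => label" pairs a select box needs.
 *
 * $sites is assumed to be the result of SiteFinder::getAllSites(); only
 * getRootPageId() and getIdentifier() are called on each entry.
 *
 * @return array<int, string>
 */
function buildSiteSelectorOptions(iterable $sites): array
{
    $options = [];
    foreach ($sites as $site) {
        // Assumption: the site identifier is good enough as a label.
        $options[$site->getRootPageId()] = $site->getIdentifier();
    }
    // Sort by label, case-insensitively, while keeping the ID keys.
    asort($options, SORT_NATURAL | SORT_FLAG_CASE);
    return $options;
}
```

Inside TYPO3 this could be called as `buildSiteSelectorOptions(GeneralUtility::makeInstance(SiteFinder::class)->getAllSites())`.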
4.) vendor/apache-solr-for-typo3/solr/Classes/Task/ReIndexTask.php
For us, it was sufficient to use SiteFinder and the title from the page record. SiteFinder is, of course, not a universal solution, as it does not take Solr settings into account. But we need to improve performance in this area somehow, even if it means using our own cache like SiteFinder does.
public function getAdditionalInformation()
{
    $pageRepository = GeneralUtility::makeInstance(PageRepository::class);
    $siteFinder = GeneralUtility::makeInstance(SiteFinder::class);
    try {
        // SiteFinder throws SiteNotFoundException instead of returning null
        $site = $siteFinder->getSiteByPageId((int)$this->rootPageId);
    } catch (\TYPO3\CMS\Core\Exception\SiteNotFoundException) {
        return 'Invalid site configuration for scheduler please re-create the task!';
    }
    // Only the title is needed, so read the raw record of the root page
    $pageRawRecord = $pageRepository->getRawRecord('pages', $site->getRootPageId());
    $information = 'Site: ' . ($pageRawRecord['title'] ?? '');
    if (!empty($this->indexingConfigurationsToReIndex)) {
        $information .= ', Indexing Configurations: ' . implode(
            ', ',
            $this->indexingConfigurationsToReIndex,
        );
    }
    return $information;
}
@sfroemkenjw, @froemken You have already added many improvements for multisite setups. Could you please collaborate here as well?
1.) The new index could be really interesting for scheduler tasks.
2.) We know that problem. That's why we have created our own scheduler task in jwtools2. With each call it switches to the next site and processes up to 100 documents. That way just ONE site is fetched instead of all of them. Maybe that would also be an option for EXT:solr.
3.) Same as 2. In our scheduler task you cannot define a specific page. It loops through all available indexing configurations and retrieves just ONE site configuration. Excludes are currently not possible.
4.) Also solved with 2.
We have a TYPO3 instance with over 300 root pages, and in the early days opening the scheduler with the original EXT:solr tasks took minutes. It was hard to re-run a specific EXT:solr task. That's why we have created our own task. If we need to re-run a job for a specific page, we create one of the original EXT:solr tasks, execute it manually multiple times, and remove the task again once the job is done. So it would be nice if EXT:solr had such an additional multi-execution task, too. Feel free to copy it from here: https://github.com/jweiland-net/jwtools2/blob/main/Classes/Task/IndexQueueWorkerTask.php BUT: This new task is NO replacement for the existing EXT:solr tasks. They should be kept for compatibility and for individual runs.
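To illustrate the "switch to the next site with each call" idea, here is a minimal sketch of the rotation step only. It is not the jwtools2 implementation. `nextRootPageId()` is a hypothetical helper, and it assumes the ID handled by the previous run is persisted somewhere between scheduler runs (for example in TYPO3's registry).

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: pick the root page ID that follows the one processed
 * in the previous run, wrapping around to the first ID at the end.
 *
 * @param int[]    $rootPageIds    all known root page IDs, in any order
 * @param int|null $lastRootPageId ID handled by the previous run, null on the first run
 */
function nextRootPageId(array $rootPageIds, ?int $lastRootPageId): ?int
{
    if ($rootPageIds === []) {
        return null; // nothing to index
    }
    // Sort the local copy so the rotation order is independent of input order.
    sort($rootPageIds);
    foreach ($rootPageIds as $id) {
        if ($lastRootPageId === null || $id > $lastRootPageId) {
            return $id;
        }
    }
    // The previous run handled the highest ID: start over.
    return $rootPageIds[0];
}
```

Because the comparison uses "greater than the last ID" rather than an array position, the rotation keeps working when a root page is added or removed between two runs. Each run then only needs to build the one site for the returned ID.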
@sfroemkenjw I like the approach. “Take whatever you find and process it.” That could also be a task for EXT:solr that could be maintained. With the appropriate documentation, of course.
So for this issue I would see three things: the new index, an “I don't care” IndexTask, and the corresponding documentation. Do you see it that way too?
@ayacoo Confirmed