doctrine-website
Create cronjob workflow to index website docs in Algolia
There is currently no automated build step that indexes the docs in Algolia for the website search bar. There are two reasons for this:
- Algolia has a monthly limit that is easy to reach, and the search stops working once that limit is exceeded.
- There is no reason to rebuild every project's documentation all the time, since some projects don't change very often.
To update the docs regularly and keep the search results up to date, a workflow should be created that builds the indexes shortly before the monthly Algolia limit resets. That way the quota is spent on users of the search first, and the search stays available. Because projects like ORM and DBAL are visited more often than e.g. Annotations, we can also schedule different runs for each Doctrine project to save Algolia requests.
After reading the code, it seems to me that we only make one call to addObjects per project. How low is that limit? One call to that method will only translate into several requests if there are more than 1000 objects (assuming we are using the default batch size: https://github.com/algolia/algoliasearch-client-php/blob/1c9440d8151cc4c9363128145b898946baffcd42/src/Config/SearchConfig.php#L31)
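To make the batching arithmetic concrete, here is a minimal Python sketch. It assumes a batch size of 1000, as mentioned above, and that an empty call sends no request. The function name `requests_needed` is hypothetical and not part of any Algolia client.

```python
import math


def requests_needed(object_count: int, batch_size: int = 1000) -> int:
    """Estimate how many batch requests one bulk indexing call produces.

    Assumption: objects are split into chunks of at most `batch_size`,
    and an empty call sends nothing.
    """
    if object_count <= 0:
        return 0
    return math.ceil(object_count / batch_size)


print(requests_needed(800))   # one chunk is enough
print(requests_needed(2500))  # split into three chunks
```

Under these assumptions, a project only costs more than one request once it has more than 1000 records.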
Given that all the website contents are versioned in Git, instead of building the search index via a cron job, would it make sense to build it based on the diff between the previous and the new website version?
I haven't looked into the search index itself, but not every change in the docs would affect it. One of my first thoughts was to build the index whenever a diff shows a change, but there are usually not that many changes, which is why I thought of cron jobs as a first step.
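The diff-based idea discussed above could look roughly like the following Python sketch. It is not how the website currently works. It assumes every search record has a stable object ID and that a fingerprint per record was stored after the previous build. The names `fingerprint` and `diff_records` are hypothetical.

```python
import hashlib
import json


def fingerprint(record: dict) -> str:
    """Stable hash of a record, independent of key order."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def diff_records(old_fingerprints: dict, new_records: dict):
    """Compare the previous build with the new one.

    old_fingerprints: object ID -> fingerprint stored after the last build
    new_records:      object ID -> record produced by the current build
    Returns (records to save, object IDs to delete).
    """
    to_save = [
        record
        for object_id, record in new_records.items()
        if old_fingerprints.get(object_id) != fingerprint(record)
    ]
    to_delete = [oid for oid in old_fingerprints if oid not in new_records]
    return to_save, to_delete


old = {
    "orm-1": fingerprint({"title": "Getting Started"}),
    "orm-2": fingerprint({"title": "Caching"}),
}
new = {
    "orm-1": {"title": "Getting Started"},   # unchanged
    "orm-3": {"title": "Second Level Cache"},  # new record
}
to_save, to_delete = diff_records(old, new)
print(to_save, to_delete)
```

Only changed or new records would be sent to the search service, and records that disappeared would be deleted, so an unchanged project would cost no indexing requests at all.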
The website code is currently flawed when it comes to indexing a single project and version: it always deletes the whole index. This needs to be fixed before projects can be reindexed separately.
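As an illustration of the behavior that would be needed, here is a minimal in-memory Python sketch that replaces only the records of one project and version and leaves the rest of the index alone. The record fields `project` and `version` and the function `reindex_project` are assumptions for this example, not the website's actual schema or code.

```python
def reindex_project(index: list, project: str, version: str, fresh: list) -> list:
    """Return a new index where only the given project/version is replaced.

    Records belonging to other projects or versions are kept untouched.
    """
    kept = [
        record
        for record in index
        if not (record["project"] == project and record["version"] == version)
    ]
    return kept + fresh


index = [
    {"project": "orm", "version": "2.7", "title": "Old ORM page"},
    {"project": "dbal", "version": "2.10", "title": "DBAL page"},
]
fresh = [{"project": "orm", "version": "2.7", "title": "New ORM page"}]

updated = reindex_project(index, "orm", "2.7", fresh)
print([record["title"] for record in updated])
```

With a real search service the same effect would need a delete restricted to the matching records, followed by saving the fresh ones.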