RFC: Replace Elasticsearch with OpenSearch?
Elasticsearch changed its license in 2021.
OpenSearch is a fork which continues to use the old license.
We might consider switching to OpenSearch. That should be easy when starting from the open-source releases of Elasticsearch, but I expect it to become more and more difficult as newer releases diverge further.
Links: https://github.com/opensearch-project/ https://de.wikipedia.org/wiki/OpenSearch_(Software) https://opensearch.org/blog/technical-posts/2021/10/moving-from-opensource-elasticsearch-to-opensearch/
Personally I approve of this request, but the current implementation and usage depend heavily on the version of ElasticSearch/OpenSearch in use, so I don't know whether an easy switch to OpenSearch is possible.
As far as I know, there is a process to "migrate" from direct ElasticSearch usage to usage via hibernate-search with ElasticSearch integration. As far as I know, hibernate-search can work with different versions of ElasticSearch, and maybe OpenSearch, just by switching configuration values. I would prefer this solution, as we would then be loosely coupled to ElasticSearch or OpenSearch and could switch the search server implementation without changing code inside Kitodo.Production.
It seems HibernateSearch is indeed compatible with OpenSearch (see https://hibernate.atlassian.net/browse/HSEARCH-4212 for details). Since all direct ElasticSearch libraries and packages will be removed from the Kitodo.Production repository with the switch to HibernateSearch, this should then indeed resolve the licensing issue.
According to the documentation, recent versions of Hibernate Search support both Elasticsearch and OpenSearch, so supporting both in Kitodo.Production might be an easy task.
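To make the "only switching configuration values" idea a bit more concrete, here is a minimal sketch. It assumes the Hibernate Search 6 property names `hibernate.search.backend.hosts` and `hibernate.search.backend.version`, and that the version value accepts a distribution prefix (`elastic:` or `opensearch:`); please check this against the Hibernate Search documentation for the version actually used. The class `SearchBackendSettings`, the host and the version numbers are hypothetical and not part of Kitodo.Production.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper for illustration only; not part of Kitodo.Production. */
final class SearchBackendSettings {

    /** The two search server distributions discussed in this issue. */
    enum Distribution { ELASTIC, OPENSEARCH }

    /**
     * Builds the backend properties for the chosen distribution.
     * Assumption: the property names and the "distribution:version"
     * value format follow the Hibernate Search 6 Elasticsearch backend.
     */
    static Map<String, String> forBackend(Distribution distribution, String version, String hosts) {
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("hibernate.search.backend.hosts", hosts);
        settings.put("hibernate.search.backend.version",
                distribution.name().toLowerCase(Locale.ROOT) + ":" + version);
        return settings;
    }
}
```

With something like this, `forBackend(Distribution.OPENSEARCH, "1.0", "localhost:9200")` and `forBackend(Distribution.ELASTIC, "7.10", "localhost:9200")` would differ only in the version property, and no other code would need to know which server is behind it.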
I have some questions regarding the migration to hibernate-search. Since this issue is mentioned in the recent announcement of the development fund 2023, I'll ask them here.
If I remember correctly, migrating to hibernate-search was already experimented with as part of the development fund 2021, see #4208.
@solth Would it be possible for you to summarize what was done and learned in 2021?
- What problems were revealed?
- Why did you choose to upgrade to ElasticSearch 6 instead of migrating over to hibernate-search at the time?
- Were you able to confirm that migrating to hibernate-search would actually improve (or at least have a comparable) indexing performance?
Also, there is a public hibernate-search branch that was started in 2021.
- What is the current status of this branch?
- Can this branch be used as the basis for any new work in the development fund 2023?
Thank you and Cheers!
Yes, of course. As you mentioned, the first attempt to replace ElasticSearch with HibernateSearch was made in the context of #4208, where the actual goal was to update ElasticSearch to version 7 (which was successful).
At that time we hoped the changes required for the migration to HibernateSearch would be manageable and could be made in the context of the same issue with little extra effort. That turned out to be wrong, though. Instead, the necessary changes proved to be extensive (as you can see from the number of changes made in the branch you mentioned: https://github.com/effective-webwork/kitodo-production/tree/hibernate-search), so we never got around to actually finishing the transition to HibernateSearch.
In my experience, the main challenge in the transition to HibernateSearch was the incompatibility of ElasticSearch QueryBuilder objects with the HibernateSearch syntax. The latter uses so-called SearchPredicates instead of QueryBuilders, which in turn are created by SearchFactory instances. AFAIK these factories only support a lambda-style syntax for creating SearchPredicates, and once created, SearchPredicate instances cannot be extended with further clauses or filters. That, however, is exactly what Kitodo.Production currently does: ES QueryBuilder objects are passed between and augmented in many interconnected classes like SearchService, FilterService and the service classes for the individual object types (most notably ProcessService). Refactoring all those QueryBuilder-related functions so that the SearchFactory variable within the lambda expression can be passed to other functions was a major hassle.
I recently rebased the HibernateSearch branch to resolve conflicts with the current master branch. It is a WIP but I think it can be used as a base for the integration of HibernateSearch in Kitodo.Production. It does already load list entries like processes via HibernateSearch and the indexing on the indexing page is done using the HibernateSearch MassIndexer.
What I cannot say, though, is whether it is the best approach, or whether rewriting the whole filter and query architecture from the ground up to better accommodate the new syntax would be a better way to proceed.
Concerning performance, the version in that branch is currently quite a bit slower than the current master branch, but that is perhaps due to suboptimal building of queries/search predicates. Rebuilding the whole index using the MassIndexer is much faster, though.
One thing that is worth noting is that using HibernateSearch we can get rid of DTO objects because HibernateSearch does load index data directly into bean objects, which should simplify the code in many places considerably.
These DTOs were introduced to avoid possibly exposing the Hibernate beans, including database credentials, in the UI when an error occurs or when the UI is manipulated to access them on the client side.
@solth
Thank you for your summary. I wasn't aware of the query building problem. It seems to be possible to logically combine SearchPredicates. But I'm not sure whether that is sufficient to solve the problems you mentioned above.
I think the main problem I encountered was that the current ElasticSearch classes like QueryBuilder are very deeply integrated and used at many different locations in the Kitodo.Production source code, so removing and replacing them completely with new classes from HibernateSearch - that are constructed in a totally different manner - was more difficult than I thought.
Perhaps there are other approaches to replace ElasticSearch with HibernateSearch instead of trying to keep the current class and method architecture of Production related to filters and searching and directly using HibernateSearch objects in all those locations. Maybe it is easier to not use HibernateSearch objects in all those data service classes like TaskService or ProcessService but instead encapsulate all required data in new custom objects and pass those to the final search / filter services that then create a HibernateSearch SearchPredicate without the need to maintain and pass such an object through all layers of the application.
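A minimal sketch of what such custom objects could look like, assuming nothing about the real Kitodo.Production classes: `FilterCriteria`, `Clause`, `Translator` and the field names used below are all hypothetical. The data services would only collect clauses in an immutable criteria object, and a single final step would translate it. Here that final step just renders a readable string as a stand-in; in a real implementation it would build the HibernateSearch SearchPredicate instead.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical, library-independent description of a filter; not Kitodo.Production code. */
final class FilterCriteria {

    enum Operator { EQUALS, CONTAINS }

    /** One condition on one index field. */
    static final class Clause {
        final String field;
        final Operator operator;
        final String value;

        Clause(String field, Operator operator, String value) {
            this.field = field;
            this.operator = operator;
            this.value = value;
        }
    }

    private final List<Clause> clauses;

    private FilterCriteria(List<Clause> clauses) {
        this.clauses = Collections.unmodifiableList(clauses);
    }

    static FilterCriteria empty() {
        return new FilterCriteria(new ArrayList<>());
    }

    /** Returns a new criteria object with one more clause; the receiver is left unchanged. */
    FilterCriteria and(String field, Operator operator, String value) {
        List<Clause> extended = new ArrayList<>(clauses);
        extended.add(new Clause(field, operator, value));
        return new FilterCriteria(extended);
    }

    List<Clause> clauses() {
        return clauses;
    }

    /** Stand-in for the final search service: the only place that knows the search library. */
    static final class Translator {
        static String describe(FilterCriteria criteria) {
            StringBuilder text = new StringBuilder();
            for (Clause clause : criteria.clauses()) {
                if (text.length() > 0) {
                    text.append(" AND ");
                }
                text.append(clause.field)
                    .append(clause.operator == Operator.EQUALS ? " = " : " ~ ")
                    .append(clause.value);
            }
            return text.toString();
        }
    }
}
```

A service like ProcessService could then extend a criteria object it received, e.g. `criteria.and("title", Operator.CONTAINS, "atlas")`, without ever touching a SearchPredicate or the factory inside the lambda.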
@matthias-ronge, what is the status of task #5760? When do we expect that Hibernate Search (with OpenSearch) will have replaced Elasticsearch?
Would intermediate support of OpenSearch help if this task still takes some time? I started my own OpenSearch branch yesterday which now passes mvn install. The CI tests are still failing.
I now finished a draft pull request #6131 for OpenSearch which seems to work.
Support for both Elasticsearch and OpenSearch was added in PR #6131, so I think this issue can be closed.