Solr retry
Sometimes Solr fails to index documents due to temporary problems. For example:
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://localhost:8983/solr/File: No registered leader was found after waiting for 4000ms , collection: File slice: shard1_1_0
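Before any retry is added, something has to decide which failures are worth retrying. The sketch below is a hypothetical classifier (`TransientErrors` and its message list are assumptions, not part of RODA or SolrJ). It walks the exception's cause chain and treats I/O failures and the "No registered leader" message above as transient. Something like it could serve as the condition that the retry policy uses to decide whether to retry a failure.

```java
import java.io.IOException;
import java.util.Locale;

// Hypothetical helper, not part of RODA or SolrJ: decides whether a failure looks
// temporary. The message fragment list is an assumption based on the log above.
final class TransientErrors {

    private static final String[] TRANSIENT_FRAGMENTS = {
        "no registered leader was found"
    };

    // Guard against (rare) cyclic cause chains.
    private static final int MAX_DEPTH = 20;

    private TransientErrors() {
    }

    static boolean isTransient(Throwable error) {
        Throwable t = error;
        for (int depth = 0; t != null && depth < MAX_DEPTH; depth++, t = t.getCause()) {
            if (t instanceof IOException) {
                return true; // network-level failures are usually worth another attempt
            }
            String message = t.getMessage();
            if (message != null) {
                String lower = message.toLowerCase(Locale.ROOT);
                for (String fragment : TRANSIENT_FRAGMENTS) {
                    if (lower.contains(fragment)) {
                        return true; // e.g. leader election still in progress
                    }
                }
            }
        }
        return false;
    }
}
```

Retrying every remote Solr error would also retry genuinely bad requests, so keeping this list narrow is a deliberate (and debatable) choice.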
To better cope with these failures, try to use a retry mechanism, starting with the ones SolrJ provides:
https://lucene.apache.org/solr/5_5_0/solr-solrj/org/apache/solr/client/solrj/impl/SolrHttpRequestRetryHandler.html
https://lucene.apache.org/solr/5_5_0/solr-solrj/org/apache/solr/client/solrj/impl/CloudSolrClient.html#requestWithRetryOnStaleState(org.apache.solr.client.solrj.SolrRequest,%20int,%20java.lang.String)
https://doc.sitecore.com/xp/en/developers/101/platform-administration-and-architecture/configure-the-solr-retry-strategy.html
https://inoio.github.io/solrs/index.html
Implement the asynchronous retry pattern on every SolrClient method used in SolrUtils.
Note that the base CloudSolrClient already retries, but its retries are quick and have no wait between them; they are meant mainly to cope with network issues. The retries we want to implement must cope with incoherent cluster quorum states, where leader election conflicts might take up to 10 minutes to resolve. Therefore, we need a retry mechanism that supports exponential backoff.
Configure the retry policy to use exponential backoff.
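To see why exponential backoff suits a problem that may take about 10 minutes to resolve, the sketch below lists the waits a backoff policy would produce within a time budget. `BackoffSchedule` is a hypothetical name, and the parameters used in the usage note (1 s initial delay, factor 2, 60 s cap, 10 min budget) are illustrative assumptions, not tuned values.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists the waits an exponential backoff policy would
// produce before the total waiting time would exceed a budget.
final class BackoffSchedule {

    private BackoffSchedule() {
    }

    static List<Duration> delays(Duration initial, double factor, Duration maxDelay, Duration budget) {
        long nextMillis = Math.min(initial.toMillis(), maxDelay.toMillis());
        if (nextMillis <= 0 || factor < 1.0) {
            throw new IllegalArgumentException("delays must be at least 1 ms and factor >= 1");
        }
        List<Duration> result = new ArrayList<>();
        long totalMillis = 0;
        while (totalMillis + nextMillis <= budget.toMillis()) {
            result.add(Duration.ofMillis(nextMillis));
            totalMillis += nextMillis;
            // Grow the wait geometrically, but never beyond the cap.
            nextMillis = Math.min((long) (nextMillis * factor), maxDelay.toMillis());
        }
        return result;
    }
}
```

With those assumed values, the waits are 1, 2, 4, 8, 16 and 32 s followed by eight waits of 60 s: 14 retries spread over about 9 minutes, whereas quick retries without waits are all spent within the first seconds of a leader election.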
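A minimal, JDK-only sketch of the asynchronous retry pattern with exponential backoff, using `CompletableFuture` and a `ScheduledExecutorService`. `AsyncRetry` and its parameters are hypothetical. Failsafe, the library this issue settles on, offers the same behaviour through its `RetryPolicy` builder (`withBackoff`) and `getAsync`, so this sketch only illustrates the mechanics.

```java
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;

// Hypothetical helper: asynchronous retry with exponential backoff (JDK only).
final class AsyncRetry {

    private AsyncRetry() {
    }

    static <T> CompletableFuture<T> call(Callable<T> action, Predicate<Throwable> retryable,
            Duration initialDelay, Duration maxDelay, Duration maxDuration,
            ScheduledExecutorService scheduler) {
        CompletableFuture<T> result = new CompletableFuture<>();
        long deadlineNanos = System.nanoTime() + maxDuration.toNanos();
        long firstDelay = Math.min(initialDelay.toMillis(), maxDelay.toMillis());
        // The first attempt runs immediately (wait of 0 ms).
        attempt(action, retryable, 0, firstDelay, maxDelay.toMillis(), deadlineNanos, scheduler, result);
        return result;
    }

    private static <T> void attempt(Callable<T> action, Predicate<Throwable> retryable,
            long waitMillis, long nextDelayMillis, long maxDelayMillis, long deadlineNanos,
            ScheduledExecutorService scheduler, CompletableFuture<T> result) {
        try {
            scheduler.schedule(() -> {
                try {
                    result.complete(action.call());
                } catch (Exception e) {
                    long retryAt = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(nextDelayMillis);
                    if (retryable.test(e) && retryAt < deadlineNanos) {
                        // Wait nextDelayMillis, then double the following wait up to the cap.
                        long following = Math.min(nextDelayMillis * 2, maxDelayMillis);
                        attempt(action, retryable, nextDelayMillis, following, maxDelayMillis,
                                deadlineNanos, scheduler, result);
                    } else {
                        result.completeExceptionally(e); // not retryable, or out of time
                    }
                }
            }, waitMillis, TimeUnit.MILLISECONDS);
        } catch (RejectedExecutionException rejected) {
            result.completeExceptionally(rejected); // scheduler was shut down
        }
    }
}
```

Scheduling each retry instead of sleeping keeps the calling thread free while the cluster recovers. The returned future fails with the last error as soon as a failure is not retryable or the next wait would go past `maxDuration`.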
Check that SolrClient is not used directly in any other class.
Add retry here:
- [ ] SolrUtils (no retry)
- [ ] IterableIndexResult (already has a retry mechanism, but without exponential backoff).
- [ ] IndexResultIterator (not used?)
Some direct uses of SolrClient should go through SolrUtils instead:
- [ ] IndexService: several deletes, commits and optimizes
- [ ] Change SolrUtils from static methods to an instance that wraps the SolrClient, passed as a constructor argument. Rename it to IndexClient. Replace IndexService getSolrClient() with getIndexClient(), which returns this wrapper, to avoid direct use of Solr methods.
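The proposed refactoring could look roughly like the sketch below. `IndexClient` is an instance that wraps the client and sends every call through a single retry method, so no caller touches the raw Solr API. To keep the sketch self-contained, `IndexBackend` is a hypothetical stand-in for the few `SolrClient` methods involved; the real SolrJ methods take documents and also throw `SolrServerException`. The retry here is a plain synchronous exponential backoff, not the Failsafe policy.

```java
import java.io.IOException;
import java.io.InterruptedIOException;

// Hypothetical stand-in for the SolrClient methods used by RODA; not the SolrJ API.
interface IndexBackend {
    void add(String collection, String id) throws IOException;

    void deleteById(String collection, String id) throws IOException;

    void commit(String collection) throws IOException;
}

// Sketch of the proposed IndexClient: wraps a client instance instead of static SolrUtils methods.
final class IndexClient {

    @FunctionalInterface
    private interface IndexCall {
        void run() throws IOException;
    }

    private final IndexBackend backend;
    private final int maxAttempts;
    private final long initialDelayMillis;

    IndexClient(IndexBackend backend, int maxAttempts, long initialDelayMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        this.backend = backend;
        this.maxAttempts = maxAttempts;
        this.initialDelayMillis = initialDelayMillis;
    }

    void add(String collection, String id) throws IOException {
        withRetry(() -> backend.add(collection, id));
    }

    void delete(String collection, String id) throws IOException {
        withRetry(() -> backend.deleteById(collection, id));
    }

    void commit(String collection) throws IOException {
        withRetry(() -> backend.commit(collection));
    }

    // Retries IOExceptions with exponential backoff; rethrows the last one when attempts run out.
    private void withRetry(IndexCall call) throws IOException {
        long delay = initialDelayMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                call.run();
                return;
            } catch (IOException e) {
                if (attempt >= maxAttempts) {
                    throw e;
                }
                try {
                    Thread.sleep(delay);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // keep the interrupt flag
                    throw new InterruptedIOException("Interrupted while waiting to retry");
                }
                delay *= 2;
            }
        }
    }
}
```

Because every operation goes through `withRetry`, changing the policy (for example to a Failsafe `RetryPolicy`) touches one method instead of each call site.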
No retry needed here:
- SolrBootstrapUtils (only used for bootstrap)
- SchemaBuilder (only used in bootstrap)
- MigrationManager (only used for upgrade)
Some libraries to help implement retry strategies:
- https://www.baeldung.com/spring-retry
- http://rholder.github.io/guava-retrying/ (last commit 2016)
Support documentation:
- https://dzone.com/articles/understanding-retry-pattern-with-exponential-back
- https://www.baeldung.com/spring-retry
- https://github.com/spring-projects/spring-retry
Command
zgrep -A 1 "org.apache.solr.client.solrj.SolrClient" roda-core-2022-* | grep "at org.roda" | cut -d "(" -f2 | sort | uniq -c | sort -k 1
RODA Errors
| Class | Line | Count |
|---|---|---|
| SolrUtils.java | 183 | 123 |
| SolrUtils.java | 1372 | 1295 |
| SolrUtils.java | 1581 | 26 |
| IndexService.java | 681 | 2 |
| SolrUtils.java | 1206 | 2 |
| SolrUtils.java | 1175 | 34 |
| SolrUtils.java | 1223 | 638 |
| SolrUtils.java | 202 | 6 |
Command
zgrep "ERROR org.roda.core.index.utils.SolrUtils" roda-core-2022-*
RODA Errors
| Error message | Class | Method | Lines |
|---|---|---|---|
| ERROR org.roda.core.index.utils.SolrUtils - Error deleting document from index | SolrUtils | delete | [1556-1569] |
| ERROR org.roda.core.index.utils.SolrUtils - Error deleting documents from index | SolrUtils | delete | [1577-1593] |
| ERROR org.roda.core.index.utils.SolrUtils - Error commiting into collection: {Class} | SolrUtils | commit | [1167-1177] |
| ERROR org.roda.core.index.utils.SolrUtils - Error adding document to index | SolrUtils | create | [1200-1214] |
| ERROR org.roda.core.index.utils.SolrUtils - Error adding document to index | SolrUtils | create2 | [1216-1234] |
| ERROR org.roda.core.index.utils.SolrUtils - Could not return object label of {Classname} {id} | SolrUtils | getObjectLabel | [1362-1396] |
Libraries reviewed: Failsafe (https://failsafe.dev/) and Resilience4j (https://github.com/resilience4j/resilience4j).
Decision: Failsafe will be used.
Release in tag 4.5.0.