
Solr retry

Open luis100 opened this issue 7 years ago • 7 comments

Sometimes Solr fails to index due to temporary problems, for example:

org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://localhost:8983/solr/File: No registered leader was found after waiting for 4000ms , collection: File slice: shard1_1_0

To better cope with these failures, try using the retry mechanisms provided by SolrJ:

https://lucene.apache.org/solr/5_5_0/solr-solrj/org/apache/solr/client/solrj/impl/SolrHttpRequestRetryHandler.html

https://lucene.apache.org/solr/5_5_0/solr-solrj/org/apache/solr/client/solrj/impl/CloudSolrClient.html#requestWithRetryOnStaleState(org.apache.solr.client.solrj.SolrRequest,%20int,%20java.lang.String)

luis100, Apr 30 '18 08:04

https://doc.sitecore.com/xp/en/developers/101/platform-administration-and-architecture/configure-the-solr-retry-strategy.html

luis100, Jan 03 '22 11:01

https://inoio.github.io/solrs/index.html

luis100, Jan 03 '22 11:01

Implement the asynchronous retry pattern on every SolrClient method used in SolrUtils.

Note that the base CloudSolrClient already performs retries, but they are quick and do not wait between attempts; they mainly cope with network issues. The retries we want to implement must cope with inconsistent cluster quorum states, where leader-election conflicts might take 10 minutes to resolve. Therefore, we need a retry mechanism that supports exponential backoff.
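A minimal sketch of the asynchronous retry pattern with exponential backoff, using only the JDK (`CompletableFuture` plus a `ScheduledExecutorService`) instead of a retry library. `RetryableCall`, `AsyncRetry` and its parameters are hypothetical names, not RODA or SolrJ API; the `isTransient` predicate stands in for however the code decides that an error (such as the "No registered leader" exception) is worth retrying. The next attempt is scheduled rather than slept on, so no thread is blocked during the backoff.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;

/** A call that may throw checked exceptions, as SolrJ methods do. (Hypothetical helper.) */
@FunctionalInterface
interface RetryableCall<T> {
  T call() throws Exception;
}

/** Hypothetical async retrier: doubles the delay after each failure, up to maxDelayMs. */
final class AsyncRetry {
  private final ScheduledExecutorService scheduler;
  private final long initialDelayMs;
  private final long maxDelayMs;
  private final int maxAttempts;

  AsyncRetry(ScheduledExecutorService scheduler, long initialDelayMs, long maxDelayMs, int maxAttempts) {
    this.scheduler = scheduler;
    this.initialDelayMs = initialDelayMs;
    this.maxDelayMs = maxDelayMs;
    this.maxAttempts = maxAttempts;
  }

  /** Runs the call on the scheduler; transient failures are retried with exponential backoff. */
  <T> CompletableFuture<T> submit(RetryableCall<T> call, Predicate<Throwable> isTransient) {
    CompletableFuture<T> result = new CompletableFuture<>();
    scheduler.execute(() -> attempt(call, isTransient, 1, initialDelayMs, result));
    return result;
  }

  private <T> void attempt(RetryableCall<T> call, Predicate<Throwable> isTransient,
      int attemptNo, long delayMs, CompletableFuture<T> result) {
    try {
      result.complete(call.call());
    } catch (Throwable t) {
      if (attemptNo >= maxAttempts || !isTransient.test(t)) {
        result.completeExceptionally(t); // permanent error or attempts exhausted: give up
        return;
      }
      long nextDelayMs = Math.min(delayMs * 2, maxDelayMs);
      // Schedule the next attempt after delayMs instead of blocking a thread with sleep().
      scheduler.schedule(() -> attempt(call, isTransient, attemptNo + 1, nextDelayMs, result),
          delayMs, TimeUnit.MILLISECONDS);
    }
  }
}
```

Callers receive a `CompletableFuture`, so indexing threads are not held while the cluster elects a leader; whether the existing SolrUtils call sites can work with futures instead of blocking results is a design question this sketch leaves open.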

Configure the retry policy to use exponential backoff.
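To see what an exponential backoff policy implies for a leader-election problem that might take 10 minutes, the sketch below computes the delay before each retry as `initialDelayMs * factor^(retry - 1)`, capped at `maxDelayMs`, and how many retries are needed to span a given window. `BackoffPolicy` is a hypothetical name, and the example values (1 s initial delay, factor 2, 60 s cap) are assumptions for illustration, not RODA settings.

```java
/** Hypothetical exponential backoff: delay = initialDelayMs * factor^(retry - 1), capped at maxDelayMs. */
final class BackoffPolicy {
  private final long initialDelayMs;
  private final double factor;
  private final long maxDelayMs;

  BackoffPolicy(long initialDelayMs, double factor, long maxDelayMs) {
    if (initialDelayMs <= 0 || factor < 1.0 || maxDelayMs < initialDelayMs) {
      throw new IllegalArgumentException("require initialDelayMs > 0, factor >= 1, maxDelayMs >= initialDelayMs");
    }
    this.initialDelayMs = initialDelayMs;
    this.factor = factor;
    this.maxDelayMs = maxDelayMs;
  }

  /** Delay to wait before the given retry (1 = first retry after the initial failure). */
  long delayBeforeRetry(int retry) {
    double delay = initialDelayMs * Math.pow(factor, retry - 1);
    return (long) Math.min(delay, maxDelayMs);
  }

  /** Smallest number of retries whose cumulative waiting time reaches windowMs. */
  int retriesToCover(long windowMs) {
    long waitedMs = 0;
    int retries = 0;
    while (waitedMs < windowMs) { // terminates: every delay is at least initialDelayMs > 0
      retries++;
      waitedMs += delayBeforeRetry(retries);
    }
    return retries;
  }
}
```

With these values the waits are 1, 2, 4, 8, 16 and 32 s (63 s in total), then 60 s each, so 15 retries (603 s of waiting) are needed to span 10 minutes. Many retry policies also add random jitter so that concurrent clients do not retry in lockstep; that is omitted here.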

Check that SolrClient is not used directly in any other class.

Add retry here:

  • [ ] SolrUtils (no retry)
  • [ ] IterableIndexResult (already has a retry mechanism, but without exponential backoff).
  • [ ] IndexResultIterator (not used?)

Some direct uses of the SolrClient should go through SolrUtils instead:

  • [ ] IndexService: several deletes, commits and optimizes
  • [ ] Change SolrUtils from a class of static methods into an instance that wraps the SolrClient passed in as an argument, and rename it IndexClient. Change IndexService getSolrClient() to getIndexClient(), returning this wrapper, to avoid direct uses of Solr methods.
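A sketch of the proposed IndexClient: an instance that wraps the client and routes every call through a single retry method, so callers such as IndexService never invoke Solr methods directly. `IndexBackend` is a hypothetical stand-in for the SolrClient methods RODA uses (it is not the real SolrJ API), and the retry loop, attempt limit and delays are assumptions. The sleep function is injected so the backoff can be tested without actually waiting.

```java
import java.util.Map;
import java.util.function.LongConsumer;

/** Hypothetical stand-in for the SolrClient operations RODA calls (not the SolrJ API). */
interface IndexBackend {
  void add(String collection, Map<String, Object> document) throws Exception;
  void commit(String collection) throws Exception;
}

@FunctionalInterface
interface IndexCall {
  void run() throws Exception;
}

/** Hypothetical instance-based replacement for static SolrUtils: every call goes through withRetry. */
final class IndexClient {
  private final IndexBackend backend;
  private final int maxAttempts;
  private final long initialDelayMs;
  private final long maxDelayMs;
  private final LongConsumer sleeper; // e.g. a Thread.sleep wrapper in production, a recorder in tests

  IndexClient(IndexBackend backend, int maxAttempts, long initialDelayMs, long maxDelayMs, LongConsumer sleeper) {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    this.backend = backend;
    this.maxAttempts = maxAttempts;
    this.initialDelayMs = initialDelayMs;
    this.maxDelayMs = maxDelayMs;
    this.sleeper = sleeper;
  }

  void add(String collection, Map<String, Object> document) {
    withRetry(() -> backend.add(collection, document));
  }

  void commit(String collection) {
    withRetry(() -> backend.commit(collection));
  }

  /** Retries any failure, doubling the wait each time up to maxDelayMs. */
  private void withRetry(IndexCall call) {
    long delayMs = initialDelayMs;
    for (int attempt = 1; ; attempt++) {
      try {
        call.run();
        return;
      } catch (Exception e) {
        if (attempt >= maxAttempts) {
          throw new IllegalStateException("Index operation failed after " + attempt + " attempts", e);
        }
        sleeper.accept(delayMs);
        delayMs = Math.min(delayMs * 2, maxDelayMs);
      }
    }
  }
}
```

In production the sleeper could be `ms -> { try { Thread.sleep(ms); } catch (InterruptedException e) { Thread.currentThread().interrupt(); throw new IllegalStateException(e); } }`. Keeping the retry logic in one private method means a retry library could later replace the hand-written loop without touching the callers.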

No retry needed here:

  • SolrBootstrapUtils (only used for bootstrap)
  • SchemaBuilder (only used in bootstrap)
  • MigrationManager (only used for upgrade)

luis100, Mar 31 '22 11:03

Some libraries to help implement retry strategies:

  • https://www.baeldung.com/spring-retry
  • http://rholder.github.io/guava-retrying/ (last commit 2016)

Support documentation:

  • https://dzone.com/articles/understanding-retry-pattern-with-exponential-back
  • https://www.baeldung.com/spring-retry
  • https://github.com/spring-projects/spring-retry

hmiguim, Aug 01 '22 09:08

Command

zgrep -A 1 "org.apache.solr.client.solrj.SolrClient" roda-core-2022-* | grep "at org.roda" | cut -d "(" -f2 | sort | uniq -c | sort -k 1

RODA Errors

| Class | Line | Count |
| --- | --- | --- |
| SolrUtils.java | 183 | 123 |
| SolrUtils.java | 1372 | 1295 |
| SolrUtils.java | 1581 | 26 |
| IndexService.java | 681 | 2 |
| SolrUtils.java | 1206 | 2 |
| SolrUtils.java | 1175 | 34 |
| SolrUtils.java | 1223 | 638 |
| SolrUtils.java | 202 | 6 |
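For reference, the zgrep pipeline that produced these counts works as follows: `zgrep -A 1` prints each log line mentioning `org.apache.solr.client.solrj.SolrClient` together with the line after it, `grep "at org.roda"` keeps only the RODA stack frames (the direct callers of SolrClient), `cut -d "(" -f2` extracts the `File.java:line)` location, and `sort | uniq -c` counts each location. The same logic in Java, as an illustrative sketch over log lines already read into memory (`SolrCallerCounter` is a hypothetical name, not part of RODA):

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper mirroring the zgrep pipeline: count RODA frames that call into SolrClient. */
final class SolrCallerCounter {
  static Map<String, Integer> countCallers(List<String> logLines) {
    Map<String, Integer> counts = new TreeMap<>();
    for (int i = 0; i + 1 < logLines.size(); i++) {
      if (!logLines.get(i).contains("org.apache.solr.client.solrj.SolrClient")) {
        continue; // like zgrep: only SolrClient frames are of interest
      }
      String next = logLines.get(i + 1); // like -A 1: the frame right below, i.e. the caller
      int open = next.indexOf('(');
      if (next.contains("at org.roda") && open >= 0) {
        // like cut -d "(" -f2, but also dropping the closing parenthesis
        int close = next.indexOf(')', open);
        String location = close > open ? next.substring(open + 1, close) : next.substring(open + 1);
        counts.merge(location, 1, Integer::sum); // like sort | uniq -c
      }
    }
    return counts;
  }
}
```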

Command

zgrep "ERROR org.roda.core.index.utils.SolrUtils" roda-core-2022-*

RODA Errors

| Error message | Class | Method | Lines |
| --- | --- | --- | --- |
| ERROR org.roda.core.index.utils.SolrUtils - Error deleting document from index | SolrUtils | delete | 1556-1569 |
| ERROR org.roda.core.index.utils.SolrUtils - Error deleting documents from index | SolrUtils | delete | 1577-1593 |
| ERROR org.roda.core.index.utils.SolrUtils - Error commiting into collection: {Class} | SolrUtils | commit | 1167-1177 |
| ERROR org.roda.core.index.utils.SolrUtils - Error adding document to index | SolrUtils | create | 1200-1214 |
| ERROR org.roda.core.index.utils.SolrUtils - Error adding document to index | SolrUtils | create2 | 1216-1234 |
| ERROR org.roda.core.index.utils.SolrUtils - Could not return object label of {Classname} {id} | SolrUtils | getObjectLabel | 1362-1396 |

JoaoGomes2110, Aug 05 '22 11:08

Reviewing possible libraries:

  • https://failsafe.dev/
  • https://github.com/resilience4j/resilience4j

luis100, Aug 10 '22 15:08

Failsafe will be used.

hmiguim, Aug 16 '22 16:08

Released in tag 4.5.0.

hmiguim, Jan 06 '23 11:01