quickwit Retry upload on error.

Open fulmicoton opened this issue 1 year ago • 5 comments

Some object storage can fail from time to time. In that case we should retry instead of restarting the pipeline and losing all of the work done.

Mar 25 '24 06:03 fulmicoton

This is low priority.

Mar 25 '24 06:03 fulmicoton

We should already be retrying a few times (3?). Is that not working, or does the transient storage issue persist for a duration longer than our retry delay?

The number of retries was set to a low value because, on the search side, we wanted some quick feedback when a storage issue occurred.

We could have a dedicated retry policy for PUT requests.
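
To make the idea of a dedicated PUT policy concrete, here is a minimal sketch using only the standard library. Everything in it is an assumption for illustration: `RetryPolicy`, `for_reads`, `for_puts`, `retry`, and all the numeric values are hypothetical and are not taken from Quickwit's code. It is synchronous to stay self-contained, whereas a real storage layer would most likely be async.

```rust
use std::time::Duration;

/// Hypothetical retry policy. None of these names or values come from Quickwit.
#[derive(Clone, Copy, Debug)]
struct RetryPolicy {
    max_attempts: u32,
    base_delay: Duration,
    max_delay: Duration,
}

impl RetryPolicy {
    /// Few attempts and short delays, so the search side gets quick feedback.
    const fn for_reads() -> Self {
        Self {
            max_attempts: 3,
            base_delay: Duration::from_millis(50),
            max_delay: Duration::from_millis(500),
        }
    }

    /// More attempts and longer delays, since a failed upload loses indexing work.
    const fn for_puts() -> Self {
        Self {
            max_attempts: 5,
            base_delay: Duration::from_millis(100),
            max_delay: Duration::from_secs(5),
        }
    }

    /// Exponential backoff before retry number `retry` (0-based), capped at `max_delay`.
    fn delay(&self, retry: u32) -> Duration {
        let factor = 2u32.saturating_pow(retry);
        self.base_delay.saturating_mul(factor).min(self.max_delay)
    }
}

/// Runs `op` until it succeeds, the error is not retryable,
/// or `policy.max_attempts` attempts have been made.
fn retry<T, E>(
    policy: RetryPolicy,
    mut is_retryable: impl FnMut(&E) -> bool,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            Err(err) => {
                attempt += 1;
                if attempt >= policy.max_attempts || !is_retryable(&err) {
                    return Err(err);
                }
                std::thread::sleep(policy.delay(attempt - 1));
            }
        }
    }
}
```

With this shape, the upload path would call `retry(RetryPolicy::for_puts(), ...)` while the search path keeps `RetryPolicy::for_reads()`, so tuning one does not slow down error feedback on the other.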

Mar 25 '24 14:03 guilload

We already have retries on the storage for writes, up to 5 times. Some errors were not retried, but they are now, since https://github.com/quickwit-oss/quickwit/pull/5384, so I think this ticket can be closed.

Sep 04 '24 17:09 trinity-1686a

@trinity-1686a Do we retry on S3 internal errors?

Oct 21 '24 01:10 fulmicoton

We retry based on what the SDK defines as transient and throttling errors, listed here: https://docs.rs/aws-runtime/1.4.3/src/aws_runtime/retries/classifiers.rs.html#18-36. It doesn't include InternalError, so we don't retry on that.
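
As a rough illustration of classifying by error code, the sketch below checks a code against two lists and an optional caller-supplied extension. This is not the SDK's implementation: the lists are small illustrative subsets (the linked source is authoritative), and `classify`, `RetryDecision`, and the `extra_retryable` parameter are hypothetical names introduced here.

```rust
/// Illustrative subsets only; the authoritative lists are in the SDK source linked above.
const THROTTLING_CODES: &[&str] = &["Throttling", "ThrottlingException", "SlowDown"];
const TRANSIENT_CODES: &[&str] = &["RequestTimeout", "RequestTimeoutException"];

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum RetryDecision {
    Retry,
    DoNotRetry,
}

/// Decides whether an error code is worth retrying.
/// `extra_retryable` is a hypothetical extension point for codes the
/// built-in lists do not cover, such as `InternalError`.
fn classify(error_code: &str, extra_retryable: &[&str]) -> RetryDecision {
    let retryable = THROTTLING_CODES.contains(&error_code)
        || TRANSIENT_CODES.contains(&error_code)
        || extra_retryable.contains(&error_code);
    if retryable {
        RetryDecision::Retry
    } else {
        RetryDecision::DoNotRetry
    }
}
```

Under these assumptions, `classify("InternalError", &[])` yields `DoNotRetry`, which matches the behavior described here, while passing `&["InternalError"]` as the extension would opt that code in.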

Oct 21 '24 07:10 trinity-1686a

We now retry uploads for all transient errors we could find. If more error conditions should be retried, that should be a separate ticket.

Nov 05 '24 10:11 trinity-1686a
