
Rulers should be able to retry when there is a failure to write to ingesters

Open rapphil opened this issue 1 year ago • 1 comments

Is your feature request related to a problem? Please describe.

When rules are evaluated, the ruler writes the Vector resulting from the rule evaluation to the ingesters using a gRPC client. If the ingesters are experiencing a transient error, the data is lost and the following message, which can be associated with different types of gRPC errors, appears in the logs:

Rule sample appending failed

As far as I can tell, Cortex does not configure the gRPC clients with a retry policy, so failed gRPC requests are never retried.

Describe the solution you'd like

I'd like it to be possible to configure rulers to retry failed gRPC requests to ingesters, so that they can recover gracefully from transient errors without losing any data. This could potentially be implemented for other components as well.

Additional context

  • https://github.com/grpc/proposal/blob/master/A6-client-retries.md

rapphil avatar May 08 '24 23:05 rapphil

This issue has been automatically marked as stale because it has not had any activity in the past 60 days. It will be closed in 15 days if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Apr 26 '25 18:04 stale[bot]