Rulers should be able to retry when there is a failure to write to ingesters
Is your feature request related to a problem? Please describe.
When rules are evaluated, ruler writes to Ingester the Vector resulting from the evaluation of the rule using a gprc client. In case ingesters are undergoing a transient error, data is lost resulting in the following message in the logs which can associated with different types of grpc errors:
Rule sample appending failed
As far I can tell, cortex is not configuring the grpc clients with a retry policy and hence failed gprc requests are never retried..
Describe the solution you'd like I'd like to be possible to configure rulers to retry on failed grpc requests to ingesters so that it is possible to recover from transient errors gracefully without loosing any data. This could be potentially implemented for other components.
Additional context
- https://github.com/grpc/proposal/blob/master/A6-client-retries.md
This issue has been automatically marked as stale because it has not had any activity in the past 60 days. It will be closed in 15 days if no further activity occurs. Thank you for your contributions.