java-sdk-contrib
feat(flagd): Improve flagd retry logic and error logging
This PR
- Improves error logging for the in-process resolver in remote mode.
- Harmonizes the backoff implementations across the different gRPC handlers.
- Uses `FlagdOptions.getRetryBackoffMs()` to initialize the backoff in all backoff scenarios. `GrpcStreamConnector` previously used a hardcoded value of 2 seconds.
- Reconnects immediately on the first stream error in `GrpcStreamConnector`. This removes the backoff when a planned deadline is exceeded and the connector reconnects.
- Uses a unified maximum jitter of 250 ms for all backoff use cases.
Fixes #1010
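The summary above describes one shared backoff behavior for all gRPC handlers. The following is a minimal sketch of what such a helper could look like, not the provider's actual code. The class name `ExponentialBackoff`, the doubling factor, the 120 s cap (taken from the Notes), and the assumption that the jitter is a uniform 0 to 250 ms added on top of each base delay are all illustrative. In the provider, the initial delay would come from `FlagdOptions.getRetryBackoffMs()`.

```java
import java.util.Random;

/** Hypothetical shared backoff helper; not the actual flagd provider class. */
public class ExponentialBackoff {
    /** Unified maximum jitter (assumed to be added uniformly, 0..250 ms inclusive). */
    static final int MAX_JITTER_MS = 250;

    private final long initialMs; // e.g. the value of FlagdOptions.getRetryBackoffMs()
    private final long maxMs;     // cap for the base delay, e.g. 120 s
    private final Random random;
    private long currentMs;

    public ExponentialBackoff(long initialMs, long maxMs, Random random) {
        this.initialMs = initialMs;
        this.maxMs = maxMs;
        this.random = random;
        this.currentMs = initialMs;
    }

    /** Base delay (without jitter) that the next call to nextDelayMs() will use. */
    public long currentBaseMs() {
        return currentMs;
    }

    /** Returns the delay to wait now (base + jitter), then doubles the base up to maxMs. */
    public long nextDelayMs() {
        long jitter = random.nextInt(MAX_JITTER_MS + 1);
        long delay = currentMs + jitter;
        currentMs = Math.min(currentMs * 2, maxMs);
        return delay;
    }

    /** Restores the initial delay, e.g. after a successful reconnect. */
    public void reset() {
        currentMs = initialMs;
    }

    public static void main(String[] args) {
        ExponentialBackoff backoff = new ExponentialBackoff(1000, 120_000, new Random(42));
        for (int attempt = 1; attempt <= 9; attempt++) {
            System.out.println("attempt " + attempt + ": " + backoff.nextDelayMs() + " ms");
        }
    }
}
```

Injecting the `Random` instance keeps the jitter deterministic in tests while still spreading reconnects of many clients in production.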
Notes
Unlike what #1010 describes, errors are not logged only once the max retry delay is reached, but already on the second consecutive error. Waiting for the max retry delay (120 seconds) with an exponential backoff starting at 2 seconds would take 126 seconds before the first error becomes visible.
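The 126 seconds follow from summing the doubled delays that precede the first capped one, assuming a start of 2 s and a factor of 2; the next doubled delay, 128 s, would already exceed the 120 s cap:

```math
2 + 4 + 8 + 16 + 32 + 64 = \sum_{k=1}^{6} 2^{k} = 2^{7} - 2 = 126\ \text{s}
```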
Instead, errors are logged whenever an error payload is emitted to the queue. Only on the first error does the connector reconnect immediately, without any backoff (only the default jitter of at most 250 ms) and without emitting an error payload. Starting with the second consecutive error, it logs an error and emits the error payload.
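The policy described above can be sketched as a counter of consecutive errors. This is a hypothetical illustration, not the connector's real code: the class `StreamErrorPolicy`, its methods, and the `emittedErrors` list (standing in for the error queue payloads) are assumed names, and `java.util.logging` is used only to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

/** Hypothetical sketch of the "silent first retry" policy; not the provider's actual code. */
public class StreamErrorPolicy {
    private static final Logger LOG = Logger.getLogger(StreamErrorPolicy.class.getName());

    private int consecutiveErrors = 0;

    /** Stands in for the error payloads the connector would put on its queue. */
    final List<String> emittedErrors = new ArrayList<>();

    /** Called on every stream error; returns true if the reconnect should skip the backoff. */
    public boolean onStreamError(String message) {
        consecutiveErrors++;
        if (consecutiveErrors == 1) {
            // First error in a row: reconnect immediately (only jitter applies),
            // without logging and without emitting an error payload.
            return true;
        }
        // Second and later consecutive errors: log, emit a payload, then back off.
        LOG.severe("stream error #" + consecutiveErrors + ": " + message);
        emittedErrors.add(message);
        return false;
    }

    /** Called once the stream delivers data again; the next error counts as a first error. */
    public void onSuccess() {
        consecutiveErrors = 0;
    }

    public static void main(String[] args) {
        StreamErrorPolicy policy = new StreamErrorPolicy();
        System.out.println(policy.onStreamError("DEADLINE_EXCEEDED")); // true: silent immediate retry
        System.out.println(policy.onStreamError("UNAVAILABLE"));       // false: logged and emitted
        policy.onSuccess();
        System.out.println(policy.onStreamError("UNAVAILABLE"));       // true: counter was reset
    }
}
```

Because the first error of any kind is retried silently, a planned deadline expiry never surfaces as an error, regardless of which status code the proxy reports.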
The initial backoff is now `FlagdOptions.getRetryBackoffMs()` in `GrpcStreamConnector` (new) and `GrpcConnector` (no change).
For `GrpcStreamConnector`, this means an initial backoff of 1 second (the default option) instead of 2 seconds.
I've also removed the special handling of `DEADLINE_EXCEEDED` errors, as the connector now tries to reconnect silently on any first error. This also solves `DEADLINE_EXCEEDED` issues related to Envoy, which reports a wrong gRPC status code. See here
With the immediate first retry, the new backoff times for `GrpcStreamConnector` are:
- 0s
- 1s
- 2s
- 4s
- 8s
- ...
- 120s
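The schedule above can be reproduced with a short helper that lists the nominal delays without jitter. This is a sketch under assumptions: the name `BackoffSchedule.nominalDelaysMs` is hypothetical, and the values assume the 1 s default retry backoff, a doubling factor, and the 120 s cap.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that lists the nominal (jitter-free) reconnect delays. */
public class BackoffSchedule {
    /**
     * The first entry is 0 (the immediate retry on the first error). After that the
     * delay starts at initialMs, doubles on each attempt and is capped at maxMs.
     */
    public static List<Long> nominalDelaysMs(long initialMs, long maxMs, int attempts) {
        List<Long> delays = new ArrayList<>();
        if (attempts <= 0) {
            return delays;
        }
        delays.add(0L);
        long current = initialMs;
        for (int i = 1; i < attempts; i++) {
            delays.add(current);
            current = Math.min(current * 2, maxMs);
        }
        return delays;
    }

    public static void main(String[] args) {
        // 1000 ms mirrors the default retry backoff; 120 s is the max retry delay.
        for (long delay : nominalDelaysMs(1000, 120_000, 10)) {
            System.out.println(delay / 1000 + "s");
        }
    }
}
```

With these inputs, the elided part of the list is 16s, 32s and 64s, after which every further attempt waits the capped 120s (plus jitter).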