[flagd] improve error logging for reconnect scenarios

Open • aepfli opened this issue 4 months ago • 4 comments

Our flagd in-process provider is built for resilience: when the connection is lost, it tries to reconnect. This is excellent and the right thing to do in a distributed world, because we want to keep serving flags properly. The problem we're facing is that each reconnect attempt logs an error, but is it really an error if we recover from this state? IMHO, it is only an error if we reach our maximum reconnect delay.

I propose changing the log level here to info and additionally logging an error if we reach the max delay.
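A minimal sketch of the proposed rule, assuming an exponential backoff that doubles from an initial delay up to a configured maximum. `ReconnectLogPolicy`, its methods, and the `Severity` enum are hypothetical names for illustration only; they are not the provider's API, and the real connector code linked below is not reproduced here. Failures while the backoff is still growing are logged at INFO, and only failures once the backoff has reached the cap are logged at ERROR.

```java
/**
 * Hypothetical helper (not part of the flagd provider): decides how loudly to log
 * a failed reconnect attempt, based on an exponential backoff capped at maxDelayMs.
 */
final class ReconnectLogPolicy {
    /** Simplified severities; map them onto the provider's real logger. */
    enum Severity { INFO, ERROR }

    private final long initialDelayMs;
    private final long maxDelayMs;
    private long currentDelayMs;

    ReconnectLogPolicy(long initialDelayMs, long maxDelayMs) {
        this.initialDelayMs = initialDelayMs;
        this.maxDelayMs = maxDelayMs;
        this.currentDelayMs = initialDelayMs;
    }

    /** Delay to wait before the next reconnect attempt. */
    long currentDelayMs() {
        return currentDelayMs;
    }

    /**
     * Records a failed attempt and returns the severity to log it with.
     * Attempts below the cap are expected to recover, so they are INFO;
     * once the backoff has hit the cap, the failure is reported as ERROR.
     */
    Severity onFailure() {
        Severity severity = currentDelayMs >= maxDelayMs ? Severity.ERROR : Severity.INFO;
        // Double the delay for the next attempt, never exceeding the cap.
        currentDelayMs = Math.min(currentDelayMs * 2, maxDelayMs);
        return severity;
    }

    /** A successful (re)connect resets the backoff, so later hiccups start at INFO again. */
    void onSuccess() {
        currentDelayMs = initialDelayMs;
    }

    public static void main(String[] args) {
        ReconnectLogPolicy policy = new ReconnectLogPolicy(1_000, 8_000);
        for (int attempt = 1; attempt <= 5; attempt++) {
            long delay = policy.currentDelayMs();
            Severity severity = policy.onFailure();
            System.out.println("attempt " + attempt + ": " + severity + ", retrying in " + delay + " ms");
        }
    }
}
```

With an initial delay of 1000 ms and a cap of 8000 ms, attempts 1 to 3 (waiting 1000, 2000 and 4000 ms) are INFO and attempts 4 and 5 (waiting 8000 ms) are ERROR. This sketch keeps logging ERROR on every attempt at the cap; logging it only once per outage would be an equally reasonable choice.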

https://github.com/open-feature/java-sdk-contrib/blob/2a8ea6c8e7f0eba248ca003cf1afa6d0410a68d2/providers/flagd/src/main/java/dev/openfeature/contrib/providers/flagd/resolver/process/storage/connector/grpc/GrpcStreamConnector.java#L178-L186

This way, we handle reconnects gracefully but still surface connection issues in a timely manner when there is a real error.

Additionally (optionally), we could log an error immediately for the first connection attempts, since in that case we might not want to wait until the maximum delay is reached.
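The optional rule can be sketched as a pure function, assuming the connector tracks whether it has ever connected successfully. `InitialConnectSeverity`, `classify`, and `hasEverConnected` are hypothetical names, not part of the provider. Before the first successful connection, every failure is an ERROR, because the provider has never been ready and waiting for the cap would only delay surfacing problems such as a misconfigured host. After that, only the capped backoff counts as an error.

```java
/**
 * Hypothetical variant (not part of the flagd provider) covering the optional goal:
 * failures before the first successful connection are reported as ERROR right away.
 */
final class InitialConnectSeverity {
    enum Severity { INFO, ERROR }

    /**
     * @param hasEverConnected whether the provider has connected successfully at least once
     * @param currentDelayMs   backoff delay before the next attempt
     * @param maxDelayMs       configured maximum backoff delay
     */
    static Severity classify(boolean hasEverConnected, long currentDelayMs, long maxDelayMs) {
        if (!hasEverConnected) {
            // Initial connection: the provider was never ready, so don't wait for the cap.
            return Severity.ERROR;
        }
        // Reconnect after a working connection: only the capped backoff counts as an error.
        return currentDelayMs >= maxDelayMs ? Severity.ERROR : Severity.INFO;
    }

    public static void main(String[] args) {
        System.out.println(classify(false, 1_000, 8_000)); // ERROR: initial connection fails
        System.out.println(classify(true, 1_000, 8_000));  // INFO: transient reconnect
        System.out.println(classify(true, 8_000, 8_000));  // ERROR: backoff reached the cap
    }
}
```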

Goals

  • reduce normal reconnection logs to info or warn
  • use separate logs with dedicated messages for metadata and stream failures
  • log an error if we reach the max delay
  • Optional: log an error immediately if the first connection attempt fails
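The "dedicated message for metadata or stream" goal could look roughly like this sketch. It uses `java.util.logging` only to stay self-contained; the provider's actual logging facade is not reproduced, and `Level.SEVERE` stands in for "error". `ReconnectLogMessages`, the `Source` values, and the message texts are hypothetical, chosen to mirror the two failure sources named in the goals.

```java
import java.util.Locale;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical helper (not part of the flagd provider): one message per failure source. */
final class ReconnectLogMessages {
    private static final Logger LOG = Logger.getLogger(ReconnectLogMessages.class.getName());

    /** Which call failed; illustrative names for the metadata and stream cases. */
    enum Source {
        METADATA("flagd metadata request failed"),
        STREAM("flagd sync stream failed");

        final String description;

        Source(String description) {
            this.description = description;
        }
    }

    /** Builds a source-specific message so metadata and stream failures are easy to tell apart. */
    static String message(Source source, int attempt, long retryDelayMs) {
        // Locale.ROOT keeps the digits ASCII regardless of the JVM's default locale.
        return String.format(Locale.ROOT, "%s (attempt %d), retrying in %d ms",
                source.description, attempt, retryDelayMs);
    }

    /** Normal reconnects are INFO; reaching the max delay escalates to SEVERE. */
    static Level level(boolean maxDelayReached) {
        return maxDelayReached ? Level.SEVERE : Level.INFO;
    }

    static void log(Source source, int attempt, long retryDelayMs, boolean maxDelayReached, Throwable cause) {
        LOG.log(level(maxDelayReached), message(source, attempt, retryDelayMs), cause);
    }

    public static void main(String[] args) {
        log(Source.STREAM, 1, 1_000, false, new RuntimeException("connection lost"));
        log(Source.METADATA, 4, 8_000, true, new RuntimeException("connection lost"));
    }
}
```

Keeping the source in the message (rather than only in the stack trace) lets operators filter metadata and stream problems separately without parsing exceptions.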

aepfli, Oct 08 '24 08:10