confluent-kafka-dotnet
All broker connections are down ; 3/3 brokers are down
Description
We keep getting connection-down or connection-closed errors from the producer and consumer callback methods. After reading a lot of articles, I tried the following two config properties: "log.connection.close": false and "socket.keepalive.enable": true.
But that didn't work, and we still see the same errors every 30 minutes to an hour. I am using the Confluent Kafka NuGet package, version 0.9.5. Do I need to upgrade to version 0.11.0 for the config properties above to suppress these messages?
You haven't provided the error message, so it's difficult to comment... but are your clients idle? The broker will disconnect idle clients automatically and librdkafka produces an error message when this happens. If this is the error you are seeing, you can safely ignore it - the clients will reconnect as required if you try to produce / consume additional messages. This behavior is not different in 0.11.0 vs 0.9.5.
Yes, the issue happens only when my clients are idle. But my question is why the setting "log.connection.close: false" does not suppress these errors.
ahh, yes, right. can you paste the error messages you're seeing?
@naveen-ilink log.connection.close: false will only suppress the individual broker disconnect logs, not the ALL_BROKERS_DOWN event/log.
I suggest registering an error delegate (error_cb in C) to handle and ignore the ALL_BROKERS_DOWN event; registering the error callback will also suppress the log message.
Side note: socket keepalives operate on the TCP layer, not application (Kafka) layer, so they will not help with the Kafka broker's idle connection reaper.
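The filtering idea suggested above can be sketched roughly as follows. This is written in plain Python for illustration and is not the .NET client's API: `ClientError`, `IGNORED_CODES`, and `on_error` are hypothetical names, and only the `ALL_BROKERS_DOWN` code name comes from this thread. It assumes the client hands each error to a single callback that can decide whether to record it.

```python
from dataclasses import dataclass


# Hypothetical stand-in for the error object a client passes to its
# error callback; this is not a real Confluent/librdkafka type.
@dataclass(frozen=True)
class ClientError:
    code: str    # e.g. "ALL_BROKERS_DOWN", as named in this thread
    reason: str  # human-readable detail


# Codes treated as expected noise for idle clients (an assumption of this sketch).
IGNORED_CODES = frozenset({"ALL_BROKERS_DOWN"})


def on_error(err: ClientError, sink: list) -> bool:
    """Record the error in `sink` unless its code is ignored.

    Returns True if the error was recorded, False if it was dropped.
    """
    if err.code in IGNORED_CODES:
        return False
    sink.append(f"{err.code}: {err.reason}")
    return True
```

In a real client the handler would be registered through the library's own error-callback mechanism, and `sink` would typically be a logger rather than a list.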
@edenhill as I understand it, this is caused by the normal flow of events, and closed connections are not a problem.
But the ERROR logging level suggests otherwise. Any sensible developer will panic and launch an investigation when they see "ERROR ALL_BROKERS_DOWN" in their log messages.
Could you please consider changing the logging level to INFO?
@vchekan is talking sense... we should change this.
Agreed, this should be INFO (if harmless), not ERROR. Any ETA for a fix?
In 1.0-experimental-13, the OnError event returns an ErrorEvent type that includes an IsFatal property. Errors without this set may be useful for the user to act on, but should generally be considered informational. All things considered, I like this choice of API [So fixed in 1.0-experimental-13].
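One way to act on a fatal flag like this is to choose the log level from it, so that non-fatal errors show up as informational. The sketch below is plain Python, not the .NET API: the `ErrorEvent` dataclass here is a hypothetical mirror of the type described above, and `log_level_for` and `handle` are illustrative names.

```python
import logging
from dataclasses import dataclass


# Hypothetical mirror of an error event carrying a fatal flag;
# the field names are assumptions made for this sketch.
@dataclass(frozen=True)
class ErrorEvent:
    code: str
    reason: str
    is_fatal: bool


def log_level_for(event: ErrorEvent) -> int:
    """Map fatal errors to ERROR and everything else to INFO."""
    return logging.ERROR if event.is_fatal else logging.INFO


def handle(event: ErrorEvent, logger: logging.Logger) -> None:
    """Log the event at the level chosen by log_level_for."""
    logger.log(log_level_for(event), "%s: %s", event.code, event.reason)
```

With this mapping, an idle-disconnect event that is not flagged as fatal would be logged at INFO instead of ERROR.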
Was this ever addressed?
no, it's a librdkafkaism and i don't reckon it will be changed in the near term.
Hi, I'm trying to handle connection loss and reconnection at the TCP layer in my client. I tried to use the Allbrokersdown error code, but it didn't work. Thanks to this thread, I realized I was on the wrong track.
Is there any way to detect that the connection was lost and then re-established? It could be an error code or something else.
Thanks.
Is that message still the same? I have a customer who is facing a similar "problem". I've been troubleshooting their application, but it doesn't seem to be a real problem. Please let me know whether this message has changed at all.
Faced the same problem. We spent time investigating it because ERROR looks like a critical problem. Why is this issue still unresolved?