confluent-kafka-dotnet

All broker connections are down ; 3/3 brokers are down

Open naveen-ilink opened this issue 7 years ago • 13 comments

Description

We keep getting these connection-down or connection-closed errors from the Producer and Consumer callback methods. After reading a lot of articles, I tried the following two config properties: "log.connection.close": false and "socket.keepalive.enable": true.

But that didn't work for me, and we are still seeing the same issues every 30 minutes or 1 hour. I am using the Confluent Kafka NuGet library, version 0.9.5. Do I need to upgrade to version 0.11.0 in order to suppress these messages with the above config properties?

naveen-ilink avatar Aug 25 '17 19:08 naveen-ilink

You haven't provided the error message, so it's difficult to comment... but are your clients idle? The broker will disconnect idle clients automatically and librdkafka produces an error message when this happens. If this is the error you are seeing, you can safely ignore it - the clients will reconnect as required if you try to produce / consume additional messages. This behavior is not different in 0.11.0 vs 0.9.5.

mhowlett avatar Aug 25 '17 20:08 mhowlett

Yes, the issue happens only when my clients are idle. But my question is: why is the setting "log.connection.close: false" not suppressing these messages?

naveen-ilink avatar Aug 25 '17 20:08 naveen-ilink

ahh, yes, right. can you paste the error messages you're seeing?

mhowlett avatar Aug 25 '17 21:08 mhowlett

@naveen-ilink log.connection.close: false will only suppress the individual broker disconnect logs, not the ALL_BROKERS_DOWN event/log.

I suggest registering an error delegate (error_cb in C) to handle and ignore the ALL_BROKERS_DOWN event; registering the error callback will also suppress the log message.

Side note: socket keepalives operate on the TCP layer, not application (Kafka) layer, so they will not help with the Kafka broker's idle connection reaper.
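
A minimal sketch of the suggested pattern, written in Python purely for illustration: an error callback that swallows the ALL_BROKERS_DOWN event and logs everything else. The `ClientError` type, the `on_error` function, and the string error names are hypothetical stand-ins, not the confluent-kafka-dotnet or librdkafka API. In a real client the handler would be registered through the library's own error delegate (error_cb in C).

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger("kafka-client-sketch")

# Hypothetical name for the event discussed in this thread.
ALL_BROKERS_DOWN = "ALL_BROKERS_DOWN"


@dataclass(frozen=True)
class ClientError:
    """Hypothetical stand-in for the error object passed to an error callback."""
    name: str    # symbolic error name, e.g. "ALL_BROKERS_DOWN"
    reason: str  # human-readable text, e.g. "3/3 brokers are down"


def on_error(err: ClientError) -> str:
    """Ignore ALL_BROKERS_DOWN, log every other error.

    Returns the action taken ("ignored" or "logged") so the
    behaviour is easy to check in a test.
    """
    if err.name == ALL_BROKERS_DOWN:
        # Expected when the broker reaps idle connections; the client
        # reconnects on the next produce/consume, so keep it quiet.
        logger.debug("ignoring %s: %s", err.name, err.reason)
        return "ignored"
    logger.error("%s: %s", err.name, err.reason)
    return "logged"
```

Filtering on a single, named error keeps every other error visible, which matters if the same handler is the only place client errors are reported.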

edenhill avatar Aug 26 '17 06:08 edenhill

@edenhill as I understand it, this is caused by the normal flow of events, and closed connections are not a problem. But the ERROR logging level suggests otherwise. Any sensible developer will panic and launch an investigation when they see "ERROR ALL_BROKERS_DOWN" in their log messages.

Could you please consider changing the logging level to INFO?

vchekan avatar Aug 15 '18 19:08 vchekan

@vchekan is talking sense... we should change this.

mhowlett avatar Aug 24 '18 22:08 mhowlett

Agree this should be INFO (if harmless), not ERROR. Any ETA for a fix?

sumedhsakdeo avatar Sep 21 '18 05:09 sumedhsakdeo

In 1.0-experimental-13, the OnError event returns an ErrorEvent type that includes an IsFatal property. Errors without this set may be useful for the user to act on, but should generally be considered informational. All things considered, I like this choice of API [So fixed in 1.0-experimental-13].
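
As a rough illustration of how a fatal flag can drive the log level, here is a small Python sketch. `ErrorEvent` and its `is_fatal` field are hypothetical stand-ins that mirror the IsFatal property described above; they are not the .NET types, and the INFO/ERROR mapping is an assumption about what an application might choose, not library behaviour.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger("kafka-client-sketch")


@dataclass(frozen=True)
class ErrorEvent:
    """Hypothetical stand-in for an error event carrying a fatal flag."""
    code: str
    reason: str
    is_fatal: bool


def log_level_for(event: ErrorEvent) -> int:
    """Treat non-fatal errors as informational, fatal ones as errors."""
    return logging.ERROR if event.is_fatal else logging.INFO


def handle(event: ErrorEvent) -> int:
    """Log the event at the chosen level and return that level."""
    level = log_level_for(event)
    logger.log(level, "%s: %s", event.code, event.reason)
    return level
```

With this mapping, a non-fatal ALL_BROKERS_DOWN event would be logged at INFO rather than ERROR, which is the outcome requested earlier in the thread.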

mhowlett avatar Sep 21 '18 15:09 mhowlett

Was this ever addressed?

Sakkyoku-Sha avatar Feb 04 '21 21:02 Sakkyoku-Sha

no, it's a librdkafkaism and i don't reckon it will be changed in the near term.

mhowlett avatar Feb 04 '21 23:02 mhowlett

Hi, I'm trying to handle connection loss and reconnection at the TCP layer in my client. I tried to use the Allbrokersdown error code, but it didn't work. Thanks to this issue thread, I realized I was on the wrong track.

Is there any way to detect that the connection was lost and then re-established? It could be an error code or something else.

Thanks.

aktasr avatar Oct 11 '21 14:10 aktasr

Is that message still the same? I have a customer who is facing a similar "problem". I've been troubleshooting their application, but it doesn't seem to be a real problem. Please let me know whether this message has changed at all.

claudiogodoy99 avatar Jul 11 '23 13:07 claudiogodoy99

We faced the same problem and spent time investigating it, because ERROR looks like a critical problem. Why is this issue still unresolved?

theramzay avatar Dec 19 '23 10:12 theramzay