kafka_ex
A down broker causes unnecessary restarts
If we have a three-broker cluster in the config but one of those brokers is down, the application appears to crash and restart continually. The cluster is set up properly to handle the down broker, and other (non-KafkaEx) services appear to adapt to this just fine.
Removing the downed broker from the config list resolves the problem. In other words, this is not a problem with the way the client runs but with how it deals with potentially out-of-date configuration.
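For reference, the setup looks roughly like the sketch below. The hostnames and port are placeholders, not the real cluster, and `import Config` assumes a recent Elixir (older projects would have `use Mix.Config` at the top instead).

```elixir
# config/config.exs
import Config

# Hypothetical three-broker list; hostnames and ports are placeholders.
config :kafka_ex,
  brokers: [
    {"kafka1.example.com", 9092},
    {"kafka2.example.com", 9092},
    # Assume this broker is the one that is down. Deleting this entry
    # is the workaround described above.
    {"kafka3.example.com", 9092}
  ]
```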
I haven't observed this in a while. I believe the heartbeating is now catching this and refreshing the metadata if a broker is actually down (not just temporarily unreachable). If a broker is only unreachable, KafkaEx will still attempt to connect to it because that's where the metadata points it for some partitions. I think this can be closed?
Tested it, and it works with the heartbeat functionality.