micronaut-kafka
Kafka health checks and topics replication
Issue description
Hello,
We are using the Micronaut Kafka library and have enabled the Kafka health check, so that our apps are marked unhealthy if our Kafka cluster goes down.
We have a Kafka cluster with 3 brokers.
Our Kafka producers are configured with Acknowledge.ALL.
We have the following configuration for our Kafka brokers:
offsets.topic.replication.factor: 3
min.insync.replicas: 2
According to the Kafka documentation for min.insync.replicas, a Kafka producer with Acknowledge.ALL should be able to write to a Kafka cluster as long as 2 in-sync replicas acknowledge the write.
However, the health check in KafkaHealthIndicator compares offsets.topic.replication.factor to the number of available nodes.
So in our case, with 3 brokers, a rollout restart of the Kafka brokers makes one of the brokers unavailable.
Hence, the Kafka health check in the Micronaut application fails when we rollout restart our Kafka cluster (as only 2 nodes are healthy), even though our producers should still be able to write to Kafka.
Is there a reason to prefer offsets.topic.replication.factor over min.insync.replicas in the health check?
I'm relatively new to Kafka, so there might totally be something I'm missing there
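To make the difference between the two checks concrete, here is a minimal sketch in plain Java. It is a simplified model of the comparison described above, not the actual KafkaHealthIndicator source. The class and method names are made up for illustration, and the number of available nodes is used as a rough stand-in for the number of in-sync replicas.

```java
// Simplified model of the two health policies discussed in this issue.
// NOT the real KafkaHealthIndicator code; all names here are hypothetical.
final class KafkaHealthPolicies {

    // Check as described in the issue: healthy only if the number of
    // available nodes is at least offsets.topic.replication.factor.
    static boolean healthyByReplicationFactor(int availableNodes, int offsetsTopicReplicationFactor) {
        return availableNodes >= offsetsTopicReplicationFactor;
    }

    // Alternative raised in the question: healthy as long as enough nodes
    // remain to satisfy min.insync.replicas for acks=all producers.
    static boolean healthyByMinInsyncReplicas(int availableNodes, int minInsyncReplicas) {
        return availableNodes >= minInsyncReplicas;
    }

    public static void main(String[] args) {
        int availableNodes = 2; // one of the three brokers is restarting

        // offsets.topic.replication.factor = 3
        System.out.println(healthyByReplicationFactor(availableNodes, 3)); // prints "false"

        // min.insync.replicas = 2
        System.out.println(healthyByMinInsyncReplicas(availableNodes, 2)); // prints "true"
    }
}
```

With the configuration from this issue, the first check reports unhealthy during a rolling restart, while the second would keep reporting healthy until a second broker is lost.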
Having the same issue. It is pretty critical for us, as the health endpoint is used by K8S to check whether pods are healthy. All of our pods went unhealthy because one Kafka node (out of 3) was restarting.
If that helps, we ended up writing our own health indicator by creating a class annotated with @Replaces(KafkaHealthIndicator.class).
The health check class tries to write a message to the Kafka cluster, and returns unhealthy if the write fails a couple of times in a row.
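The "fails a couple of times in a row" logic could look roughly like the sketch below. It is plain Java with no Micronaut or Kafka dependencies, so the wiring into a health indicator (the @Replaces annotation and the actual producer call) is left out. The WriteProbe interface, the class name, and the threshold of 3 are assumptions made for illustration, not the code we run in production.

```java
// Hypothetical probe: returns true if a test write to Kafka succeeded.
// In a real application this would wrap a producer send.
interface WriteProbe {
    boolean tryWrite();
}

// Reports unhealthy only after a configurable number of consecutive failed writes.
final class ConsecutiveFailureHealthCheck {
    private final WriteProbe probe;
    private final int failureThreshold;
    private int consecutiveFailures = 0;

    ConsecutiveFailureHealthCheck(WriteProbe probe, int failureThreshold) {
        if (failureThreshold < 1) {
            throw new IllegalArgumentException("failureThreshold must be >= 1");
        }
        this.probe = probe;
        this.failureThreshold = failureThreshold;
    }

    // Runs one probe write; returns true while the app should report healthy.
    synchronized boolean check() {
        boolean ok;
        try {
            ok = probe.tryWrite();
        } catch (RuntimeException e) {
            ok = false; // an exception from the probe counts as a failed write
        }
        // A single success resets the counter; failures accumulate.
        consecutiveFailures = ok ? 0 : consecutiveFailures + 1;
        return consecutiveFailures < failureThreshold;
    }

    public static void main(String[] args) {
        // Simulate a cluster that rejects every write, with a threshold of 3.
        ConsecutiveFailureHealthCheck check = new ConsecutiveFailureHealthCheck(() -> false, 3);
        System.out.println(check.check()); // prints "true"  (1 failure)
        System.out.println(check.check()); // prints "true"  (2 failures)
        System.out.println(check.check()); // prints "false" (3 failures in a row)
    }
}
```

Tolerating a few failures keeps a brief hiccup, such as a leader election during a broker restart, from immediately marking the pod unhealthy.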
I would be happy to open a PR if the maintainers of the project agree with the approach proposed above.
@Patanouk PRs welcome