kafka-docker
Docker compose health check
docker-compose now supports health checks (since version 1.10.0), and delaying start-up of containers until their dependencies are up and healthy. See https://docs.docker.com/compose/startup-order/ for docs on this.
It would be great to have an example of how to configure such health checks on a Kafka container!
Did you find a solution for this?
Doing something like this could work in the container:
healthcheck:
  test: ["CMD", "bash", "-c", "unset JMX_PORT; kafka-topics.sh --zookeeper zookeeper:2181 --list"]
I don't know if there is a better way... The unset is needed when you've enabled JMX.
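A note on the quoting, as a minimal sketch: `bash -c` takes the script to run as a single string, and any further arguments become the positional parameters `$0`, `$1`, and so on rather than part of the command. So the whole `unset JMX_PORT; kafka-topics.sh ...` sequence has to be one array element. The snippet below only uses `echo` to show the difference; the variable names are illustrative.

```shell
#!/usr/bin/env bash
# Extra arguments after the -c string are positional parameters, not part of the command.
split=$(bash -c 'echo' 'hello' 'world')   # runs only `echo`, so the captured output is empty
joined=$(bash -c 'echo hello world')      # runs the whole string
printf 'split=[%s] joined=[%s]\n' "$split" "$joined"
```

With the command split across array elements, the healthcheck would run a bare `unset`, which exits 0 regardless of whether Kafka is up.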
We're facing this issue with Kubernetes. Our solution was to include a health-check.sh in our Dockerfile with the following, which goes to zookeeper and checks to see if the active brokers list returned contains the broker id of the node. It's not perfect - I'd rather directly query the node if it's up - but I don't see a way to do this and (at least in our case) Kubernetes is watching the Zookeeper cluster as well in a StatefulSet so it seems relatively safe to me.
If there's general interest and @wurstmeister is into it I'd be willing to make a PR for this.
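#!/bin/bash
# Broker ids currently registered in ZooKeeper; the last line of output is parsed as a JSON array.
r=$("$KAFKA_HOME/bin/zookeeper-shell.sh" zk-headless:2181 <<< "ls /brokers/ids" | tail -1 | jq '.[]')
ids=( $r )
# contains ITEM... VALUE: succeeds if the last argument equals any earlier argument.
function contains() {
  local n=$#
  local value=${!n}
  for ((i=1; i < $#; i++)); do
    if [ "${!i}" == "${value}" ]; then
      echo "y"
      return 0
    fi
  done
  echo "n"
  return 1
}
# Read this node's broker id from the static configuration.
x=$(cat "$KAFKA_HOME/config/server.properties" | awk 'BEGIN{FS="="}/^broker.id=/{print $2}')
if [ "$(contains "${ids[@]}" "$x")" == "y" ]; then echo "ok"; exit 0; else echo "doh"; exit 1; fi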
@alfreddatakillen - The document you reference just explains how to wrap your startup process in an external script / shell call (https://docs.docker.com/v1.10/compose/startup-order/)
I assume you meant to reference https://docs.docker.com/compose/compose-file/#healthcheck - this was added in 1.12? However, this only reports the status of the container and does not block downstream dependencies. I believe this was mainly added for swarm support / restart policies
e.g.
version: '2.1'
services:
  zookeeper:
    image: wurstmeister/zookeeper
    ports:
      - "2181:2181"
    healthcheck:
      test: ["CMD-SHELL", "echo ruok | nc -w 2 zookeeper 4444"]
      interval: 5s
      timeout: 10s
      retries: 3
  kafka:
    build: .
    depends_on:
      - zookeeper
    ports:
      - "9092"
    environment:
      KAFKA_ADVERTISED_HOST_NAME: 192.168.99.100
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
This is only useful for reporting the status of the container. Kafka will be started immediately after Zookeeper, regardless of its 'healthy' state.
$ docker ps
CONTAINER ID IMAGE STATUS
6c6dee3e5aae kafkadocker_kafka Up About a minute
c2d8a40b785e wurstmeister/zookeeper Up About a minute (unhealthy)
@knordstrom - the healthcheck case may work OK for k8s if it was added to the docker image.
For future visitors who cannot get @knordstrom's healthcheck to work, this is how I solved it:
Make sure that your Dockerfile adds the shell script to a directory in the image.
We're using dynamic IDs generated by zookeeper, so I had to use the following healthcheck.sh:
#!/bin/bash
unset JMX_PORT # https://github.com/wurstmeister/kafka-docker/issues/171#issuecomment-327097497
# Broker ids currently registered in ZooKeeper; the last line of output is parsed as a JSON array.
r=$("$KAFKA_HOME/bin/zookeeper-shell.sh" zookeeper:2181 <<< "ls /brokers/ids" | tail -1 | jq '.[]')
ids=( $r )
# contains ITEM... VALUE: succeeds if the last argument equals any earlier argument.
function contains() {
  local n=$#
  local value=${!n}
  for ((i=1; i < $#; i++)); do
    if [ "${!i}" == "${value}" ]; then
      echo "y"
      return 0
    fi
  done
  echo "n"
  return 1
}
# With dynamic ids, the broker id is read from meta.properties in the configured log directory.
LOG_DIR=$(awk -F= -v x="log.dirs" '$1==x{print $2}' /opt/kafka/config/server.properties)
x=$(cat "${LOG_DIR}/meta.properties" | awk 'BEGIN{FS="="}/^broker.id=/{print $2}')
if [ "$(contains "${ids[@]}" "$x")" == "y" ]; then echo "ok"; exit 0; else echo "doh"; exit 1; fi
Finally, add the following to your docker-compose file:
healthcheck:
  test: ["CMD-SHELL", "/bin/healthcheck.sh"]
  interval: 5s
  timeout: 10s
  retries: 5
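If jq is not available in the image, the membership test from healthcheck.sh can be sketched in plain bash. This is only an illustration under an assumption: that the last line printed by zookeeper-shell.sh looks like `[1001, 1002]`. The function name `broker_registered` is hypothetical and not part of any image.

```shell
#!/usr/bin/env bash
# broker_registered ID_LIST BROKER_ID
# Returns 0 if BROKER_ID appears in a list formatted like "[1001, 1002]" (assumed format).
broker_registered() {
  local id_list=$1 broker_id=$2
  local ids id
  # Replace brackets and commas with spaces so the ids become whitespace-separated.
  ids=${id_list//\[/ }
  ids=${ids//\]/ }
  ids=${ids//,/ }
  for id in $ids; do
    if [ "$id" = "$broker_id" ]; then
      return 0
    fi
  done
  return 1
}

# Hypothetical usage inside a healthcheck script (not executed here):
#   broker_registered "$last_line" "$x" && echo "ok" || { echo "doh"; exit 1; }
```

Comparing whole ids rather than substrings matters here: a broker with id 100 should not count as registered just because 1001 is in the list.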
There's a very nice solution here: https://github.com/confluentinc/cp-docker-images/issues/358#issuecomment-356059519
That is only for zookeeper, it doesn't help for checking when Kafka is up and ready.
Ah you are right, sorry. I must have mixed this issue up with another one.
@Lukkie @dobesv @matthew-d-jones @sscaling @knordstrom @ddewaele @alfreddatakillen
Hi,
Your solution seems good, but unfortunately I am unable to get it working.
My Kafka setup does not wait for the status to be OK before launching the commands!
kafka-server:
  image: 'wurstmeister/kafka:2.12_2.5.0'
  container_name: kafka-server
  hostname: kafka-server
  ports:
    - '9092:9092'
    - '29092:29092'
    - '1099:1099'
  volumes:
    - '/c/Users/PHP/Desktop/imm/ping/healthcheck.sh:/bin/healthcheck.sh'
  environment:
    .....
  depends_on:
    - zookeeper-server
  healthcheck:
    test: ["CMD-SHELL", "/bin/healthcheck.sh"]
    interval: 5s
    timeout: 10s
    retries: 5
kafka-setup:
  image: 'wurstmeister/kafka:2.12_2.5.0'
  hostname: kafka-setup
  container_name: kafka-setup
  command: "bash -c 'echo Waiting for Kafka to be ready... && \
    ./opt/kafka/bin/kafka-topics.sh --create --if-not-exists --zookeeper zookeeper-server:2181 --partitions 1 --replication-factor 1 --topic test1 && \
    ./opt/kafka/bin/kafka-topics.sh --create --if-not-exists --zookeeper zookeeper-server:2181 --partitions 1 --replication-factor 1 --topic test2'"
  environment:
    KAFKA_BROKER_ID: ignored
    KAFKA_ZOOKEEPER_CONNECT: ignored
  depends_on:
    - kafka-server
When I check the health logs:
First time:
{"Status":"starting","FailingStreak":1,"Log":[{"Start":"2020-06-02T14:29:38.838704407Z","End":"2020-06-02T14:29:43.530036206Z","ExitCode":1,"Output":"cat: can't open '/kafka/kafka-logs-kafka-server/meta.properties': No such file or directory\ndoh\n"}]}
And then:
{"Status":"healthy","FailingStreak":0,"Log":[{"Start":"2020-06-02T14:35:24.372947922Z","End":"2020-06-02T14:35:26.26390467Z","ExitCode":0,"Output":"ok\n"},{"Start":"2020-06-02T14:35:31.271064187Z","End":"2020-06-02T14:35:33.22431355Z","ExitCode":0,"Output":"ok\n"},{"Start":"2020-06-02T14:35:38.230154493Z","End":"2020-06-02T14:35:40.243810443Z","ExitCode":0,"Output":"ok\n"},{"Start":"2020-06-02T14:35:45.252980835Z","End":"2020-06-02T14:35:47.20696075Z","ExitCode":0,"Output":"ok\n"},{"Start":"2020-06-02T14:35:52.211854112Z","End":"2020-06-02T14:35:54.159074005Z","ExitCode":0,"Output":"ok\n"}]}
And the errors returned from my kafka-setup service:
Waiting for Kafka to be ready...
Error while executing topic command : Replication factor: 1 larger than available brokers: 0.
[2020-06-02 14:29:40,482] ERROR org.apache.kafka.common.errors.InvalidReplicationFactorException: Replication factor: 1 larger than available brokers: 0. (kafka.admin.TopicCommand$)
This means that the commands were launched before the healthcheck reported an OK status.
How were you able to get it working, please?
Thank you all !
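A possible explanation, offered with some caution: the short-form `depends_on` list used in the file above only orders container start and does not wait for the healthcheck to pass. The long form with `condition: service_healthy`, which appears further down this thread, is what makes Compose wait, provided the Compose file format in use supports it. Independently of that, the setup command can retry on its own. A minimal sketch; `retry` is a hypothetical helper, not something shipped in the image:

```shell
#!/usr/bin/env bash
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it exits 0 or MAX_ATTEMPTS attempts have been made.
retry() {
  local max_attempts=$1 delay=$2
  shift 2
  local attempt
  for ((attempt = 1; attempt <= max_attempts; attempt++)); do
    if "$@"; then
      return 0
    fi
    echo "attempt ${attempt}/${max_attempts} failed: $*" >&2
    # Do not sleep after the final attempt.
    if ((attempt < max_attempts)); then
      sleep "$delay"
    fi
  done
  return 1
}

# Hypothetical usage inside kafka-setup (not executed here):
#   retry 30 2 /opt/kafka/bin/kafka-topics.sh --create --if-not-exists \
#     --zookeeper zookeeper-server:2181 --partitions 1 --replication-factor 1 --topic test1
```

The attempt count and delay are arbitrary; they only need to cover the time the broker takes to register with ZooKeeper.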
This seems to work ok for me for checking whether Kafka is up or not:
nc -z localhost 9091 || exit 1
(Note that I personally have Kafka running on port 9091, not 9092.)
It's basic but it seems to be good enough to keep my other containers from starting up before Kafka is actually ready to start receiving traffic.
This healthcheck works perfectly for me:
image: bitnami/kafka:3.4.0
...
healthcheck:
  test: kafka-cluster.sh cluster-id --bootstrap-server localhost:9092 || exit 1
  interval: 1s
  timeout: 60s
  retries: 60
healthcheck:
  test: ["CMD-SHELL", "pgrep -f 'kafka.*9101' || exit 1"]
  interval: 2m
  timeout: 10s
  retries: 3
Also, a better option is to use:
  test: ["CMD-SHELL", "(echo > /dev/tcp/kafka1/9092) &>/dev/null && exit 0 || exit 1"]
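The `/dev/tcp` probe can also be wrapped in a small function with a time limit, so that an unreachable host does not hang the check. This is a rough sketch under two assumptions: the shell is bash (`/dev/tcp` is a bash feature, not a real device) and a `timeout` command accepting a duration in seconds is available. The name `port_open` is hypothetical.

```shell
#!/usr/bin/env bash
# port_open HOST PORT [TIMEOUT_SECONDS]
# Returns 0 if a TCP connection to HOST:PORT can be opened within the time limit.
port_open() {
  local host=$1 port=$2 timeout_s=${3:-2}
  # The child bash receives host and port as $0 and $1; a failed redirection makes it exit non-zero.
  timeout "$timeout_s" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Hypothetical usage in a healthcheck (not executed here):
#   port_open kafka1 9092 2 || exit 1
```

As with `nc -z`, a successful connection only shows that something is listening on the port; it does not by itself prove the broker is ready to serve requests.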
Here is the solution for quay.io/debezium/kafka:2.5. Debezium won't install nc.
Tracking https://debezium.zulipchat.com/#narrow/stream/302529-community-general/topic/Kafka.20health.20check/near/433819156
kafka:
  image: quay.io/debezium/kafka:2.5
  ports:
    - 9092:9092
  depends_on:
    zookeeper:
      condition: service_healthy
  environment:
    - ZOOKEEPER_CONNECT=zookeeper:2181
  healthcheck:
    test: /kafka/bin/kafka-cluster.sh cluster-id --bootstrap-server kafka:9092 || exit 1
    interval: 1s
    timeout: 60s
    retries: 60