cp-docker-images
Support swarm mode and swarm service scaling
This is a request to support Docker swarm mode (in 1.12+).
The current images can already be used with swarm mode, i.e. you would create services similar to the following. This pins each service to a specific node, so that whenever a given Kafka service is started, it always starts on the same node with the same data volume.
docker network create --driver overlay --attachable zookeeper
docker network create --driver overlay --attachable kafka
# Tie kafka instances to specific nodes
docker node update --label-add kafka=1 node1
docker node update --label-add kafka=2 node2
docker node update --label-add kafka=3 node3
# create zookeeper-1, zookeeper-2, zookeeper-3 services here
docker service create \
--name kafka-1 \
--network kafka \
--network zookeeper \
--restart-condition on-failure \
--restart-max-attempts 3 \
--log-driver=json-file \
--constraint "node.role != manager" \
--constraint "node.labels.kafka == 1" \
--mount type=volume,src=kafka_vol,target=/var/lib/kafka/data \
--env KAFKA_BROKER_ID=1 \
--env KAFKA_ZOOKEEPER_CONNECT=zookeeper-1:2181,zookeeper-2:2181,zookeeper-3:2181 \
--env KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://kafka-1:9092 \
confluentinc/cp-kafka:3.1.2
sleep 60
docker service create \
--name kafka-2 \
--network kafka \
--network zookeeper \
--restart-condition on-failure \
--restart-max-attempts 3 \
--log-driver=json-file \
--constraint "node.role != manager" \
--constraint "node.labels.kafka == 2" \
--mount type=volume,src=kafka_vol,target=/var/lib/kafka/data \
--env KAFKA_BROKER_ID=2 \
--env KAFKA_ZOOKEEPER_CONNECT=zookeeper-1:2181,zookeeper-2:2181,zookeeper-3:2181 \
--env KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://kafka-2:9092 \
confluentinc/cp-kafka:3.1.2
sleep 60
docker service create \
--name kafka-3 \
--network kafka \
--network zookeeper \
--restart-condition on-failure \
--restart-max-attempts 3 \
--log-driver=json-file \
--constraint "node.role != manager" \
--constraint "node.labels.kafka == 3" \
--mount type=volume,src=kafka_vol,target=/var/lib/kafka/data \
--env KAFKA_BROKER_ID=3 \
--env KAFKA_ZOOKEEPER_CONNECT=zookeeper-1:2181,zookeeper-2:2181,zookeeper-3:2181 \
--env KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://kafka-3:9092 \
confluentinc/cp-kafka:3.1.2
This works quite well, but it does not support the Docker swarm mode scale option or related capabilities like rolling updates, because each Kafka instance is its own service, e.g.:
docker service scale kafka=n
This is a request to support Docker swarm mode and swarm scaling, which makes manageability a lot simpler. Some of the relevant problems that need to be solved are:
- naming Kafka services uniquely for registration in ZooKeeper, while still allowing individual Kafka services to be referenceable (the swarm-created DNS name tasks.<servicename> will be useful for this)
- giving each service a unique broker id
- supporting/testing shared volume mechanisms like Flocker
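The unique-broker-id problem could, for example, be solved in the image's entrypoint: `docker service create` accepts Go templates in `--hostname` (e.g. `--hostname '{{.Service.Name}}.{{.Task.Slot}}'`), so each replica could parse its task slot out of its own hostname. A hypothetical sketch (the helper name and hostname format are illustrative, not anything the images do today):

```shell
#!/bin/sh
# Hypothetical helper: given a container hostname of the form
# <service>.<slot>[.<task-id>], use the swarm task slot as the broker id.
derive_broker_id() {
  # Take the second dot-separated field of the hostname.
  echo "$1" | cut -d. -f2
}

# A replica whose hostname is "kafka.2.abc123" would become broker 2:
KAFKA_BROKER_ID="$(derive_broker_id "kafka.2.abc123")"
export KAFKA_BROKER_ID
echo "$KAFKA_BROKER_ID"
```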
I too am curious: does Confluent have any plans to support swarm mode in the near future?
Hi @rocketraman and @FryDerm,
Right now, Confluent does not plan to officially support swarm in our Docker images. That said, we offer a hearty "go for it!"
Hi @rocketraman, do you have a working ZooKeeper portion? I am trying to install a 3-host clustered ZooKeeper, and each ZK node cannot connect to the election ports of the other two servers. The message is "WARN Cannot open channel to 3 at election address zookeeper3/10.0.0.2:43888", and it's the same message from every server to its peers. Did you get it working? Thanks.
Hi @aayars, if Confluent does not plan to officially support swarm, does that mean that: a. a Kafka cluster should not be set up on Docker Swarm, b. Confluent supports Kubernetes clusters only, or c. Confluent does not support Kafka clusters on any kind of container engine?
Thanks.
Hi @chopanpma,
It means for now, these are basic vanilla images. Out of the box, we're not yet providing anything fancy with regard to cluster management.
To integrate with specific deployment methods, it is possible to extend the images yourself. The choice of cluster management tool should not matter.
--Alex
@chopanpma I ended up switching from Docker Swarm Mode to Kubernetes. Swarm mode is too raw and I simply don't trust it yet for production orchestration. I did get it working on Kubernetes. I can share my Kubernetes resource descriptors if you think that might help...
Excellent, thanks, I managed to deploy zk and kafka in swarm mode.
Thanks a lot Raman. That would be nice; even though I am not working with Kubernetes, I could learn from it. My swarm config is still just a baby.
Raman, could you share your resource descriptors? I am having the problem again.
Thanks.
@chopanpma Here you go with the Kubernetes deployment descriptors for Kafka and Zookeeper:
kafka.yaml: https://gist.github.com/rocketraman/6726ec0a26026a4ccc98c77966bb9030
zookeeper.yaml: https://gist.github.com/rocketraman/e58113d43ee48eac91f080b681a9fda3
@chopanpma can you share how to deploy the stack in swarm? thanks
This is very important: being able to deploy Kafka/ZooKeeper in a cluster using Docker stack files (Docker Swarm).
Also trying to get this running. What's the progress on that? What values are you using?
Has anyone been successful with a cluster in Swarm setup?
This is my config, and it has worked very well. FQDN is the fully qualified domain name of each server, and WORKER is the worker hostname as it appears in /etc/hosts.
version: "3"
services:
  zookeeper1:
    hostname: zookeeper1
    environment:
      - ZOOKEEPER_SERVER_ID=1
      - ZOOKEEPER_CLIENT_PORT=22181
      - ZOOKEEPER_TICK_TIME=2000
      - ZOOKEEPER_INIT_LIMIT=5
      - ZOOKEEPER_SYNC_LIMIT=2
      - ZOOKEEPER_SERVERS=0.0.0.0:22888:23888;${FQDN4}:32888:33888;${FQDN5}:42888:43888
    image: ${REGISTRY_NAME}/confluentinc/cp-zookeeper:3.2.1
    ports:
      - "22181:22181"
      - "22888:22888"
      - "23888:23888"
    env_file: env/.zookeeper1.env
    networks:
      - kafka
    deploy:
      replicas: 1
      placement:
        constraints: [node.hostname == ${WORKER1}]
      restart_policy:
        condition: on-failure
    volumes:
      - /opt/kafka-folders/zookeeper-dev2/lib:/var/lib/zookeeper
      - /opt/kafka-folders/zookeeper-dev2/log:/var/log/zookeeper
  zookeeper2:
    hostname: zookeeper2
    environment:
      - ZOOKEEPER_SERVER_ID=2
      - ZOOKEEPER_CLIENT_PORT=32181
      - ZOOKEEPER_TICK_TIME=2000
      - ZOOKEEPER_INIT_LIMIT=5
      - ZOOKEEPER_SYNC_LIMIT=2
      - ZOOKEEPER_SERVERS=${FQDN3}:22888:23888;0.0.0.0:32888:33888;${FQDN5}:42888:43888
    image: ${REGISTRY_NAME}/confluentinc/cp-zookeeper:3.2.1
    ports:
      - "32181:32181"
      - "32888:32888"
      - "33888:33888"
    #env_file: env/.zookeeper2.env
    networks:
      - kafka
    deploy:
      replicas: 1
      placement:
        constraints: [node.hostname == ${WORKER2}]
      restart_policy:
        condition: on-failure
    volumes:
      - /opt/kafka-folders/zookeeper-dev2/lib:/var/lib/zookeeper
      - /opt/kafka-folders/zookeeper-dev2/log:/var/log/zookeeper
  zookeeper3:
    hostname: zookeeper3
    environment:
      - ZOOKEEPER_SERVER_ID=3
      - ZOOKEEPER_CLIENT_PORT=42181
      - ZOOKEEPER_TICK_TIME=2000
      - ZOOKEEPER_INIT_LIMIT=5
      - ZOOKEEPER_SYNC_LIMIT=2
      - ZOOKEEPER_SERVERS=${FQDN3}:22888:23888;${FQDN4}:32888:33888;0.0.0.0:42888:43888
    image: ${REGISTRY_NAME}/confluentinc/cp-zookeeper:3.2.1
    ports:
      - "42181:42181"
      - "42888:42888"
      - "43888:43888"
    #env_file: env/.zookeeper3.env
    networks:
      - kafka
    deploy:
      replicas: 1
      placement:
        constraints: [node.hostname == ${WORKER3}]
      restart_policy:
        condition: on-failure
    volumes:
      - /opt/kafka-folders/zookeeper-dev2/lib:/var/lib/zookeeper
      - /opt/kafka-folders/zookeeper-dev2/log:/var/log/zookeeper
  kafka1:
    hostname: kafka1
    environment:
      - KAFKA_ZOOKEEPER_CONNECT=${FQDN3}:22181,${FQDN4}:32181,${FQDN5}:42181
      - KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://${FQDN3}:29092
      - KAFKA_AUTO_CREATE_TOPICS_ENABLE=false
      - KAFKA_DEFAULT_REPLICATION_FACTOR=3
      - KAFKA_NUM_PARTITIONS=3
      - KAFKA_DELETE_TOPIC_ENABLE=false
      - KAFKA_NUM_NETWORK_THREADS=3
      - KAFKA_UNCLEAN_LEADER_ELECTION_ENABLE=false
    image: ${REGISTRY_NAME}/confluentinc/cp-kafka:3.2.1
    ports:
      - "29092:29092"
    #env_file: env/.kafka1.env
    networks:
      - kafka
    deploy:
      replicas: 1
      placement:
        constraints: [node.hostname == ${WORKER1}]
      restart_policy:
        condition: on-failure
    volumes:
      - /opt/kafka-folders/kafka-data:/var/lib/kafka/data
  kafka2:
    hostname: kafka2
    environment:
      - KAFKA_ZOOKEEPER_CONNECT=${FQDN3}:22181,${FQDN4}:32181,${FQDN5}:42181
      - KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://${FQDN4}:39092
      - KAFKA_AUTO_CREATE_TOPICS_ENABLE=false
      - KAFKA_DEFAULT_REPLICATION_FACTOR=3
      - KAFKA_NUM_PARTITIONS=3
      - KAFKA_DELETE_TOPIC_ENABLE=false
      - KAFKA_NUM_NETWORK_THREADS=3
      - KAFKA_UNCLEAN_LEADER_ELECTION_ENABLE=false
    image: ${REGISTRY_NAME}/confluentinc/cp-kafka:3.2.1
    ports:
      - "39092:39092"
    #env_file: env/.kafka2.env
    networks:
      - kafka
    deploy:
      replicas: 1
      placement:
        constraints: [node.hostname == ${WORKER2}]
      restart_policy:
        condition: on-failure
    volumes:
      - /opt/kafka-folders/kafka-data:/var/lib/kafka/data
  kafka3:
    hostname: kafka3
    environment:
      - KAFKA_ZOOKEEPER_CONNECT=${FQDN3}:22181,${FQDN4}:32181,${FQDN5}:42181
      - KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://${FQDN5}:49092
      - KAFKA_AUTO_CREATE_TOPICS_ENABLE=false
      - KAFKA_DEFAULT_REPLICATION_FACTOR=3
      - KAFKA_NUM_PARTITIONS=3
      - KAFKA_DELETE_TOPIC_ENABLE=false
      - KAFKA_NUM_NETWORK_THREADS=3
      - KAFKA_UNCLEAN_LEADER_ELECTION_ENABLE=false
    image: ${REGISTRY_NAME}/confluentinc/cp-kafka:3.2.1
    ports:
      - "49092:49092"
    #env_file: env/.kafka3.env
    networks:
      - kafka
    deploy:
      replicas: 1
      placement:
        constraints: [node.hostname == ${WORKER3}]
      restart_policy:
        condition: on-failure
    volumes:
      - /opt/kafka-folders/kafka-data:/var/lib/kafka/data
  schemaregistry:
    hostname: schemaregistry
    environment:
      - SCHEMA_REGISTRY_KAFKASTORE_CONNECTION_URL=${FQDN3}:22181,${FQDN4}:32181,${FQDN5}:42181
      - SCHEMA_REGISTRY_HOST_NAME=kafka-schema-registry
      - SCHEMA_REGISTRY_LISTENERS=http://0.0.0.0:8081
    image: ${REGISTRY_NAME}/confluentinc/cp-schema-registry:3.2.1
    ports:
      - "8081:8081"
    #env_file: env/.schemaregistry.env
    networks:
      - kafka
    deploy:
      replicas: 1
      placement:
        constraints: [node.hostname == ${WORKER2}]
      restart_policy:
        condition: on-failure
  restproxy:
    hostname: restproxy
    environment:
      - KAFKA_REST_ZOOKEEPER_CONNECT=${FQDN3}:22181,${FQDN4}:32181,${FQDN5}:42181
      - KAFKA_REST_LISTENERS=http://0.0.0.0:8082
      - KAFKA_REST_SCHEMA_REGISTRY_URL=http://${FQDN4}:8081
      - KAFKA_REST_HOST_NAME=${FQDN4}
    image: ${REGISTRY_NAME}/confluentinc/cp-kafka-rest:latest
    ports:
      - "8082:8082"
    #env_file: env/.restproxy.env
    networks:
      - kafka
    deploy:
      replicas: 1
      placement:
        constraints: [node.hostname == ${WORKER2}]
      restart_policy:
        condition: on-failure
networks:
  kafka:
volumes:
  kafka-data:
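As an aside, the ZOOKEEPER_SERVERS values above all follow one pattern: each server lists its peers by FQDN but substitutes 0.0.0.0 for its own entry, so the election ports bind locally. A tiny sketch of that construction (the function name is made up, and it uses the default peer ports 2888/3888 rather than the per-server host ports in the stack above):

```shell
#!/bin/sh
# Hypothetical helper: build a ZOOKEEPER_SERVERS string for server number $1
# out of the remaining host arguments, replacing that server's own entry
# with 0.0.0.0 as the stack file does.
zk_servers() {
  self="$1"; shift
  out=""; i=1
  for host in "$@"; do
    # Substitute 0.0.0.0 for this server's own entry.
    [ "$i" -eq "$self" ] && host="0.0.0.0"
    out="${out:+${out};}${host}:2888:3888"
    i=$((i + 1))
  done
  echo "$out"
}

zk_servers 2 zk1.example.com zk2.example.com zk3.example.com
# prints zk1.example.com:2888:3888;0.0.0.0:2888:3888;zk3.example.com:2888:3888
```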
@chopanpma This is way too complex.
@gaui Try this: https://github.com/wurstmeister/kafka-docker/blob/master/docker-compose-swarm.yml
@GoogerBooger Thanks, but that is just one instance of each. I want to run a highly available cluster.
I currently do a kind of poor man's scaling by using Jinja2 templates and rendering a docker-compose.yml from a shell script. The relevant part of the template looks as follows:
{% for zookeeper_id in range(zookeeper_instances) %}
  zookeeper{{ zookeeper_id }}:
    image: {{ docker_registry_prefix or '' }}confluentinc/cp-zookeeper
    hostname: zookeeper{{ zookeeper_id }}
{% if zookeeper_expose %}
    ports:
{% if zookeeper_jmx %}
      - {{ zookeeper_jmx_port + zookeeper_id }}:{{ zookeeper_jmx_port + zookeeper_id }}
{% endif %}
      - {{ zookeeper_expose_port }}:{{ zookeeper_client_port }}
{% endif %}
    environment:
      ZOOKEEPER_SERVER_ID: {{ zookeeper_id + 1 }}
      ZOOKEEPER_CLIENT_PORT: {{ zookeeper_client_port }}
      ZOOKEEPER_TICK_TIME: 2000
      ZOOKEEPER_INIT_LIMIT: 5
      ZOOKEEPER_SYNC_LIMIT: 2
      ZOOKEEPER_SERVERS: zookeeper0:2888:3888{% for zookeeper_id in range(1,zookeeper_instances) %};zookeeper{{ zookeeper_id }}:2888:3888{% endfor %}
{% if zookeeper_jmx %}
      KAFKA_JMX_PORT: {{ zookeeper_jmx_port + zookeeper_id }}
      KAFKA_JMX_HOSTNAME: localhost
{% endif %}
    volumes:
      - 'zookeeper{{ zookeeper_id }}_data:/var/lib/zookeeper/data'
      - 'zookeeper{{ zookeeper_id }}_log:/var/lib/zookeeper/log'
      - 'zookeeper{{ zookeeper_id }}_secrets:/etc/zookeeper/secrets'
    healthcheck:
      test: 'if [ "$$(echo ruok | nc 127.0.0.1 2181)" = "imok" ]; then exit 0; fi; exit 1'
      interval: {{ healthcheck_interval }}
      timeout: {{ healthcheck_timeout }}
      retries: {{ healthcheck_retries }}
{% endfor %}
{# Kafka #}
{% for kafka_id in range(kafka_instances) %}
  kafka{{ kafka_id }}:
    image: {{ docker_registry_prefix or '' }}confluentinc/cp{% if kafka_enterprise %}-enterprise{% endif %}-kafka
    hostname: kafka{{ kafka_id }}
    depends_on:
{% for zookeeper_id in range(zookeeper_instances) %}
      - zookeeper{{ zookeeper_id }}
{% endfor %}
{% if kafka_expose %}
    ports:
{% if kafka_jmx %}
      - {{ kafka_jmx_port + kafka_id }}:{{ kafka_jmx_port + kafka_id }}
{% endif %}
      - {{ kafka_expose_port }}:9092
{% endif %}
    environment:
      KAFKA_BROKER_ID: {{ kafka_id + 1 }}
      KAFKA_ZOOKEEPER_CONNECT: zookeeper0:{{ zookeeper_client_port }}{% for zookeeper_id in range(1,zookeeper_instances) %},zookeeper{{ zookeeper_id }}:{{ zookeeper_client_port }}{% endfor %}
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka{{ kafka_id }}:9092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: {{ [kafka_instances, 3] | min }}
{% if kafka_jmx %}
      KAFKA_JMX_PORT: {{ kafka_jmx_port + kafka_id }}
      KAFKA_JMX_HOSTNAME: localhost
{% endif %}
    volumes:
      - 'kafka{{ kafka_id }}_data:/var/lib/kafka/data'
      - 'kafka{{ kafka_id }}_secrets:/etc/kafka/secrets'
    healthcheck:
      test: 'bash -c ''S=$$(cat /dev/urandom | tr -dc "a-zA-Z0-9" | fold -w 4 | head -n 1); [ "$$(printf "\x00\x00\x00\x0b\x00\x12\x00\x00$${S}\x00\x01\x00\n" | nc -q 1 localhost 9092 | head -c8 | tail -c4)" = "$${S}" ]''; exit $${?}'
      interval: {{ healthcheck_interval }}
      timeout: {{ healthcheck_timeout }}
      retries: {{ healthcheck_retries }}
{% endfor %}
Have a look at the Kafka health check btw ;) Scaling is done in a second yaml file like this:
zookeeper_instances: 4
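For reference, a complete config file for this template might look like the following. The variable names all come from the template above; every value is purely illustrative:

```yaml
zookeeper_instances: 3
zookeeper_client_port: 2181
zookeeper_expose: true
zookeeper_expose_port: 2181
zookeeper_jmx: false
zookeeper_jmx_port: 9010
kafka_instances: 3
kafka_enterprise: false
kafka_expose: true
kafka_expose_port: 9092
kafka_jmx: false
kafka_jmx_port: 9020
docker_registry_prefix: ''
healthcheck_interval: 30s
healthcheck_timeout: 10s
healthcheck_retries: 3
```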
Before running docker-compose up, you have to generate your docker-compose.yml with a shell script like this:
#!/bin/bash
DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
if [ -z "${1}" ]; then
  echo "usage: docker-compose-generate.sh CONFIG_FILE"
  exit 1
fi
j2 "${DIR}/docker-compose.yml.j2" "${1}" | sed 's/[[:space:]]*$//g' | sed '/^$/d' > "${DIR}/docker-compose.yml"
Scaling with --scale does work for Kafka, but only if you create the containers one by one; starting them all at once makes them all get the same ID. That's why I use the same templating mechanism for Kafka as well.
But yes, that sucks, and I wish we could just --scale as usual. For example, Elasticsearch and Cassandra don't require any magic like this.
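The one-by-one workaround described above can be scripted. A sketch (the service name `kafka` and the compose invocation are assumptions; a real run would also want a wait between steps):

```shell
#!/bin/sh
# Hypothetical sketch: bring Kafka up one replica at a time so each broker
# registers before the next one starts. COMPOSE is overridable, e.g. set
# COMPOSE=echo for a dry run that just prints the commands.
scale_kafka_up_to() {
  target="$1"
  compose="${COMPOSE:-docker-compose}"
  n=1
  while [ "$n" -le "$target" ]; do
    $compose up -d --scale "kafka=${n}"
    # In a real run, wait here until broker $n has registered in ZooKeeper.
    n=$((n + 1))
  done
}
```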
I will eventually figure out how to do the same for ZooKeeper, but since dynamic ZK membership is not easy yet, I had to hardcode it in the compose file (not shown in the reference below). I tried making "{{.Task.Slot}}" part of the hostname, but it would not resolve, and I think that's an issue with Docker Swarm.
For this to work, I had to update the run.sh that Confluent provides to be a bit smarter about naming when talking to ZooKeeper. The advertised listeners are where all of this is taken advantage of, so you may need to modify it for SSL; since my Kafka is only used internally by services, that's not a priority for me at the moment.
If you need specific constraints (e.g. for disk), consider pre-setting node labels; the label system works nicely with docker stack deploy.
Compose-file
kafka:
  image: custom_kafka:latest
  build:
    context: .
    dockerfile: ./.docker/kafka/Dockerfile
  deploy:
    replicas: 3
    restart_policy:
      condition: on-failure
    placement:
      preferences:
        - spread: node.labels.worker
  environment:
    AWS_DEPLOYMENT: "false"
    KAFKA_BROKER_ID: "{{.Task.Slot}}"
    KAFKA_ZOOKEEPER_CONNECT: "zookeeper1:2181,zookeeper2:2181,zookeeper3:2181"
    KAFKA_INTER_BROKER_LISTENER_NAME: "PLAINTEXT"
    KAFKA_NUM_PARTITIONS: 3
  depends_on:
    - zookeeper1
    - zookeeper2
    - zookeeper3
  ports:
    - 9092-9094:9092-9094
  networks:
    - kafka
Dockerfile
FROM confluentinc/cp-kafka:4.0.0
COPY ./.docker/kafka/smart_run.sh /etc/confluent/docker/
RUN chmod ag+x /etc/confluent/docker/smart_run.sh
CMD ["/etc/confluent/docker/smart_run.sh"]
smart_run.sh
#!/bin/bash
if [[ ${AWS_DEPLOYMENT} == true || ${AWS_DEPLOYMENT} == 'True' || ${AWS_DEPLOYMENT} == 'true' ]]; then
  export LOCAL_IP=$(curl -s http://169.254.169.254/latest/meta-data/local-ipv4)
  export PUBLIC_IP=$(curl -s http://169.254.169.254/latest/meta-data/public-ipv4)
else
  export LOCAL_IP=$(hostname -i)
  export PUBLIC_IP=$(hostname)
fi
if [[ ! ${KAFKA_ADVERTISED_LISTENERS:-} ]]; then
  export KAFKA_LISTENER_SECURITY_PROTOCOL_MAP=LOCAL:PLAINTEXT,PUBLIC:PLAINTEXT,PLAINTEXT:PLAINTEXT
  export KAFKA_ADVERTISED_LISTENERS=LOCAL://${LOCAL_IP}:9092,PUBLIC://${PUBLIC_IP}:9093,PLAINTEXT://$(hostname -i):9094
fi
exec /etc/confluent/docker/run
@gaui Try this: https://github.com/wurstmeister/kafka-docker/blob/master/docker-compose-swarm.yml
Combining it with this makes the solution complete: https://github.com/itsaur/zookeeper-docker-swarm
Compared to k8s, Docker Swarm requires fewer resources, is less opinionated, and has far fewer components to watch out for. k8s suits companies like Google: global scale, deep pockets, near-infinite computing resources. But how many of us actually need that complexity?