emqx-docker
Can't create a Docker Swarm Cluster
- Docker version 17.05.0-ce (for arm)
- EMQ Version v2.2-rc.1
Hello,
I am running Docker in swarm mode and want to deploy an MQTT cluster. I decided to create one master instance so that the other replicated instances can join it. Here is the compose file I wrote for this:
version: "3"
services:
  emq-master:
    image: aksakalli/rpi-emq
    environment:
      - EMQ_HOST=emq-master
      - EMQ_NAME=master
      - EMQ_NODE__COOKIE=ef16498f66804df1cc6172f6996d5492
      - EMQ_NODE__NAME=master@emq-master
  emq-worker:
    image: aksakalli/rpi-emq
    depends_on:
      - emq-master
    ports:
      - 18083:18083
      - 1883:1883
    deploy:
      replicas: 2
    environment:
      - EMQ_JOIN_CLUSTER=master@emq-master
      - EMQ_NODE__COOKIE=ef16498f66804df1cc6172f6996d5492
(I am using my own image for Raspberry Pi; it is basically the same as emqtt/emq-docker but compiled for ARM.)
When I deploy this stack, I get the following log for the emq-master container:
starting emqttd on node 'master@emq-master'
emqttd ctl is starting...[ok]
emqttd hook is starting...[ok]
emqttd router is starting...[ok]
emqttd pubsub is starting...[ok]
emqttd stats is starting...[ok]
emqttd metrics is starting...[ok]
emqttd pooler is starting...[ok]
emqttd trace is starting...[ok]
emqttd client manager is starting...[ok]
emqttd session manager is starting...[ok]
emqttd session supervisor is starting...[ok]
emqttd wsclient supervisor is starting...[ok]
emqttd broker is starting...[ok]
emqttd alarm is starting...[ok]
emqttd mod supervisor is starting...[ok]
emqttd bridge supervisor is starting...[ok]
emqttd access control is starting...[ok]
emqttd system monitor is starting...[ok]
Load emq_mod_presence module successfully.
Load emq_mod_subscription module successfully.
dashboard:http listen on 0.0.0.0:18083 with 2 acceptors.
mqtt:tcp listen on 127.0.0.1:11883 with 16 acceptors.
mqtt:tcp listen on 0.0.0.0:1883 with 64 acceptors.
mqtt:ws listen on 0.0.0.0:8083 with 16 acceptors.
mqtt:ssl listen on 0.0.0.0:8883 with 32 acceptors.
mqtt:wss listen on 0.0.0.0:8084 with 4 acceptors.
mqtt:api listen on 127.0.0.1:8080 with 4 acceptors.
emqttd 2.2 is running now
Node 'master@emq-master' not responding to pings.
['2017-06-29T09:17:55Z']:waiting emqttd
['2017-06-29T09:17:55Z']:timeout error
Apparently, master@emq-master cannot be resolved within the container when I set EMQ_HOST.
I also tried leaving it blank. In that case emqttd starts with the default IP address (as [email protected]). However, the emq-worker containers cannot join the cluster, even though the emq-master host (FQDN) can be resolved by these containers. The logs from one of the emq-worker containers:
emqttd 2.2 is running now
['2017-06-29T11:18:50Z']:emqttd start
['2017-06-29T11:18:50Z']:emqttd try join master@emq-master
11:18:58.790 [error] ** System running to use fully qualified hostnames **
** Hostname emq-master is illegal **
Failed to join the cluster: {node_not_running,'master@emq-master'}
I connected to one of the worker containers and tried to connect to the master with the hostname again:
root@9df3d2a3d36a:/opt/emqttd/bin# emqttd_ctl cluster join master@emq-master
Failed to join the cluster: {node_not_running,'master@emq-master'}
And this time, using the ip address, it worked!
root@9df3d2a3d36a:/opt/emqttd# emqttd_ctl cluster join [email protected]
Join the cluster successfully.
Cluster status: [{running_nodes,['[email protected]','[email protected]']}]
I was planning to set a static ip for my master node, however swarm's overlay network driver does not support it (see Static/Reserved IP addresses for swarm services · Issue #24170 · moby/moby).
How can I create an EMQ cluster deployment properly?
I also tried adding the hostname parameter for emq-master, but that didn't work either:
services:
  emq-master:
    hostname: emq-master
    ...
@aksakalli The Erlang node name should be Name@Host when clustering, where Host is an IP address or the fully qualified host name. For example:
services:
  emq-master:
    image: aksakalli/rpi-emq
    environment:
      - EMQ_HOST=master.yourdomain
      - EMQ_NAME=emq
      - EMQ_NODE__COOKIE=ef16498f66804df1cc6172f6996d5492
      - [email protected]
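To make the Name@Host rule concrete, here is a minimal Python sketch that checks whether a node name would pass it. The function name `is_long_node_name` and the address 10.0.0.5 are made up for illustration, and "fully qualified" is approximated as "contains at least one dot", which is a heuristic and not the exact check Erlang performs.

```python
import ipaddress


def is_long_node_name(node: str) -> bool:
    """Return True if `node` looks like Name@Host with an IP or a dotted host."""
    name, sep, host = node.partition("@")
    if not sep or not name or not host:
        return False
    try:
        ipaddress.ip_address(host)
        return True  # a literal IP address is accepted as Host
    except ValueError:
        pass
    # Heuristic for "fully qualified": at least two non-empty labels.
    labels = host.split(".")
    return len(labels) >= 2 and all(labels)


# 10.0.0.5 is a hypothetical address used only for this example.
for candidate in ("master@emq-master", "emq@master.mq.tt", "emq@10.0.0.5"):
    print(candidate, is_long_node_name(candidate))
```

The short name master@emq-master from the original compose file is the one this check rejects, which matches the "Hostname emq-master is illegal" error in the worker log.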
@aksakalli, I found the only way I could get the brokers up in clustered mode was if I specified FQDNs. Short hostnames didn't work and since IPs in Docker are dynamic, can't use those either. I assign the EMQ_HOST variable with an FQDN and then set the network alias for that container to the same FQDN. Here's the snippet from my compose file I use to bring up the EMQ services:
services:
  emq_main_1:
    image: emq
    environment:
      EMQ_NAME: emq
      EMQ_HOST: emq_main_1.mq.tt
    networks:
      backend:
        aliases:
          - emq_main_1.mq.tt
  emq_main_2:
    image: emq
    environment:
      EMQ_NAME: emq
      EMQ_HOST: emq_main_1.mq.tt
      EMQ_JOIN_CLUSTER: emq@emq_main_1.mq.tt
    networks:
      backend:
        aliases:
          - emq_main_2.mq.tt
@MrOwen thank you very much, it works with network aliases!
One thing to point out: emq_main_2's EMQ_HOST should be emq_main_2.mq.tt in your snippet.
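A mismatch like this between EMQ_HOST and the network alias is easy to miss, so here is a rough Python sketch of a check for it. It assumes the compose services have already been loaded into plain dicts with `environment` in mapping form (as in the snippet above); the helper name `host_alias_mismatches` is hypothetical.

```python
def host_alias_mismatches(services: dict) -> list:
    """List (service, EMQ_HOST) pairs whose EMQ_HOST is not one of the service's aliases."""
    problems = []
    for name, spec in services.items():
        host = spec.get("environment", {}).get("EMQ_HOST")
        if host is None:
            continue  # nothing to check for services without EMQ_HOST
        aliases = set()
        for net in (spec.get("networks") or {}).values():
            aliases.update((net or {}).get("aliases", []))
        if host not in aliases:
            problems.append((name, host))
    return problems


# The two services from the snippet above, reduced to the relevant keys.
services = {
    "emq_main_1": {
        "environment": {"EMQ_HOST": "emq_main_1.mq.tt"},
        "networks": {"backend": {"aliases": ["emq_main_1.mq.tt"]}},
    },
    "emq_main_2": {
        "environment": {"EMQ_HOST": "emq_main_1.mq.tt"},
        "networks": {"backend": {"aliases": ["emq_main_2.mq.tt"]}},
    },
}
print(host_alias_mismatches(services))
```

For the snippet as posted, only emq_main_2 is reported, because its EMQ_HOST points at the first node's name.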
Here is my compose file:
version: "3"
services:
  emq-master:
    image: emq
    environment:
      - "EMQ_NAME=emq"
      - "EMQ_HOST=master.mq.tt"
      - "EMQ_NODE__COOKIE=ef16498f66804df1cc6172f6996d5492"
    networks:
      emq-cluster:
        aliases:
          - master.mq.tt
    ports:
      - 18083:18083
      - 1883:1883
  emq-worker:
    image: emq
    environment:
      - "[email protected]"
      - "EMQ_NODE__COOKIE=ef16498f66804df1cc6172f6996d5492"
    depends_on:
      - emq-master
    networks:
      emq-cluster:
    deploy:
      replicas: 2
networks:
  emq-cluster:
Now I can run my cluster with 3 instances, and it works fine.
Now I publish the cluster from the master instance.
My questions are:
- I publish everything from the master because I don't want the load balancer to route requests to the workers before they join the cluster. Is this the right approach? How can I improve this for high availability?
- Since I have the dashboard from emq-master, do I need to load all default modules for emq-worker? Can I add an EMQ_LOADED_PLUGINS="" variable for emq-worker?
We have a script hook in https://github.com/emqtt/emq-docker/blob/master/start.sh#L151
You could create this script and do something in it about cluster.
@aksakalli
1. It's a fact of life that some clients will need to reconnect or wait to connect; there's no way to avoid this. With this approach only the master ever handles connections and sessions from external clients, of which there are surely many more than clients connecting from inside the cluster, so it mostly negates the main benefit of clustering (spreading the load) in the first place, I'd think.
2. I've found (as I'm sure you have by now) that dashboards only show information about their own instance; they do not reflect the whole cluster.
On a general note, this clustering method is still weak in the face of a master being unavailable when a worker connects, something that constantly re-attaches workers to the master (or a completely different approach) would be needed.
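As a partial mitigation for the startup race, a worker could retry the join instead of attempting it once. The sketch below is only an illustration: it wraps the `emqttd_ctl cluster join` command and the "Join the cluster successfully." message shown earlier in this thread, while the function name, the retry counts and the injectable `run`/`sleep` parameters are my own assumptions. It does not re-attach a worker that drops out later.

```python
import subprocess
import time

SUCCESS_MARKER = "Join the cluster successfully"


def join_with_retry(master, attempts=10, delay=5.0,
                    run=subprocess.run, sleep=time.sleep):
    """Run `emqttd_ctl cluster join MASTER` until it reports success.

    Returns the attempt number that succeeded, or None if all attempts failed.
    """
    for attempt in range(1, attempts + 1):
        result = run(
            ["emqttd_ctl", "cluster", "join", master],
            capture_output=True,
            text=True,
        )
        if SUCCESS_MARKER in result.stdout:
            return attempt
        if attempt < attempts:
            sleep(delay)  # give the master time to come up before retrying
    return None
```

Inside a worker container this could be called as `join_with_retry("emq@master.mq.tt")`, for example from the script hook mentioned above.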
@aksakalli
Please have a look at the 2.3 beta version of EMQ. It adds autodiscovery. http://emqttd-docs.readthedocs.io/en/latest/config.html#emq-cluster https://github.com/emqtt/emqttd/blob/v2.3-beta.1/etc/emq.conf#L12
I tried both multicast and etcd, and they both work (had to manually create the node dir for etcd).
Just change ENV EMQ_VERSION=v2.3-beta.1 in the Dockerfile and then start the containers with the following arguments:
Etcd:
# Create '/emq/emq/nodes' directory in your Etcd cluster. Python example using python-etcd:
>>> import etcd
>>> c = etcd.Client(host='ETCD_HOST', port=2379)
>>> c.write('/emq/emq/nodes', None, dir=True)
docker run --rm -ti \
  -p 18083:18083 \
  -p 1883:1883 \
  -p 8083:8083 \
  --env "EMQ_CLUSTER__NAME=emq" \
  --env "EMQ_CLUSTER__DISCOVERY=etcd" \
  --env "EMQ_CLUSTER__AUTOHEAL=on" \
  --env "EMQ_CLUSTER__AUTOCLEAN=3m" \
  --env "EMQ_CLUSTER__ETCD__SERVER=http:\/\/ETCD_HOST:2379" \
  --env "EMQ_CLUSTER__ETCD__PREFIX=emq" \
  --env "EMQ_CLUSTER__ETCD__NODE_TTL=1m" \
  YOUR-REPO-HERE/emq:2.3-beta
# Run the same, but skip/change the ports for consecutive nodes.
Multicast:
docker run --rm -ti \
  -p 18083:18083 \
  -p 1883:1883 \
  -p 8083:8083 \
  --env "EMQ_CLUSTER__NAME=emq" \
  --env "EMQ_CLUSTER__DISCOVERY=mcast" \
  --env "EMQ_CLUSTER__AUTOHEAL=on" \
  --env "EMQ_CLUSTER__AUTOCLEAN=3m" \
  --env "EMQ_CLUSTER__MCAST__ADDR=239.192.0.1" \
  --env "EMQ_CLUSTER__MCAST__PORTS=4369,4370" \
  --env "EMQ_CLUSTER__MCAST__IFACE=0.0.0.0" \
  --env "EMQ_CLUSTER__MCAST__TTL=255" \
  --env "EMQ_CLUSTER__MCAST__LOOP=on" \
  YOUR-REPO-HERE/emq:2.3-beta
# Run the same, but skip/change the ports for consecutive nodes.
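For the "change the ports for consecutive nodes" part, one option is to generate the command per node. This is just a sketch: the helper name `docker_run_args` and the host-port step of 100 are arbitrary choices of mine, and only two of the environment variables from above are passed in the example.

```python
import shlex

BASE_PORTS = (18083, 1883, 8083)  # dashboard, MQTT, WebSocket (as published above)


def docker_run_args(index, image, env, port_step=100):
    """Build a `docker run` argument list for the node with the given 0-based index.

    Host ports are shifted by index * port_step; container ports stay the same.
    """
    args = ["docker", "run", "--rm", "-ti"]
    for port in BASE_PORTS:
        args += ["-p", f"{port + index * port_step}:{port}"]
    for key, value in env.items():
        args += ["--env", f"{key}={value}"]
    args.append(image)
    return args


env = {"EMQ_CLUSTER__NAME": "emq", "EMQ_CLUSTER__DISCOVERY": "mcast"}
for i in range(2):
    print(shlex.join(docker_run_args(i, "YOUR-REPO-HERE/emq:2.3-beta", env)))
```

With the default step, the second node publishes host ports 18183, 1983 and 8183, so both containers can run on the same host.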
I hope the above helps.
I had to make a few tweaks to bring a cluster up using DNS auto discovery and docker swarm:
version: "3"
services:
  mqtt:
    networks:
      proxy:
      mqtt:
      default:
        aliases:
          - mymqtt
    deploy:
      replicas: 12
    ports:
      - 1883:1883 # MQTT
    image: chrisns/emq:v2.3-beta.3-hacked
    environment:
      - EMQ_CLUSTER__DNS__NAME=tasks.mymqtt
      - EMQ_NAME=emq
      - EMQ_CLUSTER__DISCOVERY=dns
      - EMQ_CLUSTER__AUTOHEAL=on
      - EMQ_CLUSTER__AUTOCLEAN=30s
      - EMQ_CLUSTER__DNS__APP=emq
networks:
  default:
    external: false
  mqtt:
    external: true
  proxy:
    external: true
docker stack deploy -c docker-compose.yml mqtt
The main thing that wasn't working and needed to be hacked was the IP address determination in start.sh, which is way too simple. My script figures out which IP the container has on the aliased network and uses that for self-identification and communication between the containers, though my solution is a bit specific to DNS-based discovery.
Aside from that, it's annoying that the default emq.conf has lines commented out, so to keep the nice env var replacer in the built-in start.sh working you have to remove the #'s.
In other related news, I built a thing that automagically builds and pushes Docker images for all the releases, plus a -hacked variant with my patches: https://hub.docker.com/r/chrisns/emq/tags/
Code is here: https://github.com/chrisns/docker-emq
This is super self serving and not really sensible enough for me to make a PR with any of it, but hopefully sharing my solution/hacks will help someone :)
@chrisns I had run into the IP issue before and had settled on assigning a specific subnet to the overlay network used for the cluster, plus a custom variable that signals its prefix so it can be matched against the addresses available inside the container (which is way more complicated)... but, yeah, something that helps choose the "right" network to take the node's "name" from is needed. This actually looks fine, except that an extra variable (not tied to a specific clustering solution) might be needed. The replacer works fine for commented lines; it's just that the regex is not matching whitespace correctly. I submitted a patch for it but it got rolled back later on...
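The subnet-matching idea can be sketched in a few lines of Python. This is not the actual start.sh logic, only an illustration: the function name `pick_cluster_address`, the example addresses and the 10.0.9.0/24 subnet are all hypothetical, and the list of container addresses is assumed to have been collected already.

```python
import ipaddress


def pick_cluster_address(addresses, cluster_cidr):
    """Return the first address that belongs to the cluster's overlay subnet, or None."""
    network = ipaddress.ip_network(cluster_cidr)
    for addr in addresses:
        if ipaddress.ip_address(addr) in network:
            return addr
    return None


# Hypothetical addresses of one container attached to three networks.
container_addresses = ["172.18.0.4", "10.0.9.7", "10.255.0.12"]
print(pick_cluster_address(container_addresses, "10.0.9.0/24"))
```

The selected address would then be used as the Host part of the node name, so every node identifies itself with an address the other nodes can reach.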
I eventually decided to abandon work on this for now. If the cluster comes up too fast, the nodes don't discover each other, or worse, they discover only some of the other nodes, so you can end up with several separate clusters. I found that spinning up 12 containers could easily result in a cluster of 4, another cluster of 5 and then 3 unclustered nodes, which is pretty annoying/pointless. I really hoped the auto discovery thing would run all the time, not just at startup.
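A split like this can at least be detected by comparing what each node reports. The sketch below parses the "Cluster status: [{running_nodes,[...]}]" line format shown earlier in the thread and groups nodes by the membership they see. The function names and the node names are invented for the example, and how the status line is collected from each container is left out.

```python
import re

RUNNING_RE = re.compile(r"running_nodes,\[([^\]]*)\]")
NODE_RE = re.compile(r"'([^']+)'")


def parse_running_nodes(status_line):
    """Extract the running node names from one cluster status line."""
    match = RUNNING_RE.search(status_line)
    if not match:
        return frozenset()
    return frozenset(NODE_RE.findall(match.group(1)))


def distinct_clusters(status_by_node):
    """Return the distinct memberships reported by the nodes, largest first."""
    views = set()
    for node, line in status_by_node.items():
        # A node whose status cannot be parsed is treated as a cluster of one.
        views.add(parse_running_nodes(line) or frozenset([node]))
    return sorted(views, key=len, reverse=True)


# Hypothetical status lines from three nodes.
statuses = {
    "emq@10.0.0.2": "Cluster status: [{running_nodes,['emq@10.0.0.2','emq@10.0.0.3']}]",
    "emq@10.0.0.3": "Cluster status: [{running_nodes,['emq@10.0.0.2','emq@10.0.0.3']}]",
    "emq@10.0.0.4": "Cluster status: [{running_nodes,['emq@10.0.0.4']}]",
}
clusters = distinct_clusters(statuses)
print([len(c) for c in clusters])
```

More than one entry in the result means the nodes did not all end up in the same cluster.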
Does anyone have any news about this issue?
This does not seem to work: if you provide the DNS name, it will resolve to another IP, most likely the load balancer's, and not the node IP.
What I did was mount docker.sock, get the IPs from there using Python, and use cluster.sh to try to join the nodes manually with those IPs.
I hope the developers will consider a viable solution for Docker in the future, because mcast does not work with overlay networks and etcd is not a good solution either.
@purplesrl Can you share your solution? I'm looking for a good solution that allows me to use dockerized emq clusters in Amazon ECS.
Has someone figured out a way to create a docker swarm/docker-compose cluster in emqx version 3? I have tried some of the suggested ways here and haven't found a solution yet.
@optionsome may I point you to -> https://github.com/emqx/emqx-docker/pull/91#issue-233811388 ?
@tomaszwostal Unfortunately not, I developed the code at work... but I outlined the steps. The idea is to find the IPs and then join the nodes manually, because on Docker the automatic way is not working, mainly because Docker Swarm provides a load-balancer IP but emqx requires the actual IP of the node.
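The "find the IPs, then join manually" steps could look roughly like the Python sketch below. Note that it swaps in a DNS lookup of the swarm `tasks.` name (the same name used for EMQ_CLUSTER__DNS__NAME earlier in this thread) instead of reading docker.sock, under the assumption that this name returns the individual task addresses and not the load-balancer IP. The function names and example addresses are hypothetical.

```python
import socket


def task_ips(service, resolve=socket.getaddrinfo):
    """Return the sorted, de-duplicated IPv4 addresses behind tasks.SERVICE."""
    infos = resolve(f"tasks.{service}", None, socket.AF_INET, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def join_targets(node_name, own_ip, ips):
    """Node names (Name@IP) of every task except the local one."""
    return [f"{node_name}@{ip}" for ip in ips if ip != own_ip]
```

Inside a container this might be used as `join_targets("emq", my_ip, task_ips("mymqtt"))`, with each resulting name passed to the cluster join command until one succeeds.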
@RaymondMouthaan thanks a lot! I was able to get the clustering to work. I don't know what my problem was earlier as what I was trying was really similar to your solution. Was just missing the hostname and volume definitions.
@optionsome, good to hear you made it work 👍🏽. One note on this: when emqx-worker starts faster than emqx-master, you might end up with two individual emqx instances instead of a clustered pair. Solution: just restart the worker container.
@RaymondMouthaan I copied your example docker compose file and ran it, but it doesn't work; the nodes are not clustered. I restarted the worker container and it's still the same. Did I do anything wrong? I look forward to your reply.
@Rebellioncry, apologies, but I have not been using emqx as my MQTT broker for a while now. @zhanghongtong might be able to help you.
@RaymondMouthaan Thank you for your reply! @zhanghongtong How do I create a Docker Swarm cluster now? Can you give me an example? I look forward to your reply.
@Rebellioncry Hi, an example docker-compose.yaml is as follows:
version: '3'
services:
  emqx1:
    image: emqx/emqx:v3.2.5
    environment:
      - "EMQX_NAME=emqx"
      - "EMQX_HOST=node1.emqx.io"
      - "EMQX_CLUSTER__DISCOVERY=static"
      - "[email protected], [email protected]"
    networks:
      emqx-net:
        aliases:
          - node1.emqx.io
  emqx2:
    image: emqx/emqx:v3.2.5
    environment:
      - "EMQX_NAME=emqx2"
      - "EMQX_HOST=node2.emqx.io"
      - "EMQX_CLUSTER__DISCOVERY=static"
      - "[email protected], [email protected]"
    networks:
      emqx-net:
        aliases:
          - node2.emqx.io
networks:
  emqx-net:
Execute docker-compose up
$ docker-compose up
Creating tmp_emqx1_1 ... done
Creating tmp_emqx2_1 ... done
Attaching to tmp_emqx2_1, tmp_emqx1_1
emqx1_1 | node.max_ports=1048576
emqx2_1 | node.max_ports=1048576
emqx2_1 | listener.tcp.external.acceptors=64
emqx2_1 | listener.ssl.external.acceptors=32
emqx2_1 | node.process_limit=2097152
emqx2_1 | node.max_ets_tables=2097152
emqx2_1 | cluster.discovery=static
emqx2_1 | cluster.discovery=static
emqx2_1 | listener.ws.external.acceptors=16
emqx2_1 | [email protected]
emqx2_1 | [email protected], [email protected]
emqx2_1 | [email protected], [email protected]
emqx1_1 | listener.tcp.external.acceptors=64
emqx1_1 | listener.ssl.external.acceptors=32
emqx1_1 | node.process_limit=2097152
emqx1_1 | node.max_ets_tables=2097152
emqx1_1 | cluster.discovery=static
emqx1_1 | cluster.discovery=static
emqx1_1 | listener.ws.external.acceptors=16
emqx1_1 | [email protected]
emqx1_1 | [email protected], [email protected]
emqx1_1 | [email protected], [email protected]
emqx2_1 | emqx v3.2.5 is started successfully!
emqx1_1 | emqx v3.2.5 is started successfully!
emqx2_1 | 2019-11-19 13:14:57.259 [critical] [EMQ X] emqx shutdown for join
emqx2_1 | ['2019-11-19T13:15:00Z']:emqx start
emqx1_1 | ['2019-11-19T13:15:01Z']:emqx start
$ docker exec -it tmp_emqx1_1 sh -c "emqx_ctl cluster status"
Cluster status: [{running_nodes,['[email protected]','[email protected]']}]
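With static discovery every node needs the same list of seed nodes, so it can help to build that list from the EMQX_NAME/EMQX_HOST pairs instead of typing it by hand. A small sketch follows; the helper name `static_seeds` is made up, and the variable the value goes into is assumed to be the static seeds setting (EMQX_CLUSTER__STATIC__SEEDS, following the cluster.static.seeds config key), since the name is not legible in the example above.

```python
def static_seeds(nodes):
    """Build a comma-separated seed list from (name, host) pairs as Name@Host entries."""
    return ",".join(f"{name}@{host}" for name, host in nodes)


# EMQX_NAME / EMQX_HOST pairs taken from the compose example above.
nodes = [("emqx", "node1.emqx.io"), ("emqx2", "node2.emqx.io")]
print(static_seeds(nodes))
```

Each entry follows the same Name@Host rule discussed at the start of this thread, so the hosts must match the network aliases of the corresponding services.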
@zhanghongtong thanks a lot! Your example works well!
@Rebellioncry You are welcome :)
@aaamitsingh I'm sorry we don't have an example yet
@zhanghongtong Is autoclustering still only possible with a static node list?
Hi @renatomotorline, you can refer to our documentation
@zhanghongtong I read the documentation, but I have only managed to make the cluster work with a static node list, like in the example you posted above. Do you have any example with DNS, multicast or etcd?
@renatomotorline Sorry, we don't have an example of DNS, multicast or etcd clusters.