docker-machine-driver-xhyve
docker 1.12 swarm mode ingress load balancing partially working
First of all, this seems to be mostly working for the basic docker use cases (although I confess I haven't tried everything).
After creating a swarm of 3 machines with docker 1.12 (using the docker swarm init and docker swarm join commands; 1 manager, 2 workers, all using the xhyve driver), published ports appear to be accessible only from the nodes on which containers for the service are running.
This is compared to an identical swarm cluster created with the virtualbox driver, where the service is accessible on the published port of all 3 machines.
Steps to reproduce:
# create the swarm cluster
docker-machine create -d xhyve x-master-1
eval $(docker-machine env x-master-1)
docker swarm init --advertise-addr $(docker-machine ip x-master-1)
export worker_token=$(docker swarm join-token -q worker)
docker-machine create -d xhyve x-node-1
eval $(docker-machine env x-node-1)
docker swarm join --token ${worker_token} $(docker-machine ip x-master-1)
docker-machine create -d xhyve x-node-2
eval $(docker-machine env x-node-2)
docker swarm join --token ${worker_token} $(docker-machine ip x-master-1)
# deploy service
eval $(docker-machine env x-master-1)
docker service create --name hello --replicas 2 -p 8080:80 nginx
# ... wait for docker service ls to show all replicas available
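# e.g. a hypothetical polling loop; assumes the REPLICAS column of
# docker service ls reads "2/2" once both tasks are up
until docker service ls | grep hello | grep -q '2/2'; do sleep 2; done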
# test that service is ingress-balanced from all machines
curl -s "http://$(docker-machine ip x-master-1):8080/" | grep "Welcome" || echo "not accessible"
curl -s "http://$(docker-machine ip x-node-1):8080/" | grep "Welcome" || echo "not accessible"
curl -s "http://$(docker-machine ip x-node-2):8080/" | grep "Welcome" || echo "not accessible"
Further exploration (netstat -tnlp on each node) shows that there is no listener on port 8080 on the machines where no containers for the service are running.
netstat -tnlp (xhyve nodes where instance containers are running, and ALL virtualbox nodes)
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      -
tcp        0      0 :::8080                 :::*                    LISTEN      -
tcp        0      0 :::22                   :::*                    LISTEN      -
tcp        0      0 :::2376                 :::*                    LISTEN      -
tcp        0      0 :::7946                 :::*                    LISTEN      -
netstat -tnlp (xhyve nodes where instance containers are NOT running)
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      -
tcp        0      0 :::22                   :::*                    LISTEN      -
tcp        0      0 :::2376                 :::*                    LISTEN      -
tcp        0      0 :::7946                 :::*                    LISTEN      -
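For reference, here is one way to collect these listings from every machine in one pass (a quick sketch; it assumes the machine names from the repro above, and that boot2docker's netstat accepts these flags, as the output above suggests):
for m in x-master-1 x-node-1 x-node-2; do
  echo "=== $m ==="                      # label each machine's output
  docker-machine ssh $m netstat -tnlp    # list listening TCP sockets on that node
done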
I also noticed that running docker service ps hello shows the nodes all named 'boot2docker' in the xhyve case, whereas in the virtualbox case they are named correctly.
docker service ps hello (virtualbox)
ID                         NAME     IMAGE  NODE        DESIRED STATE  CURRENT STATE           ERROR
0vad3i6m78pwlpmy036w6cibh  hello.1  nginx  v-master-1  Running        Running 38 minutes ago
el0idy3uz4s8djwnlxe04iyhq  hello.2  nginx  v-node-2    Running        Running 38 minutes ago
docker service ps hello (xhyve)
ID                         NAME     IMAGE  NODE         DESIRED STATE  CURRENT STATE           ERROR
504oows46cwv92n54cmml6955  hello.1  nginx  boot2docker  Running        Running 32 minutes ago
cn3ymzygp4fxst7oyray38o7a  hello.2  nginx  boot2docker  Running        Running 32 minutes ago
@matt-deboer Thanks for the issue :) Okay, I will debug it.
Just in case (sorry, I'm not good at English): does this problem mean that xhyve only fails to listen on :8080, or is there some other problem as well?
Let me try to simplify the issue. Maybe you're familiar with the new docker swarm mode networking (I'm only barely familiar with it myself), which uses something called "ingress load balancing": it allows a service to be reached through its published port (the -p host:container syntax) on any of the machines in the cluster.
This works correctly with the virtualbox driver, but with the xhyve driver the service is only accessible on hosts where an instance of that service is running. The example host port I chose for the 'hello' service was 8080, mapped to port 80 of the nginx container.
I compared the startup logs (/var/log/docker.log) on one of the xhyve machines against one of the virtualbox machines, and noticed the following line, which I believe is the root of the problem:
time="2016-08-17T15:59:52.588760873Z" level=warning msg="2016/08/17 15:59:52 [ERR] memberlist: Conflicting address for boot2docker. Mine: 192.168.64.128:7946 Theirs: 192.168.64.16:7946\n"
It seems that because the machines all have the same hostname (boot2docker), memberlist cannot tell them apart on the swarm overlay network.
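One quick way to see the duplicate hostnames directly (a sketch, using the machine names from the repro above):
for m in x-master-1 x-node-1 x-node-2; do
  printf '%s reports hostname: ' $m
  docker-machine ssh $m hostname   # every xhyve node prints "boot2docker"
done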
I tested this by manually setting the hostname on each node before running the swarm init/join commands, and that fixes the issue. So it looks like just updating /etc/hostname (to match the machine name) on create would do the trick...
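Roughly what I did, for reference (a sketch, not a driver fix; it assumes boot2docker allows sudo hostname and that the docker daemon must be restarted to pick up the new name — run this before any swarm init/join):
for m in x-master-1 x-node-1 x-node-2; do
  # give each node a unique hostname matching its machine name;
  # the daemon restart (assumption) is so dockerd/memberlist see the new name
  docker-machine ssh $m "sudo hostname $m && sudo /etc/init.d/docker restart"
done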
Also, I created a script to test this quickly: https://gist.github.com/matt-deboer/3b81462f795166d736d91ca5be0a4e65
@matt-deboer Thanks for the details :) I will try to debug it.
But I haven't tried docker swarm yet... I should learn it. This script seems helpful for learning swarm. Thanks. Please wait a moment.