docker-maxscale
Docker 1.12 services - After Galera scale down maxscale's autodiscovery runs into trouble
If you scale up and then scale down the Galera cluster, the auto-discovery of MaxScale runs into trouble.
The command used in the entrypoint script:
getent hosts tasks.dbcluster
This delivers the N cluster IPs of dbcluster correctly, BUT:
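For context, a minimal sketch of how such an entrypoint typically turns those DNS results into a Galera cluster address (the variable names here are illustrative, not necessarily what the image uses):
# collect the task IPs from Swarm DNS and join them into a gcomm:// list
CLUSTER_IPS=$(getent hosts tasks.dbcluster | awk '{ print $1 }' | sort | paste -sd, -)
WSREP_CLUSTER_ADDRESS="gcomm://${CLUSTER_IPS}"
echo "wsrep_cluster_address = ${WSREP_CLUSTER_ADDRESS}"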
If you do something like:
docker service scale dbcluster=10
and then:
docker service scale dbcluster=5
The instance list:
docker service ps dbcluster
shows something like:
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR
0s4hgq9tm28xmp3padelhq258 dbcluster.1 toughiq/mariadb-cluster doswa-5 Running Running 17 hours ago
3f2b2q0rs4i2yzy92ohue7dlq dbcluster.2 toughiq/mariadb-cluster doswa-4 Running Running 17 hours ago
2ks1kl7einrlnbzkh8aayz9oq \_ dbcluster.2 toughiq/mariadb-cluster doswa-4 Shutdown Shutdown 17 hours ago
0xgbr3q3wavzkk5bvagby8xyu dbcluster.3 toughiq/mariadb-cluster doswa-4 Running Running 17 hours ago
bdsbd10u203pjj2kyvawohw23 \_ dbcluster.3 toughiq/mariadb-cluster doswa-3 Shutdown Shutdown 17 hours ago
6m92mbed7hrc2w0cnwfn7c66d dbcluster.4 toughiq/mariadb-cluster doswa-5 Running Running 17 hours ago
9ky7bh2wewsqgx0pptzjkpaqm \_ dbcluster.4 toughiq/mariadb-cluster doswa-5 Shutdown Shutdown 17 hours ago
as90l1abljf8seojivtyu265y \_ dbcluster.4 toughiq/mariadb-cluster doswa-5 Shutdown Shutdown 17 hours ago
2ms4ilr6hbh9fovjixc1a0npi dbcluster.5 toughiq/mariadb-cluster doswa-5 Shutdown Shutdown 17 hours ago
aavba7zhv7y9z77vsgyaab03n \_ dbcluster.5 toughiq/mariadb-cluster doswa-4 Shutdown Shutdown 17 hours ago
d1in2lunlab6qfj3p0kbks288 dbcluster.6 toughiq/mariadb-cluster doswa-4 Shutdown Shutdown 17 hours ago
btm75qwpa8oi1fg07qkvnpf9t \_ dbcluster.6 toughiq/mariadb-cluster doswa-4 Shutdown Shutdown 17 hours ago
4ymbc2lwzf4dt1o7ooswilyrt dbcluster.7 toughiq/mariadb-cluster doswa-3 Running Running 17 hours ago
c60ahb1mmtbjjzut0z31v2o3v dbcluster.8 toughiq/mariadb-cluster doswa-3 Shutdown Shutdown 17 hours ago
1bk8o6eajfbwz668pkzv629g4 \_ dbcluster.8 toughiq/mariadb-cluster doswa-5 Shutdown Shutdown 17 hours ago
dc9j3annf9dn1aueo2n46i9lu dbcluster.9 toughiq/mariadb-cluster doswa-5 Shutdown Shutdown 17 hours ago
5ke252yv31v9rajzsr3x8n9uc dbcluster.10 toughiq/mariadb-cluster doswa-4 Shutdown Shutdown 17 hours ago
And in this case getent delivers 5 cluster IPs, some of them from instances in Shutdown state. Unfortunately Docker Swarm doesn't seem to clean up shut-down instances. I'm currently not sure what a good way around this is.
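One possible workaround (just a sketch, not tested, and it assumes bash and the standard MySQL port 3306 are available in the container) would be to probe each address returned by DNS and drop the ones that don't answer before building the cluster address:
# keep only addresses that actually accept a TCP connection on 3306
LIVE_IPS=""
for ip in $(getent hosts tasks.dbcluster | awk '{ print $1 }'); do
  if timeout 2 bash -c "echo > /dev/tcp/${ip}/3306" 2>/dev/null; then
    LIVE_IPS="${LIVE_IPS}${LIVE_IPS:+,}${ip}"
  fi
done
echo "gcomm://${LIVE_IPS}"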
Hi @Franselbaer, I saw similar problems with the cluster discovery itself. Sometimes, if you do scale-out and scale-in repeatedly, the new nodes won't find the existing ones. Or the cluster might break apart, since not every node can reach all the other members. I am not sure if the problem is the Swarm DNS or the networking itself. Sometimes I had the overlay network attached to all nodes, but no communication over this net was possible. In my opinion this problem is caused by Swarm and its DNS itself. The only way to prevent it would be to establish some kind of alternative service discovery. But this would make the whole idea obsolete, since DNS and service discovery should be an environmental feature, provided by the cluster management and just consumed by the clients/containers. Which Docker version did you use when getting your results? I didn't try the current 1.12.3 version yet to see if this behavior still exists.
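To narrow down whether Swarm DNS or the overlay network is the culprit, a quick check one could run (a debugging sketch; the --filter flag needs a reasonably recent Docker CLI) is to compare what DNS reports from inside a task with what the manager actually has running:
# inside any container attached to the overlay network: what DNS hands out
getent hosts tasks.dbcluster
# on a manager node: which tasks Swarm itself considers running
docker service ps dbcluster --filter desired-state=running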
I am facing the same problem on 1.12.3
I've tested this only with 1.12.3 because I started with Docker at this version.
@Franselbaer see https://github.com/docker/swarmkit/issues/1372
It's an old bug, but it causes some critical failures. It is one of the bugs that make Docker hard to use for live services:
- After deleting or changing network properties or names in a Swarm cluster, you can find that your new network doesn't work properly. Old created or dead containers hold on to the network, so the old network is preserved.
- Depleting the resources of your machine.
Auto-discovery inside Swarm doesn't seem to work "at all" when the stack starts: there is a race condition between the cluster coming up and MaxScale starting. It took me a while to figure this out.
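A crude mitigation for that race (an untested sketch, assuming the MaxScale entrypoint is a shell script; the -d flag keeps maxscale in the foreground, but check the flags your image actually uses) would be to block until Swarm DNS returns at least one backend before starting MaxScale:
# wait until at least one dbcluster task is resolvable before starting MaxScale
until getent hosts tasks.dbcluster > /dev/null; do
  echo "waiting for dbcluster tasks to appear in DNS..."
  sleep 5
done
exec maxscale -d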
Hi! Similar issue. toughiq/maxscale gives the error ERROR 1045 (28000): failed to create new session if one of the toughiq/mariadb-cluster swarm nodes is recreated.
Docker version 19.03.12, build 48a66213fe
I had a similar issue in Swarm mode when I scaled the db container up and down.
Even after I scaled the containers back up and all containers were in up and running status, I always got this error: ERROR 1045 (28000): failed to create new session.
The result from maxadmin -pmariadb list servers shows all the nodes as down as well, even though they are running on Docker. Checking galera.cnf, the wsrep_cluster_address is not updated to the latest nodes' IP addresses, which means the newly created nodes won't find the existing ones.
I also found that the galera service and the splitter listeners are all down. I can't find a way to manually restart the service and listener.
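For what it's worth, maxadmin does have restart commands for monitors, services and listeners; I can't say whether they recover this particular state, and the object names below are just placeholders (use whatever maxadmin -pmariadb list services and list listeners report for your maxscale.cnf):
# restart the monitor, service and listener by name (names are examples only)
maxadmin -pmariadb restart monitor "Galera Monitor"
maxadmin -pmariadb restart service "Galera Service"
maxadmin -pmariadb restart listener "Galera Service" "Galera Listener"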
Any solution so far?