OpenStack floating IP race
docker-machine ls
NAME ACTIVE DRIVER STATE URL SWARM DOCKER ERRORS
runner-587241eb-1490702504-2a286bb1 - openstack Running tcp://172.27.93.78:2376 v1.12.6
runner-587241eb-1490702504-6c323aaf - openstack Running tcp://172.27.93.78:2376 v1.12.6
runner-587241eb-1490702504-5024f19b - openstack Running tcp://172.27.93.78:2376 v1.12.6
The OpenStack driver doesn't seem to check the return code from attaching a floating IP. As a result, running multiple docker-machine create commands in parallel often leaves several instances reporting the same floating IP.
Expected behaviour: docker-machine notices that attaching the floating IP to the instance failed and tries again.
Actual behaviour: docker-machine doesn't notice and carries on, eventually failing with: Error creating machine: Error detecting OS: Too many retries waiting for SSH to be available. Last error: Maximum number of retries (60) exceeded
Version: docker-machine version 0.10.0, build 76ed2a6
docker-machine create --driver openstack --openstack-availability-zone nova --openstack-flavor-id 2002 --openstack-image-name xenial-gitlab-slave --openstack-net-name gitlab-autoscale --openstack-floatingip-pool nova --openstack-sec-groups ssh,default,icmp,autoscale --openstack-ssh-user "ubuntu" test1 > test1.txt |
docker-machine create --driver openstack --openstack-availability-zone nova --openstack-flavor-id 2002 --openstack-image-name xenial-gitlab-slave --openstack-net-name gitlab-autoscale --openstack-floatingip-pool nova --openstack-sec-groups ssh,default,icmp,autoscale --openstack-ssh-user "ubuntu" test2 > test2.txt |
docker-machine create --driver openstack --openstack-availability-zone nova --openstack-flavor-id 2002 --openstack-image-name xenial-gitlab-slave --openstack-net-name gitlab-autoscale --openstack-floatingip-pool nova --openstack-sec-groups ssh,default,icmp,autoscale --openstack-ssh-user "ubuntu" test3 > test3.txt
returned
Error creating machine: Error detecting OS: Too many retries waiting for SSH to be available. Last error: Maximum number of retries (60) exceeded
Error creating machine: Error detecting OS: Too many retries waiting for SSH to be available. Last error: Maximum number of retries (60) exceeded
and docker-machine ls gave
test1 - openstack Running tcp://172.27.94.57:2376 v1.12.6
test2 - openstack Running tcp://172.27.94.57:2376 v1.12.6
test3 - openstack Running tcp://172.27.94.57:2376 v1.12.6
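A duplicated address like this can be spotted mechanically. The following is a minimal sketch, assuming the docker-machine ls output is piped in on stdin and that the URL column always starts with tcp://. find_duplicate_urls is a hypothetical helper name, not part of docker-machine.

```shell
# Hypothetical helper: print every URL reported by more than one machine,
# followed by the number of machines that share it.
# Reads `docker-machine ls`-style output on stdin.
find_duplicate_urls() {
  awk '
    # Count each whitespace-separated field that looks like a machine URL.
    { for (i = 1; i <= NF; i++) if ($i ~ /^tcp:\/\//) count[$i]++ }
    # Report only URLs seen more than once (output order is unspecified).
    END { for (u in count) if (count[u] > 1) print u, count[u] }
  '
}

# Usage:
#   docker-machine ls | find_duplicate_urls
```

An empty result means every machine reported a distinct URL. It does not prove that each floating IP is actually attached to the right instance.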
These errors could be prevented by checking the return code from floating-ip associate, or by attaching the floating IP again if ssh <IP> returns Permission denied (publickey).
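The retry idea could look roughly like the wrapper below. This is a sketch under assumptions, not the driver's code: retry is a hypothetical helper, RETRY_DELAY is a made-up tuning variable, and the approach only helps if the wrapped command really exits non-zero when the association fails, which may not hold for the race described here.

```shell
# Hypothetical helper: run a command until it succeeds, at most $1 times.
# Usage: retry MAX_ATTEMPTS command [args...]
# Note: max and attempt are plain (global) variables, for POSIX sh portability.
retry() {
  max=$1
  shift
  attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    # Pause between attempts; defaults to 1 second.
    sleep "${RETRY_DELAY:-1}"
  done
}

# Example (assumed command and placeholder variables, adjust to your cloud):
#   retry 3 openstack server add floating ip "$SERVER" "$FLOATING_IP"
```

A more robust variant would wrap a check that the instance actually reports the expected address after the attach, rather than trusting the exit code alone.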
This bug prevents using docker-machine for autoscaled GitLab CI runners. :/
(It shows up when IdleCount is set to anything greater than 1; see https://docs.gitlab.com/runner/configuration/advanced-configuration.html#the-runners-machine-section) Example:
root@gitlab-runner:/etc/gitlab-runner# cat config.toml
concurrent = 10
check_interval = 0
[[runners]]
name = "shared_runner_small_onetmie"
url = "https://gitlab.....de/"
token = "ddcbe4f3a2b2df9da4adb1cceb494e"
executor = "docker+machine"
[runners.docker]
tls_verify = false
image = "ubuntu:16.04"
privileged = false
disable_cache = false
volumes = ["/var/run/docker.sock:/var/run/docker.sock", "/cache"]
shm_size = 0
[runners.cache]
[runners.machine]
IdleCount = 5
IdleTime = 86400
MaxBuilds = 1
MachineDriver = "openstack"
MachineName = "gitlabci-small-onetime%s"
MachineOptions = ["openstack-auth-url=https://keystone.cloud.....net:5000/v3", "openstack-region=dbl", "openstack-tenant-name=...-gitlabci", "openstack-net-name=internal", "openstack-floatingip-pool=ext-net", "openstack-domain-name=Default", "openstack-password=...", "openstack-username=service-gitlab-ci", "openstack-flavor-name=m1.small", "openstack-image-name=Ubuntu Server 16.04 LTS", "openstack-sec-groups=ssh,docker-machine", "openstack-ssh-user=ubuntu",]
OffPeakTimezone = ""
OffPeakIdleCount = 0
OffPeakIdleTime = 0
So. Will this ever be resolved? Asking for a friend.
I doubt it, because docker-machine is not maintained anymore. We just keep applying our patch on top of GitLab's docker-machine fork (https://gitlab.com/gitlab-org/ci-cd/docker-machine).
Interesting. Do you know if this issue persists in GitLab's fork?
Of course it does, that's why we still apply our patch (https://gitlab.com/syseleven/docker-machine/-/tree/concurrent-fip-deletion).