add sleep period before retrying the ssh connection
While using Vagrant 2.2.4, I'm trying to reboot an Ubuntu VM with:
config.vm.provision :shell, path: 'reboot.sh'
with reboot.sh being:
nohup bash -c "ps -eo pid,comm | awk '/sshd/{print \$1}' | xargs kill; sync; reboot"
While the VM is rebooting, the ssh communicator keeps (re)trying to connect, but each attempt fails because the VM refuses the connection while it's booting / not yet ready... and the communicator gives up too quickly.
While following the code, I found this happens because the retry logic of the ssh communicator at https://github.com/hashicorp/vagrant/blob/b1d8b952bb4da7e18782f6e3422cfe5e99014690/plugins/communicators/ssh/communicator.rb#L431 does not sleep between retries. This needs to be changed to add the sleep argument, e.g.:
connection = retryable(tries: opts[:retries], on: SSH_RETRY_EXCEPTIONS, sleep: timeout) do
Maybe that timeout should trickle down from the Vagrantfile provision line (like opts[:retries]), e.g. with a sleep argument:
config.vm.provision :shell, path: 'reboot.sh', sleep: 120
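To make the requested behaviour concrete, here is a minimal stand-alone sketch of a retry loop that pauses between attempts. The helper name retry_with_sleep and its parameters are hypothetical and only illustrate the idea; this is not Vagrant's own retryable implementation.

```ruby
# Hypothetical helper (not part of Vagrant): run the block up to `tries`
# times, sleeping `sleep_seconds` between attempts when one of the
# exception classes listed in `on` is raised.
def retry_with_sleep(tries:, on:, sleep_seconds: 0)
  attempts = 0
  begin
    attempts += 1
    yield attempts
  rescue *on
    # Out of attempts: re-raise the last error to the caller.
    raise if attempts >= tries
    sleep(sleep_seconds) if sleep_seconds > 0
    retry
  end
end

# Usage: simulate a guest that refuses the first two connection attempts.
calls = 0
result = retry_with_sleep(tries: 5, on: [Errno::ECONNREFUSED], sleep_seconds: 0.01) do |attempt|
  calls = attempt
  raise Errno::ECONNREFUSED if attempt < 3
  :connected
end
```

Without the sleep, all attempts can be used up within a fraction of a second, long before sshd is listening again after the reboot.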
I ran into this problem as well. In my case the guests are RHEL VMs that become pingable before the SSH daemon starts up. The behavior I see with vagrant up --debug is:
- The guest becomes pingable
- An SSH connection is initiated
- SSH returns CONNECTIONREFUSED because the SSH daemon on the guest isn't up yet (taking a little while to boot)
- Vagrant does NOT retry and simply exits with an error
I found the same section of code that @rgl did and manually set the opts[:retries] to 5 (seems to be set to 1 when the function is called) and then added in a sleep. This allowed the SSH connection to be retried and communication to the guest works great.
I'm thinking a good solution would be to expose something like:
config.ssh.retries = 5
config.ssh.retry_sleep_interval = 10
These options would allow the user to control the number of SSH retries and the sleep time between retries.
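As a rough sketch of what those two values would control, the loop below polls a TCP port until it accepts a connection. The function wait_for_port and its keyword arguments are hypothetical, named after the proposed options; it only checks that the port is reachable and does not perform an SSH handshake.

```ruby
require "socket"

# Hypothetical helper: returns true as soon as a TCP connection to
# host:port succeeds, or false once every attempt has been refused,
# found the host unreachable, or timed out.
def wait_for_port(host, port, retries:, retry_sleep_interval:, connect_timeout: 5)
  retries.times do |i|
    begin
      # Socket.tcp closes the socket automatically when the block exits.
      Socket.tcp(host, port, connect_timeout: connect_timeout) { |_sock| return true }
    rescue Errno::ECONNREFUSED, Errno::EHOSTUNREACH, Errno::ETIMEDOUT
      # Pause before the next attempt, but not after the final one.
      sleep(retry_sleep_interval) if i < retries - 1
    end
  end
  false
end
```

For example, wait_for_port("127.0.0.1", 2222, retries: 5, retry_sleep_interval: 10) would give a slow-booting guest up to roughly 40 seconds of sleep time before giving up (the host and port here are only illustrative).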
Thoughts?
Hello all, the issue I reported here is still relevant. If anyone else is affected, here is a workaround:
cd vagrant/embedded/gems/*/gems/vagrant-*/plugins/communicators/ssh
curl -fsSL https://github.com/hashicorp/vagrant/commit/424808388956c0d6acf0e91ca751fe9345f6e7f8.patch -o ssh.patch
patch -p4 -i ssh.patch
rm -f ssh.patch
The corresponding diff is in https://github.com/hashicorp/vagrant/pull/12292