
add sleep period before retrying the ssh connection

Open rgl opened this issue 6 years ago • 2 comments

While using Vagrant 2.2.4, I'm trying to reboot an Ubuntu VM with:

config.vm.provision :shell, path: 'reboot.sh'

with reboot.sh being:

nohup bash -c "ps -eo pid,comm | awk '/sshd/{print \$1}' | xargs kill; sync; reboot"

While the VM is rebooting, the SSH communicator keeps (re)trying to connect, but that fails because the VM refuses the connection while it's booting / not yet ready, and the communicator gives up too quickly.

While following the code, this turned out to be because the retry logic of the SSH communicator at https://github.com/hashicorp/vagrant/blob/b1d8b952bb4da7e18782f6e3422cfe5e99014690/plugins/communicators/ssh/communicator.rb#L431 does not sleep between retries. This needs to be changed to add the sleep argument, e.g.:

connection = retryable(tries: opts[:retries], on: SSH_RETRY_EXCEPTIONS, sleep: timeout) do

Maybe that timeout should trickle down from the Vagrantfile provision line (like opts[:retries]), e.g. with a sleep argument:

config.vm.provision :shell, path: 'reboot.sh', sleep: 120
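To illustrate what a pause between attempts buys, here is a minimal standalone sketch. It is not Vagrant's retryable helper; retry_with_sleep and its parameters are hypothetical names used only for this example, and the refused connection is simulated.

```ruby
# Stand-in for a retry helper with an optional pause between attempts.
# NOT Vagrant's implementation; names and defaults are illustrative only.
def retry_with_sleep(tries: 1, on: [StandardError], sleep_seconds: 0)
  attempts = 0
  begin
    attempts += 1
    yield attempts
  rescue *on
    # Out of attempts: re-raise the last error to the caller.
    raise if attempts >= tries
    # Pause so the guest has time to bring its SSH daemon back up.
    sleep(sleep_seconds) if sleep_seconds.positive?
    retry
  end
end

# Simulated connection that is refused twice, then succeeds.
result = retry_with_sleep(tries: 5, on: [Errno::ECONNREFUSED], sleep_seconds: 0.1) do |attempt|
  raise Errno::ECONNREFUSED if attempt < 3
  "connected on attempt #{attempt}"
end
puts result # prints "connected on attempt 3"
```

Without the sleep, all five attempts could be used up within a fraction of a second, long before a rebooting guest is listening again.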

rgl, Apr 04 '19 23:04

I ran into this problem as well. In my case, my guests are RHEL VMs that are pingable before the SSH daemon starts up. The behavior I see with vagrant up --debug is:

  • The guest becomes pingable
  • An SSH connection is initiated
  • SSH returns CONNECTIONREFUSED because the SSH daemon on the guest isn't up yet (taking a little while to boot)
  • Vagrant does NOT retry and simply exits with an error

I found the same section of code that @rgl did and manually set the opts[:retries] to 5 (seems to be set to 1 when the function is called) and then added in a sleep. This allowed the SSH connection to be retried and communication to the guest works great.
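The sequence above (guest reachable, port 22 still refusing connections) can be handled by polling the port with a pause between attempts. The following is a rough sketch using only Ruby's standard socket library; wait_for_port, tries, and sleep_seconds are hypothetical names, not part of Vagrant.

```ruby
require "socket"

# Hypothetical helper: poll a TCP port until it accepts a connection.
# Returns true once a connection succeeds, false if every attempt fails.
def wait_for_port(host, port, tries: 5, sleep_seconds: 10)
  tries.times do |i|
    begin
      # connect_timeout bounds each attempt so a hung connect cannot stall the loop.
      Socket.tcp(host, port, connect_timeout: 5) { return true }
    rescue Errno::ECONNREFUSED, Errno::EHOSTUNREACH, Errno::ETIMEDOUT
      # Daemon not listening yet; wait before the next attempt (not after the last).
      sleep(sleep_seconds) if i < tries - 1
    end
  end
  false
end

# Example call (host and port are placeholders for a forwarded SSH port):
# wait_for_port("127.0.0.1", 2222, tries: 5, sleep_seconds: 10)
```

This only checks that something is accepting TCP connections; it does not confirm that SSH authentication will succeed.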

I'm thinking a good solution would be to expose something like:

config.ssh.retries = 5
config.ssh.retry_sleep_interval = 10

These options would allow the user to control the number of SSH retries and the sleep time between retries.

Thoughts?

nmaludy, Jan 05 '23 19:01

Hello all, the very same issue I had reported here is still relevant. If anyone else is affected, here is a workaround:

cd vagrant/embedded/gems/*/gems/vagrant-*/plugins/communicators/ssh
curl -fsSL https://github.com/hashicorp/vagrant/commit/424808388956c0d6acf0e91ca751fe9345f6e7f8.patch -o ssh.patch
patch -p4 -i ssh.patch
rm -f ssh.patch

The diff comes from https://github.com/hashicorp/vagrant/pull/12292

avoidik, Jan 28 '24 15:01