server.reboot: Never detects that the remote system has rebooted

Open matthijskooijman opened this issue 9 months ago • 2 comments

Describe the bug

When using server.reboot, pyinfra reboots the target but then keeps waiting until the timeout expires, even if the system comes back up earlier.

To Reproduce

from pyinfra.operations import server

server.reboot(name="Reboot")

pyinfra -vvv --debug 192.168.7.2 --ssh-user root reboot.py

Result:

    Traceback (most recent call last):
      File "/home/matthijs/.local/pipx/venvs/pyinfra/lib/python3.11/site-packages/pyinfra/api/operations.py", line 94, in _run_host_op
        status = command.execute(state, host, connector_arguments)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/home/matthijs/.local/pipx/venvs/pyinfra/lib/python3.11/site-packages/pyinfra/api/command.py", line 224, in execute
        return self.function(state, host, *self.args, **self.kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/home/matthijs/.local/pipx/venvs/pyinfra/lib/python3.11/site-packages/pyinfra/operations/server.py", line 101, in wait_and_reconnect
        raise Exception(
    Exception: Server did not reboot in time (reboot_timeout=300s)

    [192.168.7.2] Unexpected error in Python callback: Exception('Server did not reboot in time (reboot_timeout=300s)',)

Expected behavior

The server should reboot, and once it is back up pyinfra should continue (or exit with success, as in this example).

Analysis

I added some debug output, and it turns out that Host.connect is called repeatedly to try making a new connection, but host.connected is true, so no actual connection attempts are made: https://github.com/pyinfra-dev/pyinfra/blob/80aca6e3ea9e2c1e423505abf1f5ef9c2c4affdc/pyinfra/api/host.py#L365

Looking at the code, there is no way to make host.connected False again (except for creating a new Host object). So I wonder:

  • If connectors (SSH in particular) have something in place to detect a disconnection
  • If server.reboot should be calling host.disconnect() to explicitly terminate the connection (because the TCP connection might otherwise linger and take some time to be detected as failed).
  • Maybe host.disconnect should set connected=False?
  • server.reboot already sets host.connection to None, and then uses that to check whether the connection was successful. Is this proper use of the host API, or is server.reboot messing with internals? Should server.reboot even check for a successful connection after the fact, or should it pass raise_exceptions and then detect connection success by the absence of an exception?
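To illustrate the suspected mechanism, here is a minimal sketch using a stand-in class. FakeHost and its attempts counter are hypothetical and only model the behaviour described above (a connect() that returns early when connected is true); this is not pyinfra's actual Host implementation.

```python
class FakeHost:
    """Hypothetical stand-in for a host object; NOT pyinfra's Host class."""

    def __init__(self):
        self.connected = False
        self.connection = None
        self.attempts = 0  # counts real connection attempts

    def connect(self):
        # Models the behaviour described above: skip if already marked connected.
        if self.connected:
            return self.connection
        self.attempts += 1
        self.connection = object()  # placeholder for a real connection
        self.connected = True
        return self.connection

    def disconnect(self):
        # The change suggested in the list above: clear the flag as well.
        self.connection = None
        self.connected = False


host = FakeHost()
host.connect()            # initial connection
host.connection = None    # what server.reboot is described as doing
host.connect()            # no-op, because connected is still True
stale_attempts = host.attempts
stale_connection = host.connection

host.disconnect()         # resets connected to False
host.connect()            # now a real attempt is made
fresh_attempts = host.attempts
```

In this model the second connect() never makes an attempt and the connection stays None, which would match the observed behaviour of waiting until the timeout. Only after the flag is reset does a new attempt happen.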

(An unrelated observation: the timeout does not seem to be properly observed. Currently the timeout is divided by the interval to get the number of retries, which assumes that connection attempts take zero time. That is not true, especially while a system is rebooting, when attempts might take longer.)
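One possible way to honour the timeout regardless of how long each attempt takes is to compare against a deadline instead of counting retries. The sketch below assumes a hypothetical helper, wait_for_reconnect, with an injected try_connect callable; neither is part of pyinfra.

```python
import time


def wait_for_reconnect(try_connect, timeout, interval,
                       clock=time.monotonic, sleep=time.sleep):
    """Call try_connect() until it returns True or `timeout` seconds elapse.

    Hypothetical helper for illustration; not part of pyinfra.
    `clock` and `sleep` are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout
    while True:
        if try_connect():
            return True
        # Time spent inside try_connect() counts against the deadline.
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        sleep(min(interval, remaining))
```

With a 30 second timeout, a 2 second interval and attempts that each take 10 seconds, a retry-count approach would make 15 attempts and run for roughly 180 seconds, while this loop gives up after the first attempt that finishes past the 30 second mark.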

Meta

pyinfra --support

    If you are having issues with pyinfra or wish to make feature requests, please
    check out the GitHub issues at https://github.com/Fizzadar/pyinfra/issues .
    When adding an issue, be sure to include the following:

    System: Linux
      Platform: Linux-6.5.0-28-generic-x86_64-with-glibc2.38
      Release: 6.5.0-28-generic
      Machine: x86_64
    pyinfra: v3.0b0
    Executable: /home/matthijs/.local/bin/pyinfra
    Python: 3.11.6 (CPython, GCC 13.2.0)

Installed via pipx.

matthijskooijman, May 24 '24 00:05