
Wait for cloud-init on dstack-gateway before attempting any operations

Open jvstme opened this issue 1 year ago • 9 comments

Current

After connecting to dstack-gateway via SSH, dstack-server attempts to update the gateway with update.sh or to configure it by calling the /api/config endpoint. However, dstack-gateway's installation and setup with cloud-init may not have finished by then. This leads to unclear dstack-server errors such as

Failed to configure gateway 35.202.8.178: ReadError('')

or

Failed to update gateway 35.202.8.178: /bin/sh: 0: cannot open dstack/update.sh: No such file

Proposed

  • After establishing each SSH connection to dstack-gateway, ensure that cloud-init has finished by running
    cloud-init status --wait
    
  • Check the output of cloud-init status and report an error to the user if cloud-init was not successful
  • Add a timeout for waiting for cloud-init status and report an error to the user if the timeout is reached
  • Remove the retry logic when configuring dstack-gateway or reduce the number of attempts

This should improve the user experience, facilitate troubleshooting, and prevent bugs.
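
The first three points of the proposal could be sketched roughly as follows. This is only an illustration under stated assumptions, not dstack's actual code: `run` is a hypothetical callable that executes a shell command over the already established SSH connection and returns `(exit_code, stdout)`, `CloudInitError` and the 300-second default are made up for the example, and the timeout relies on the coreutils `timeout` utility being available on the gateway.

```python
import re
from typing import Callable, Tuple

# Hypothetical helper type: runs a shell command on the gateway over the
# existing SSH connection and returns (exit_code, stdout).
RunCommand = Callable[[str], Tuple[int, str]]

# Exit status returned by coreutils `timeout` when the time limit is hit.
TIMEOUT_EXIT_CODE = 124


class CloudInitError(Exception):
    """Hypothetical error type for reporting cloud-init problems to the user."""


def parse_status(stdout: str) -> str:
    """Extract the value of the `status: ...` line printed by cloud-init."""
    match = re.search(r"status:\s*(.+)", stdout)
    return match.group(1).strip() if match else "unknown"


def wait_for_cloud_init(run: RunCommand, timeout_seconds: int = 300) -> None:
    """Block until cloud-init finishes; raise CloudInitError otherwise."""
    exit_code, stdout = run(f"timeout {timeout_seconds} cloud-init status --wait")
    if exit_code == TIMEOUT_EXIT_CODE:
        raise CloudInitError(
            f"cloud-init did not finish within {timeout_seconds}s"
        )
    status = parse_status(stdout)
    # Assumption: anything other than exit code 0 with status "done"
    # is treated as a failure worth showing to the user.
    if exit_code != 0 or status != "done":
        raise CloudInitError(
            f"cloud-init failed (exit code {exit_code}, status {status!r})"
        )
```

With a check like this in front of the update.sh and /api/config steps, a slow or failed gateway setup would surface as an explicit cloud-init error instead of a ReadError or a missing-file message.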

jvstme avatar May 14 '24 10:05 jvstme

After #1236, we give the gateway more than enough time to install and set up. If it takes longer for some reason, we should fix the underlying problem. This issue only addresses the error messages, so I'd classify it as minor.

r4victor avatar May 17 '24 09:05 r4victor

This issue is stale because it has been open for 30 days with no activity.

peterschmidt85 avatar Jun 17 '24 01:06 peterschmidt85

This issue was closed because it has been inactive for 14 days since being marked as stale. Please reopen the issue if it is still relevant.

peterschmidt85 avatar Jul 01 '24 01:07 peterschmidt85

This issue is stale because it has been open for 30 days with no activity.

peterschmidt85 avatar Aug 01 '24 01:08 peterschmidt85

This issue was closed because it has been inactive for 14 days since being marked as stale. Please reopen the issue if it is still relevant.

github-actions[bot] avatar Aug 16 '24 01:08 github-actions[bot]

This issue is stale because it has been open for 30 days with no activity.

github-actions[bot] avatar Sep 16 '24 01:09 github-actions[bot]

@jvstme is this issue still valid?

peterschmidt85 avatar Sep 27 '24 08:09 peterschmidt85

@peterschmidt85, yes

jvstme avatar Sep 27 '24 11:09 jvstme

This issue is stale because it has been open for 30 days with no activity.

github-actions[bot] avatar Oct 28 '24 02:10 github-actions[bot]