dstack
[Bug]: Gateway Resource is not Deleted If EC2 Instance is Terminated Manually Beforehand
Steps to reproduce
- Create a gateway resource.
- Wait for the gateway EC2 instance to finish its launch sequence.
- Delete the gateway EC2 instance on AWS.
- Try deleting the gateway resource via `dstack gateway delete --yes <gw name>` to clean things up.
- Although the `delete` command succeeds, `dstack gateway list` reveals that the gateway record is still there, and the gateway is still in the `running` state. Retrying doesn't work.
I've attached the server logs below; they show a failure to SSH into the non-existent EC2 instance, which makes perfect sense.
Actual behaviour
No response
Expected behaviour
Although this goes against the best practice of managing the life cycle of dstack resources exclusively through the dstack CLI, the CLI "succeeding" at deleting the gateway while the gateway is still there (from dstack's point of view) is misleading.
As a first step,
- The gateway should probably be moved to the "orphaned"/"zombie" state, AND
- `dstack gateway delete` on the non-existent gateway should produce an error message and exit with a non-zero code.
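To make the two points above concrete, here is a minimal sketch of the behaviour I have in mind. It is not dstack's actual code: `GatewayStatus.ORPHANED`, `GatewayInstanceMissingError`, `delete_gateway`, `run_delete`, and the `instance_exists` callback are all hypothetical names made up for illustration, and the in-memory `registry` dict stands in for the server's gateway records.

```python
import enum
import sys
from dataclasses import dataclass
from typing import Callable, Dict


class GatewayStatus(enum.Enum):
    RUNNING = "running"
    ORPHANED = "orphaned"  # hypothetical state proposed in this issue


@dataclass
class Gateway:
    name: str
    status: GatewayStatus


class GatewayInstanceMissingError(Exception):
    """Hypothetical error: the backing instance no longer exists."""


def delete_gateway(
    registry: Dict[str, Gateway],
    name: str,
    instance_exists: Callable[[str], bool],
) -> None:
    """Delete a gateway record, or mark it orphaned if its instance is gone.

    `instance_exists` is a stand-in for a backend lookup (assumption).
    """
    gateway = registry[name]
    if not instance_exists(name):
        # Do not pretend the deletion worked: record the state and fail loudly.
        gateway.status = GatewayStatus.ORPHANED
        raise GatewayInstanceMissingError(
            f"gateway {name!r}: backing instance not found, marked as orphaned"
        )
    # Normal path: the instance is reachable, so the record can be removed.
    del registry[name]


def run_delete(
    registry: Dict[str, Gateway],
    name: str,
    instance_exists: Callable[[str], bool],
) -> int:
    """CLI-style wrapper: returns the process exit code."""
    try:
        delete_gateway(registry, name, instance_exists)
    except GatewayInstanceMissingError as exc:
        print(f"error: {exc}", file=sys.stderr)
        return 1
    return 0
```

With this shape, a delete against a manually terminated instance would print an error and return a non-zero exit code, and a subsequent list would show the gateway as orphaned rather than `running`.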
In addition to that, the EC2 instance should have termination protection enabled by default, and the docs should clearly state that dstack resources must only be managed via the dstack CLI.
dstack version
0.19.36
Server logs
{"timestamp": "2025-11-25 19:02:04,310", "logger": "dstack._internal.server.background.tasks.process_gateways", "level": "ERROR", "message": "Connection to gateway 10.10.24.215 failed: ssh: connect to host 10.10.24.215 port 22: Connection timed out\r\n"}
{"timestamp": "2025-11-25 19:02:04,306", "logger": "dstack._internal.core.services.ssh.tunnel", "level": "DEBUG", "message": "SSH tunnel failed: b'ssh: connect to host 10.10.24.215 port 22: Connection timed out\\r\\n'"}
Additional information
No response