dstack
[Bug]: shim restart loop if Docker is not available (was: shim panics when adding SSH fleets without Docker)
Steps to reproduce
- Create an SSH fleet on instances without Docker installed.
- The fleet hangs in the pending status.
Actual behaviour
The shim fails with the panic:
panic: cannot get Docker info
goroutine 1 [running]:
main.getDiskSize()
/home/runner/work/dstack/dstack/runner/cmd/shim/main.go:280 +0x23d
main.writeHostInfo()
/home/runner/work/dstack/dstack/runner/cmd/shim/main.go:218 +0xdd
main.main.func1(0xc0000ca2c0?)
/home/runner/work/dstack/dstack/runner/cmd/shim/main.go:154 +0x533
github.com/urfave/cli/v2.(*Command).Run(0xc0000ca2c0, 0xc0000a3000, {0xc0001375e0, 0x1, 0x1})
/home/runner/go/pkg/mod/github.com/urfave/cli/[email protected]/command.go:279 +0x9dd
github.com/urfave/cli/v2.(*Command).Run(0xc0000ca580, 0xc0000a2d00, {0xc0000900f0, 0x3, 0x3})
/home/runner/go/pkg/mod/github.com/urfave/cli/[email protected]/command.go:272 +0xc2e
github.com/urfave/cli/v2.(*App).RunContext(0xc0000d6a00, {0xa3eda8?, 0xdaa120}, {0xc0000900f0, 0x3, 0x3})
/home/runner/go/pkg/mod/github.com/urfave/cli/[email protected]/app.go:337 +0x5db
github.com/urfave/cli/v2.(*App).Run(...)
/home/runner/go/pkg/mod/github.com/urfave/cli/[email protected]/app.go:311
main.main()
/home/runner/work/dstack/dstack/runner/cmd/shim/main.go:167 +0xbce
I initially thought it was related to the AMI I was using (Ubuntu 20.04 with cgroups v1), but it turned out to be caused by a failed Docker install, which I had missed.
Also, this part of the shim code panics throughout and drops errors without logging them.
Expected behaviour
The shim handles a missing or unreachable Docker daemon gracefully instead of panicking. Ideally, the error is propagated to the server.
dstack version
master
Server logs
No response
Additional information
No response