
[Bug]: shim restart loop if Docker is not available (was: shim panics when adding SSH fleets without Docker)

Open r4victor opened this issue 1 year ago • 5 comments

Steps to reproduce

  1. Create an SSH fleet on instances that do not have Docker installed.
  2. The fleet hangs in the pending status.

Actual behaviour

The shim fails with the panic:

panic: cannot get Docker info

goroutine 1 [running]:
main.getDiskSize()
	/home/runner/work/dstack/dstack/runner/cmd/shim/main.go:280 +0x23d
main.writeHostInfo()
	/home/runner/work/dstack/dstack/runner/cmd/shim/main.go:218 +0xdd
main.main.func1(0xc0000ca2c0?)
	/home/runner/work/dstack/dstack/runner/cmd/shim/main.go:154 +0x533
github.com/urfave/cli/v2.(*Command).Run(0xc0000ca2c0, 0xc0000a3000, {0xc0001375e0, 0x1, 0x1})
	/home/runner/go/pkg/mod/github.com/urfave/cli/[email protected]/command.go:279 +0x9dd
github.com/urfave/cli/v2.(*Command).Run(0xc0000ca580, 0xc0000a2d00, {0xc0000900f0, 0x3, 0x3})
	/home/runner/go/pkg/mod/github.com/urfave/cli/[email protected]/command.go:272 +0xc2e
github.com/urfave/cli/v2.(*App).RunContext(0xc0000d6a00, {0xa3eda8?, 0xdaa120}, {0xc0000900f0, 0x3, 0x3})
	/home/runner/go/pkg/mod/github.com/urfave/cli/[email protected]/app.go:337 +0x5db
github.com/urfave/cli/v2.(*App).Run(...)
	/home/runner/go/pkg/mod/github.com/urfave/cli/[email protected]/app.go:311
main.main()
	/home/runner/work/dstack/dstack/runner/cmd/shim/main.go:167 +0xbce

I initially thought it was related to the AMI I was using (Ubuntu 20.04 with cgroups v1), but it turned out to be caused by a failed Docker installation that I had missed.

Also, this part of the shim code calls panic in many places and discards errors without logging them.

Expected behaviour

The shim handles a missing Docker installation gracefully instead of panicking. Ideally, the error is propagated to the server.

dstack version

master

Server logs

No response

Additional information

No response

r4victor · Oct 15 '24 08:10