cli
cli copied to clipboard
cli hangs on dead connection
Description
Steps to reproduce the issue:
- Create a socket file with a dead listener by running:
nc -lkU /home/nick/test.sock
- Run any docker command against the socket
DOCKER_HOST="unix:///home/nick/test.sock" docker ps
Describe the results you received:
The CLI should eventually time out and/or error
Describe the results you expected:
The CLI hangs indefinitely.
Additional information you deem important (e.g. issue happens only occasionally):
This is distilled from a longer user report and stack trace upstream: https://github.com/tilt-dev/tilt/issues/5841
Output of docker version:
docker version
Client: Docker Engine - Community
Cloud integration: v1.0.24
Version: 20.10.16
API version: 1.41
Go version: go1.17.10
Git commit: aa7e414
Built: Thu May 12 09:17:23 2022
OS/Arch: linux/amd64
Context: default
Experimental: true
Additional environment details (AWS, VirtualBox, physical, etc.): Linux
hmmmm....this is still reproducible for me, but i thought we fixed it
i see a stack trace like:
/usr/local/go/src/net/http/client.go:590
github.com/docker/cli/vendor/github.com/docker/docker/client.(*Client).doRequest(0xc000258b40, 0xc000258b40?)
/go/src/github.com/docker/cli/vendor/github.com/docker/docker/client/request.go:140 +0x7c fp=0xc00063f0b8 sp=0xc00063efd0 pc=0x5609e8a2a55c
github.com/docker/cli/vendor/github.com/docker/docker/client.(*Client).Ping(0xc000258b40, {0x5609e9506940, 0x5609e9d51840})
/go/src/github.com/docker/cli/vendor/github.com/docker/docker/client/ping.go:28 +0x145 fp=0xc00063f390 sp=0xc00063f0b8 pc=0x5609e8a253a5
github.com/docker/cli/vendor/github.com/docker/docker/client.(*Client).NegotiateAPIVersion(0xc000258b40, {0x5609e9506940?, 0x5609e9d51840?})
/go/src/github.com/docker/cli/vendor/github.com/docker/docker/client/client.go:310 +0x45 fp=0xc00063f468 sp=0xc00063f390 pc=0x5609e8a0a505
github.com/docker/cli/vendor/github.com/docker/docker/client.(*Client).checkVersion(...)
I think the fix was in https://github.com/docker/cli/pull/3722
But of course that's CLI code, so could the client have been initialised through another path? (I think there's multiple paths to initialise things, which were added for different implementations, making things ... messy)
Wondering if the Ping itself should automatically / always add a timeout if none is set (or perhaps always?)
still digging into it, but i think there's multiple pings, and i think it's getting stuck on another one deep in the client code
Yeah, I wonder if this is due to the effort to not connect until we need a connection, and that code-path not hitting the one we added the default timeout to 🤔
(hence thinking: should we make ping always have some timeout, separate from the context as a whole)
i don't really understand why we continue trying to make additional client requests at all if the initial ping negotiation fails.
like, if the ping fails, the protocol negotiation is going to fallback to the v1.24 api, which is kind of bad even if the daemon is working.
OK, here's some good threads:
- https://github.com/docker/cli/issues/149
- https://github.com/docker/cli/pull/546
- https://github.com/moby/moby/pull/39032
to see if i understand them --
in 2017, we have this thing called docker datacenter. _ping was for establishing the health of the datacenter as a whole, NOT whether the daemon was alive. So we changed the behavior so that if the ping returned a 500, we would try to keep going and make some more requests.
in 2019, we changed the client to do its own layer of negotation based on Ping(). we largely subverted the Ping() protocol negotation behavior we added in 2017. But we never went back and fixed the CLI code, so now the CLI and client code have different protocol negotiation protocols :dizzy_face: