cli icon indicating copy to clipboard operation
cli copied to clipboard

cli hangs on dead connection

Open nicks opened this issue 3 years ago • 7 comments

Description

Steps to reproduce the issue:

  1. Create a socket file with a dead listener by running:
nc -lkU /home/nick/test.sock
  1. Run any docker command against the socket
DOCKER_HOST="unix:///home/nick/test.sock" docker ps

Describe the results you received:

The CLI should eventually time out and/or error

Describe the results you expected:

The CLI hangs indefinitely.

Additional information you deem important (e.g. issue happens only occasionally):

This is distilled from a longer user report and stack trace upstream: https://github.com/tilt-dev/tilt/issues/5841

Output of docker version:

docker version
Client: Docker Engine - Community
 Cloud integration: v1.0.24
 Version:           20.10.16
 API version:       1.41
 Go version:        go1.17.10
 Git commit:        aa7e414
 Built:             Thu May 12 09:17:23 2022
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Additional environment details (AWS, VirtualBox, physical, etc.): Linux

nicks avatar Jun 03 '22 15:06 nicks

hmmmm....this is still reproducible for me, but i thought we fixed it

i see a stack trace like:

	/usr/local/go/src/net/http/client.go:590
github.com/docker/cli/vendor/github.com/docker/docker/client.(*Client).doRequest(0xc000258b40, 0xc000258b40?)
	/go/src/github.com/docker/cli/vendor/github.com/docker/docker/client/request.go:140 +0x7c fp=0xc00063f0b8 sp=0xc00063efd0 pc=0x5609e8a2a55c
github.com/docker/cli/vendor/github.com/docker/docker/client.(*Client).Ping(0xc000258b40, {0x5609e9506940, 0x5609e9d51840})
	/go/src/github.com/docker/cli/vendor/github.com/docker/docker/client/ping.go:28 +0x145 fp=0xc00063f390 sp=0xc00063f0b8 pc=0x5609e8a253a5
github.com/docker/cli/vendor/github.com/docker/docker/client.(*Client).NegotiateAPIVersion(0xc000258b40, {0x5609e9506940?, 0x5609e9d51840?})
	/go/src/github.com/docker/cli/vendor/github.com/docker/docker/client/client.go:310 +0x45 fp=0xc00063f468 sp=0xc00063f390 pc=0x5609e8a0a505
github.com/docker/cli/vendor/github.com/docker/docker/client.(*Client).checkVersion(...)

nicks avatar Feb 15 '24 13:02 nicks

I think the fix was in https://github.com/docker/cli/pull/3722

But of course that's CLI code, so could the client have been initialised through another path? (I think there's multiple paths to initialise things, which were added for different implementations, making things ... messy)

thaJeztah avatar Feb 15 '24 13:02 thaJeztah

Wondering if the Ping itself should automatically / always add a timeout if none is set (or perhaps always?)

thaJeztah avatar Feb 15 '24 13:02 thaJeztah

still digging into it, but i think there's multiple pings, and i think it's getting stuck on another one deep in the client code

nicks avatar Feb 15 '24 13:02 nicks

Yeah, I wonder if this is due to the effort to not connect until we need a connection, and that code-path not hitting the one we added the default timeout to 🤔

(hence thinking: should we make ping always have some timeout, separate from the context as a whole)

thaJeztah avatar Feb 15 '24 14:02 thaJeztah

i don't really understand why we continue trying to make additional client requests at all if the initial ping negotiation fails.

like, if the ping fails, the protocol negotiation is going to fallback to the v1.24 api, which is kind of bad even if the daemon is working.

nicks avatar Feb 15 '24 14:02 nicks

OK, here's some good threads:

  • https://github.com/docker/cli/issues/149
  • https://github.com/docker/cli/pull/546
  • https://github.com/moby/moby/pull/39032

to see if i understand them --

in 2017, we have this thing called docker datacenter. _ping was for establishing the health of the datacenter as a whole, NOT whether the daemon was alive. So we changed the behavior so that if the ping returned a 500, we would try to keep going and make some more requests.

in 2019, we changed the client to do its own layer of negotation based on Ping(). we largely subverted the Ping() protocol negotation behavior we added in 2017. But we never went back and fixed the CLI code, so now the CLI and client code have different protocol negotiation protocols :dizzy_face:

nicks avatar Feb 15 '24 14:02 nicks