CLI hangs on interactive run when container fails to attach to network

Open laurazard opened this issue 1 year ago • 2 comments

Description

Reproduce

$ docker swarm init
$ docker network create \
  --driver overlay \
  --ipv6 \
  --opt encrypted \
  --subnet "10.0.100.0/24" \
  --subnet "fd14:8656:a32e:100::/64" \
  --attachable \
  test
$ docker run --rm -it --network test alpine

Expected behavior

The CLI should handle SIGINTs and not hang.

docker version

Happens on recent builds:

Client:
 Version:           27.0.1-45-g4029dbc129
 API version:       1.46
 Go version:        go1.22.1
 Git commit:        4029dbc129
 Built:             Fri Jul  5 10:31:39 2024
 OS/Arch:           darwin/arm64
 Context:           desktop-linux

But it also happens as far back as 25.0 (and possibly earlier).

Additional Info

From Slack (@akerouanton):

[...] adding this subnet makes the docker run command send a ContainerStart request to the daemon, then the daemon makes a gRPC call to swarmkit and things hang there for 30s. Until the CLI's stdin/stdout/stderr are attached to the container, maybe the right behavior should be to cancel the request? [...]

laurazard, Jul 05 '24 10:07

Hi @laurazard, I'm interested in contributing to this issue. I was able to reproduce it and got the following output from the command:

docker: Error response from daemon: attaching to network failed, make sure your network options are correct and check manager logs: context deadline exceeded

Could you guide me here? Thanks

nitintecg, Sep 29 '24 19:09

Hi @nitintecg, appreciate you wanting to look into this!

I think I started digging into it some time ago, and I can't remember exactly what conclusion I came to but I have a vague idea that this might not be a straightforward fix.

In the original issue, we mentioned:

Until the CLI's stdin/out/err are attached to the container, maybe the right behavior should be to cancel the request?

But from what I remember from looking at this, the CLI is acting correctly – even if the context used for the request is cancelled, the ContainerStart call hangs here. This calls this endpoint here.

This means that likely, any fix that we could do here will be in the daemon code, either in the places I mentioned or around here.

laurazard, Oct 10 '24 13:10