[BUG] docker-compose does not close unix socket with dockerd after container exits and restarts
Description
While troubleshooting an issue that resulted in a dockerd crash, I found that docker-compose does not close its unix socket connection to the docker daemon when a container exits and restarts.
This means that, as containers restart, more and more sockets are left open with dockerd, which spins up a new thread each time a connection is opened, eventually causing dockerd to hit the kernel's max-threads limit and crash.
In our case, the production workload has about 10 containers deployed in an on-demand style, so not all 10 need to be up and running at once, depending on the pool of "devices" needing data processed upstream of them. Containers that aren't being used will restart and re-query the database for a new endpoint to process input from.
While this approach of restarting containers is probably not the greatest, at the very least the unix sockets should be closed when finished.
From looking into this on the dockerd side, it appears that dockerd sends a notifyClosed via the socket when it exits. Does docker compose handle that?
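To illustrate the failure mode outside of compose, here is a minimal stand-alone sketch (illustrative only, not compose code; the per-connection thread behaviour is the claim described above). Every connection dialed to the daemon socket stays open, holding a unix socket fd on both ends, until the client closes it or exits:

package main

import (
	"fmt"
	"net"
	"time"
)

func main() {
	var conns []net.Conn
	for i := 0; i < 100; i++ {
		// Each successful Dial opens a unix socket to dockerd that stays
		// open until conn.Close() is called or this process exits.
		c, err := net.Dial("unix", "/var/run/docker.sock")
		if err != nil {
			fmt.Println("dial error:", err)
			return
		}
		conns = append(conns, c) // deliberately never closed
	}
	fmt.Printf("holding %d open connections to dockerd; observe its fd/thread counts\n", len(conns))
	time.Sleep(10 * time.Minute)
}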
Steps To Reproduce
1. create docker-compose.yaml:
---
services:
  occupied:
    image: container-used:latest
    logging:
      driver: "json-file"
      options:
        max-file: "5"
        max-size: "30m"
  surplus:
    restart: unless-stopped
    image: container-surplus:latest
    logging:
      driver: "json-file"
      options:
        max-file: "5"
        max-size: "30m"
2. create Dockerfile:
FROM alpine:latest AS occupied
CMD ["sleep", "3600"]
FROM alpine:latest AS surplus
CMD ["sleep", "10"]
3. build images:
docker build --target occupied -t container-used:latest .
docker build --target surplus -t container-surplus:latest .
4. start the project:
docker compose -f docker-compose.yaml up --scale occupied=8 --scale surplus=2
5. concurrently with step 4, observe the total number of tasks/threads under dockerd:
watch -n1 "ls /proc/`pgrep dockerd`/task | wc -l"
6. concurrently with step 4, observe the unix socket fds under the docker-compose pid:
watch -n1 "lsof -p `pgrep docker-compose` | grep unix"
You will observe the following:
- Docker compose restarts the exiting containers as they gracefully close.
- In step 5, the total number of threads under dockerd begins to rise.
- In step 6, more and more unix sockets are added to the fd list of the docker-compose process.
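As an optional cross-check of step 6, here is a small Go sketch (assuming a Linux /proc layout; note it counts all socket fds, not only unix sockets) that reports how many socket fds a pid is holding:

package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: sockcount <pid>")
		os.Exit(1)
	}
	fdDir := filepath.Join("/proc", os.Args[1], "fd")
	entries, err := os.ReadDir(fdDir)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	count := 0
	for _, e := range entries {
		// Every fd is a symlink; sockets resolve to "socket:[<inode>]".
		target, err := os.Readlink(filepath.Join(fdDir, e.Name()))
		if err == nil && strings.HasPrefix(target, "socket:") {
			count++
		}
	}
	fmt.Printf("pid %s holds %d socket fds\n", os.Args[1], count)
}

Running it in a loop against the docker-compose pid while the surplus containers restart should show the count climbing in step with the lsof output above.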
Compose Version
Docker Compose version v2.29.1
Docker Environment
Client: Docker Engine - Community
Version: 27.1.1
Context: default
Debug Mode: false
Plugins:
buildx: Docker Buildx (Docker Inc.)
Version: v0.16.1
Path: /usr/libexec/docker/cli-plugins/docker-buildx
compose: Docker Compose (Docker Inc.)
Version: v2.29.1
Path: /usr/libexec/docker/cli-plugins/docker-compose
Server:
Containers: 11
Running: 0
Paused: 0
Stopped: 11
Images: 7
Server Version: 27.1.1
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Using metacopy: false
Native Overlay Diff: true
userxattr: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Cgroup Version: 1
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
Swarm: inactive
Runtimes: io.containerd.runc.v2 runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 2bf793ef6dc9a18e00cb12efb64355c2c9d5eb41
runc version: v1.1.13-0-g58aa920
init version: de40ad0
Security Options:
apparmor
seccomp
Profile: builtin
Kernel Version: 5.4.0-190-generic
Operating System: Ubuntu 20.04.6 LTS
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 7.74GiB
Name: moby48236
ID: f5179277-2595-4df6-9ec6-44fc55c19f74
Docker Root Dir: /var/lib/docker
Debug Mode: false
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
WARNING: No swap limit support
Anything else?
I initially thought this was a regression of an existing bug in dockerd that was fixed in 27.0.1, and opened an issue there: https://github.com/moby/moby/issues/48236
However, with observation and feedback from the devs, I discovered this to be an issue with the compose plugin. So here I am :)
cc @ndeloof @glours (relates to https://github.com/moby/moby/issues/48236 for more details)
Not sure I understand how compose is involved here. Compose creates containers with the restartPolicy declared in the compose file, but it is the engine that restarts them. Compose is stateless and client-side, so once the containers have been created it has no impact on what goes on there.
From comments in the linked moby issue, it appears docker-compose only closes file descriptors upon its own exit; until then, anything open with the daemon remains open, especially if compose is being run in the foreground and showing the logs of the running containers.
From what I can see on the daemon side, and from my garnered understanding, dockerd reaches a notifyClosed routine before continuing to lstat() the socket, so I believe the daemon is waiting for docker-compose to release that fd before it releases its own end. Both ends are just kind of stuck waiting on one another, if that makes sense.
docker-compose accesses the engine through the HTTP API. Maybe it does not manage the many concurrent connections correctly, but this should not have an impact on file descriptors used by the engine.
Please correct me if I'm wrong, but don't those HTTP API connections pass through UNIX Sockets? Those are at least being held open after the engine stops the container they were responsible for.
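For what it's worth, a minimal sketch with the Go SDK that compose builds on (illustrative only, not compose code) shows which endpoint the client is using; on Linux it defaults to the unix socket:

package main

import (
	"context"
	"fmt"

	"github.com/docker/docker/client"
)

func main() {
	// Same Go SDK that compose and the docker CLI are built on.
	cli, err := client.NewClientWithOpts(client.FromEnv, client.WithAPIVersionNegotiation())
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	// On Linux this defaults to unix:///var/run/docker.sock unless
	// DOCKER_HOST points somewhere else.
	fmt.Println("daemon endpoint:", cli.DaemonHost())

	// A plain request/response call like Ping returns its connection to the
	// transport's pool; long-running hijacked calls (e.g. attach) do not.
	if _, err := cli.Ping(context.Background()); err != nil {
		fmt.Println("ping failed:", err)
	}
}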
Still seems to be an issue... I've confirmed that using the --scale option in docker compose seems to trigger or at least accelerate the issue, even if the containers have init: true
we can even make it crash faster! --scale surplus=9001
If you run thousands of replicas, compose will open the same number of long-running ContainerAttach API calls, which may indeed break some limits.
Still, I hardly see how Compose would be responsible for this issue, as it is just a docker API client (using the same SDK as the docker CLI).
Yeah, after playing around with this more in an isolated environment, I don't see how docker-compose could be involved if, as you say, it's just using the upstream docker client API. I'll take this back to Moby.
Though I do see the logic for attaching containers here: https://github.com/docker/compose/blob/main/pkg/compose/attach.go, but is there corresponding logic to detach/close that connection after the container exits?
goroutine 218 [chan receive]:
runtime.gopark(0x6e3a22746e696f70?, 0x426e4f222c6c6c75?, 0x75?, 0x69?, 0x6562614c222c6c6c?)
runtime/proc.go:398 +0xce fp=0xc000a18f00 sp=0xc000a18ee0 pc=0x44002e
runtime.chanrecv(0xc000540300, 0x0, 0x1)
runtime/chan.go:583 +0x3cd fp=0xc000a18f78 sp=0xc000a18f00 pc=0x40baed
runtime.chanrecv1(0x6e2d72656e696174?, 0x223a227265626d75?)
runtime/chan.go:442 +0x12 fp=0xc000a18fa0 sp=0xc000a18f78 pc=0x40b6f2
github.com/docker/compose/v2/pkg/compose.(*composeService).attachContainerStreams.func3()
github.com/docker/compose/v2/pkg/compose/attach.go:127 +0x49 fp=0xc000a18fe0 sp=0xc000a18fa0 pc=0x1cb6969
runtime.goexit()
runtime/asm_amd64.s:1650 +0x1 fp=0xc000a18fe8 sp=0xc000a18fe0 pc=0x4704e1
created by github.com/docker/compose/v2/pkg/compose.(*composeService).attachContainerStreams in goroutine 81
github.com/docker/compose/v2/pkg/compose/attach.go:126 +0x205
I did a SIGKILL dump and, using the reproduction steps above, saw many sockets from compose still open with dockerd; attach.go accounts for most of the parked goroutines:
grep -c "github.com/docker/compose/v2/pkg/compose/attach.go:126" compose-trace-2.log
92
Is there anything else I can run or experiment with to help provide more information about this issue?
There's no Detach API. ContainerAttach is a long-running HTTP hijack call which upgrades the protocol so stdin/stdout streams can be attached from the client to the container. As the container stops, the streams end with EOF.
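As a rough sketch of that lifecycle with the Go SDK (illustrative only, not compose's actual code; on older SDK versions the options type is types.ContainerAttachOptions rather than container.AttachOptions): the hijacked connection, and its unix-socket fd, is only released when the client calls Close() on the HijackedResponse.

package main

import (
	"context"
	"fmt"
	"io"
	"os"

	"github.com/docker/docker/api/types/container"
	"github.com/docker/docker/client"
)

// attachAndDrain mirrors the shape of an attach call: hijack the connection,
// copy output until EOF, then release the connection.
func attachAndDrain(ctx context.Context, cli *client.Client, containerID string) error {
	resp, err := cli.ContainerAttach(ctx, containerID, container.AttachOptions{
		Stream: true,
		Stdout: true,
		Stderr: true,
	})
	if err != nil {
		return err
	}
	// Without Close(), the hijacked connection (and its unix-socket fd)
	// stays open for the lifetime of the client process, even after the
	// streams have ended with EOF.
	defer resp.Close()

	// Drain the (possibly stdout/stderr-multiplexed) output until the daemon
	// ends the stream.
	_, err = io.Copy(os.Stdout, resp.Reader)
	return err
}

func main() {
	cli, err := client.NewClientWithOpts(client.FromEnv, client.WithAPIVersionNegotiation())
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	if err := attachAndDrain(context.Background(), cli, os.Args[1]); err != nil {
		fmt.Fprintln(os.Stderr, "attach:", err)
	}
}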
This is managed here: https://github.com/docker/compose/blob/e6ef8629a8e3d4dd7e0565c2237bf528149ee1e9/pkg/compose/attach.go#L146-L152 using code from docker/cli, which I would consider to be safe for the need :)
Thanks for clarifying that! I'll look more closely at the behaviour of the containers to see what they're doing when they exit, and I'll provide that information to the Moby project to track this issue.