Error response from daemon: failed to listen to abstract unix socket "/containerd-shim/moby/<uuid>/shim.sock": listen unix /containerd-shim/moby/<uuid>/shim.sock: bind: address already in use: unknown
- [x] This is a bug report
- [ ] This is a feature request
- [x] I searched existing issues before opening this one
I run containers using the "restart always" policy, but in some situations (the trigger is unclear to me at this point), a subset of containers fail to be restarted by the docker daemon.
In this example, I have a bunch of services that all have (almost) identical configs, and a random subset of the service containers is suddenly down (after days of running fine):
root@analyst:~# docker ps | grep worker.000.802
root@analyst:~# docker ps -a | grep worker.000.802
6d504138f7f7 <my-image> "/entrypoint.sh" 4 weeks ago Exited (255) 8 days ago worker-000-802_1
other container instances of the service are running fine (and are restarted every once in a while):
root@analyst:~# docker ps | grep worker.000.801
832f53c0f4ce <my-image> "/entrypoint.sh" 4 weeks ago Up 28 minutes worker-000-801_1
When I try (for testing) to manually restart the container that the daemon failed to restart automatically, this fails:
root@analyst:~# docker start 6d504138f7f7
Error response from daemon: failed to listen to abstract unix socket "/containerd-shim/moby/6d504138f7f7ddcd57437006a3a6e70ec4c8ed32c08b5969d788f24eef28f51f/shim.sock": listen unix /containerd-shim/moby/6d504138f7f7ddcd57437006a3a6e70ec4c8ed32c08b5969d788f24eef28f51f/shim.sock: bind: address already in use: unknown
Error: failed to start containers: 6d504138f7f7
Investigating the problem, I found that the unix socket mentioned above does not exist on the file-system, but the error message says "already in use", so I searched via lsof:
root@analyst:~# lsof -U | grep 6d504138f7f7ddcd57437006a3a6e70ec4c8ed32c08b5969d788f24eef28f51f
docker-co 37032 root 3u unix 0xffff88030db67800 0t0 502614215 @/containerd-shim/moby/6d504138f7f7ddcd57437006a3a6e70ec4c8ed32c08b5969d788f24eef28f51f/shim.sock
docker-co 37032 root 6u unix 0xffff880dd67da1c0 0t0 323429479 @/containerd-shim/moby/6d504138f7f7ddcd57437006a3a6e70ec4c8ed32c08b5969d788f24eef28f51f/shim.sock
so, indeed the socket is in use, but not on the file-system... which makes me wonder if the process (PID 37032) actually removed it, but didn't properly close it (yet?) while shutting down?
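For reference, the leading @ in that path means this is an abstract unix socket, which lives only in the kernel and never appears on the filesystem. A sketch of how one can confirm the address is still bound without lsof (using the same full container ID):
grep 6d504138f7f7ddcd57437006a3a6e70ec4c8ed32c08b5969d788f24eef28f51f /proc/net/unix   # kernel's table of unix sockets; abstract ones are prefixed with @
ss -xl | grep 6d504138f7f7ddcd57437006a3a6e70ec4c8ed32c08b5969d788f24eef28f51f         # listening unix sockets only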
stracing the process shows that it's currently waiting on a mutex:
root@analyst:~# strace -p 37032
Process 37032 attached
futex(0x7fd008, FUTEX_WAIT, 0, NULL
with no other behavior.
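As a sanity check, the shim's open file descriptors can also be listed straight from /proc (just a sketch of another way to see the same thing; the socket inodes there should match the ones lsof printed above, 502614215 and 323429479):
ls -l /proc/37032/fd   # fds 3 and 6 should show up as socket:[502614215] and socket:[323429479]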
To test further, I decided to kill the process that's supposed to provide the unix socket, and now I can start the container successfully:
root@analyst:~# kill 37032
root@analyst:~# docker start 6d504138f7f7
6d504138f7f7
root@analyst:~# docker ps -a | grep worker.000.802
6d504138f7f7 <my-image> "/entrypoint.sh" 4 weeks ago Up 3 seconds worker-000-802_1
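For anyone hitting the same thing, here is a rough sketch that automates the manual steps above (find the leftover containerd-shim still bound to the container's abstract shim.sock, kill it, then start the container); the container ID is the only input:
cid=6d504138f7f7ddcd57437006a3a6e70ec4c8ed32c08b5969d788f24eef28f51f   # full container ID
# find the shim process that still holds the abstract socket for this container
shim_pid=$(lsof -U 2>/dev/null | grep "containerd-shim/moby/$cid/shim.sock" | awk '{print $2}' | sort -u | head -n 1)
if [ -n "$shim_pid" ]; then
  kill "$shim_pid"   # a plain SIGTERM was enough in my case
  sleep 1
fi
docker start "$cid"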
Expected behavior
Docker restart policy "always" always restarts a container.
Actual behavior
Docker restart policy "always" randomly fails after a service has been running for a longer period of time (maybe because containerd does not correctly terminate/release the unix socket).
Steps to reproduce the behavior
I have not been able to trigger the problem in a reproducible way, but I have seen dozens of instances over weeks of running services. Interestingly, it happens on different services that use completely unrelated images (aside from sharing a common Debian-based base image).
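Since I cannot reproduce it on demand, the closest I have to a detection step is a sketch like the following, which lists containers that carry the "always" restart policy but are currently sitting in an exited state (exit code 255 in my case):
docker ps -a --filter status=exited --format '{{.ID}} {{.Names}} {{.Status}}' |
while read -r id name status; do
  policy=$(docker inspect -f '{{.HostConfig.RestartPolicy.Name}}' "$id")
  [ "$policy" = "always" ] && echo "stuck: $name ($status)"
done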
Output of docker version:
Client:
Version: 18.06.1-ce
API version: 1.38
Go version: go1.10.3
Git commit: 5f88b8b
Built: Fri Sep 28 15:50:02 2018
OS/Arch: linux/amd64
Experimental: false
Server:
Engine:
Version: 18.06.1-ce
API version: 1.38 (minimum version 1.12)
Go version: go1.10.3
Git commit: 5f88b8b
Built: Fri Sep 28 15:49:28 2018
OS/Arch: linux/amd64
Experimental: false
Output of docker info:
Containers: 51
Running: 50
Paused: 0
Stopped: 1
Images: 7
Server Version: 18.06.1-ce
Storage Driver: aufs
Root Dir: /var/lib/docker/aufs
Backing Filesystem: extfs
Dirs: 134
Dirperm1 Supported: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 468a545b9edcd5932818eb9de8e72413e616e86e
runc version: 69663f0bd4b60df09991c08812a60108003fa340
init version: fec3683
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 3.13.0-157-generic
Operating System: Ubuntu 14.04.5 LTS
OSType: linux
Architecture: x86_64
CPUs: 12
Total Memory: 62.87GiB
Name: analyst
ID: AKNM:4XYS:MIJI:G2E6:5DRO:MP2I:Q2MY:CXPE:WDJW:MI4D:WS32:O3ON
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
WARNING: No swap limit support
Physical host, under constant and high load. The containers that show the problem have memory limits in place, set via docker-compose:
version: "2.4"
services:
worker-000-801:
image: "<my-image>"
network_mode: "host"
...
mem_limit: 4294967296
Note that I'm using a private container registry, which is why I decided to replace the image data with my-image.
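For completeness, the "restart always" policy is set per service in these compose files; roughly like this (a sketch with placeholder values, not the literal file):
version: "2.4"
services:
  worker-000-801:
    image: "<my-image>"
    restart: always        # the "restart always" policy referred to above
    mem_limit: 4294967296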
The only potentially-related bug I managed to find online is this:
https://github.com/moby/moby/issues/38726
I've got the same issue in my project...
Same problem here with Docker updates that don't re-start the containers.
Running docker rm for the affected containers and re-creating them works, but is not very ideal.
Edit: Gave myself a +1 a few months later because I had the same issue and found my own answer as a solution…
Same problem here with Docker updates that don't re-start the containers.
Running docker rm for the affected containers and re-creating them works but is not very ideal.
I've got the same problem here and I can't remove it, unfortunately.
Any update on this? I too am hitting it.
The solution that worked for me was to destroy the container and create a new one with the same volume from the old one.
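In case it helps, a sketch of that recreate step (container name, volume name, mount path and image are placeholders, not taken from a real setup):
docker stop my-container && docker rm my-container
# recreate with the same named volume so the data carries over to the new container
docker run -d --name my-container --restart always -v my-volume:/data <my-image>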
Try to find the docker process that is holding the socket and kill it; that will resolve the issue:
uuid=<container-id>   # full 64-character ID of the affected container
for pid in $(ps -ef | grep docker | awk '{print $2}'); do
  lsof -p "$pid" 2>/dev/null | grep "$uuid"   # matching lines include the PID holding the socket
done
kill -9 <pid>   # <pid> = the PID shown in the lsof output above
I also have this problem whenever the docker package in Ubuntu gets updated. Not sure whether this is a problem with the packaging or with docker itself.
@chenz-svsarrazin Thanks! An apt update + upgrade worked and I didn't need to recreate the container. (Ubuntu 18.04.2 LTS + Docker version 18.09.7, build 2d0083d).
Reproduced on Ubuntu 18.10, not sure what caused it but all my servers/containers randomly went down. Could have been an update.
There was the 18.09.7 update a few days ago (security update) which restarted the Docker service and, for me, brought down four web servers and corrupted one database. Regular start-up didn't work due to these errors.
kill -9 $(netstat -lnp |grep containerd-sh |awk '{print $9}'|cut -d / -f 1)
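On hosts where netstat is no longer installed, a rough ss-based equivalent (note that, like the netstat one-liner above, this kills every listening containerd-shim on the machine, not just the broken one):
kill -9 $(ss -xlp | grep containerd-shim | grep -oP 'pid=\K[0-9]+' | sort -u)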
I tried killing the PID and downgrading Docker, but it didn't help me. My solution:
- upgrade Docker from 18.09.2 to 18.09.7 (build 2d0083d)
- upgrade docker-compose to 1.24.0
- start the container again, but receive the same error:
# docker start freepbx
...
bind: address already in use: unknown
Error: failed to start containers: freepbx
- then reboot. After the reboot, the error was gone and the container started.
The solution that worked for me:
- Reboot the node/server
- Restart the docker service
- Start the docker container
We just had the same issue after updating the docker package to docker-ce-19.03.3-3.el7.x86_64 on CentOS Linux release 7.7.1908 (Core).
Exactly the same as in the first post; however, killing the docker PID did not work for us, and neither did restarting Docker. A reboot of the entire server solved the problem.
Any news on this issue? It is really scary that this can happen to our production services.
Same issue here; the container couldn't run after an update:
Error: Cannot start service odfenode: failed to listen to abstract unix socket
Version
Client: Docker Engine - Community
Version: 19.03.3
API version: 1.40
Go version: go1.12.10
Git commit: a872fc2
Built: Tue Oct 8 00:59:54 2019
OS/Arch: linux/amd64
Experimental: false
Server: Docker Engine - Community
Engine:
Version: 19.03.3
API version: 1.40 (minimum version 1.12)
Go version: go1.12.10
Git commit: a872fc2
Built: Tue Oct 8 00:58:28 2019
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.2.6
GitCommit: 894b81a4b802e4eb2a91d1ce216b8817763c29fb
runc:
Version: 1.0.0-rc8+dev
GitCommit: 3e425f80a8c931f88e6d94a8c831b9d5aa481657
docker-init:
Version: 0.18.0
GitCommit: fec3683
Same issue here.
In my case, it helped to downgrade to an earlier docker version and then restart the system (just restarting docker did not help). No need to redeploy/remove existing containers.
Example for Ubuntu Xenial:
sudo apt install docker-ce=5:19.03.1~3-0~ubuntu-xenial
I am seeing the same issue after updating to Docker version 19.03.4. I cannot reboot this Debian machine without a lot of hassle. I wish I hadn't upgraded Docker.
Captain Hindsight advice: pin the Docker version.
You wouldn't expect this from the non-edge channel.
Try to find the docker process that is holding the socket and kill it; that will resolve the issue:
for pid in $(ps -ef | grep docker | awk '{print $2}'); do lsof -p "$pid" | grep "$uuid"; done
kill -9 <pid>
this one saved my day! thanks
@thaJeztah Is this the same as the other issues where it was some packaging-related problem?
This SO post seems to indicate this may be an issue with the Ubuntu Snap package and that the following may resolve it:
# Remove snap installation, any prior Docker installations
sudo snap remove docker
sudo apt-get remove docker docker-engine docker.io
# Install latest Docker.io version
sudo apt-get update
sudo apt install docker.io
# Run Docker on startup
sudo systemctl start docker
sudo systemctl enable docker
Based on the comments it seems this happens on 19.03.3 and 19.03.4, and we had someone reproduce it on Xenial with 19.03.8 as well, but I was NOT able to reproduce it with the following:
Add the apt repository if you don't already have it:
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
sudo apt-get update
Install Docker CE 19.03.8 (latest) explicitly:
sudo apt install docker-ce=19.03.8~3-0~ubuntu-xenial
Does someone know what the actual reason for this issue is?
Still getting a similar issue, but with version 27.1.1. In this case, the first time nginx starts (using Docker Compose 2.29.1) it is fine, but after a down followed by an up it hits this error. With the same configuration on the host it is fine, and with a traditional unix socket (visible on the filesystem) it is fine as well.