Docker inspect hangs forever
- [x] This is a bug report
- [ ] This is a feature request
- [x] I searched existing issues before opening this one
Expected behavior
Docker inspect returns information, allowing docker-compose to work
Actual behavior
Docker inspect hangs forever and causes an HTTP timeout
Steps to reproduce the behavior
I've been getting this off and on since January of this year, but on Ubuntu rather than Mac: first on 16.04, then on 17.04, 17.10, and now on 18.04. My verbose output hangs at exactly the same spot as the reports on the Docker for Mac issue, during container inspection. This is limited to desktop Linux, and some comments on that issue suggest it is triggered by the Mac going to sleep, which lines up with my experience: when Ubuntu suspends and resumes, the problem appears. Rebooting the OS seems to be the only workaround; restarting the docker daemon does not resolve the issue.
Output of docker version:
Client:
Version: 18.06.0-ce
API version: 1.38
Go version: go1.10.3
Git commit: 0ffa825
Built: Wed Jul 18 19:09:54 2018
OS/Arch: linux/amd64
Experimental: false
Server:
Engine:
Version: 18.06.0-ce
API version: 1.38 (minimum version 1.12)
Go version: go1.10.3
Git commit: 0ffa825
Built: Wed Jul 18 19:07:56 2018
OS/Arch: linux/amd64
Experimental: false
Output of docker info:
Containers: 218
Running: 0
Paused: 0
Stopped: 218
Images: 2
Server Version: 18.06.0-ce
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: d64c661f1d51c48782c9cec8fda7604785f93587
runc version: 69663f0bd4b60df09991c08812a60108003fa340
init version: fec3683
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 4.15.0-30-generic
Operating System: Ubuntu 18.04.1 LTS
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 15.53GiB
Name: crouth-Latitude-5480
ID: PESF:MXJF:NVGR:Q25S:65JB:V4QJ:WIVC:ALQF:4YPF:MXYC:27YK:5YUU
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
WARNING: No swap limit support
Additional environment details (AWS, VirtualBox, physical, etc.)
Tested and reproduced on Ubuntu Desktop 16.04, 17.04, 17.10 and 18.04
This is the exact same issue as seen on Mac desktops: https://github.com/docker/for-mac/issues/2378
docker-compose --verbose run playbook -i inventory/global/ playbooks/dev-jasper-reports.yml
compose.config.config.find: Using configuration files: ./docker-compose.yml
docker.utils.config.find_config_file: Trying paths: ['/home/crouth/.docker/config.json', '/home/crouth/.dockercfg']
docker.utils.config.find_config_file: Found file at path: /home/crouth/.docker/config.json
docker.auth.load_config: Found 'auths' section
urllib3.connectionpool._make_request: http://localhost:None "GET /v1.24/version HTTP/1.1" 200 543
compose.cli.command.get_client: docker-compose version 1.21.2, build a133471
docker-py version: 3.3.0
CPython version: 3.6.5
OpenSSL version: OpenSSL 1.0.1t 3 May 2016
compose.cli.command.get_client: Docker base_url: http+docker://localhost
compose.cli.command.get_client: Docker version: Platform={'Name': ''}, Components=[{'Name': 'Engine', 'Version': '18.06.0-ce', 'Details': {'ApiVersion': '1.38', 'Arch': 'amd64', 'BuildTime': '2018-07-18T19:07:56.000000000+00:00', 'Experimental': 'false', 'GitCommit': '0ffa825', 'GoVersion': 'go1.10.3', 'KernelVersion': '4.15.0-30-generic', 'MinAPIVersion': '1.12', 'Os': 'linux'}}], Version=18.06.0-ce, ApiVersion=1.38, MinAPIVersion=1.12, GitCommit=0ffa825, GoVersion=go1.10.3, Os=linux, Arch=amd64, KernelVersion=4.15.0-30-generic, BuildTime=2018-07-18T19:07:56.000000000+00:00
compose.cli.verbose_proxy.proxy_callable: docker inspect_network <- ('jasperreportsservers_default')
urllib3.connectionpool._make_request: http://localhost:None "GET /v1.24/networks/jasperreportsservers_default HTTP/1.1" 404 61
compose.cli.verbose_proxy.proxy_callable: docker inspect_network <- ('jasperreports-servers_default')
urllib3.connectionpool._make_request: http://localhost:None "GET /v1.24/networks/jasperreports-servers_default HTTP/1.1" 200 862
compose.cli.verbose_proxy.proxy_callable: docker inspect_network -> {'Attachable': True,
'ConfigFrom': {'Network': ''},
'ConfigOnly': False,
'Containers': {'35ef0763b3a38a776409ef8d426f9fd1eb71549cb13c4166c638d33b7c77bd34': {'EndpointID': '46aaa4306520341b9d1f3f238ceebca2d1aa9f9b00bf9c2b221fa6313439ebc7',
'IPv4Address': '172.18.0.2/16',
'IPv6Address': '',
'MacAddress': '02:42:ac:12:00:02',
'Name': 'jasperreports-servers_playbook_run_112'}},
'Created': '2018-06-27T13:27:42.910819198-07:00',
'Driver': 'bridge',
...
compose.cli.verbose_proxy.proxy_callable: docker inspect_image <- ('dockerhub.alea.ca/docker/ansible')
urllib3.connectionpool._make_request: http://localhost:None "GET /v1.24/images/dockerhub.alea.ca/docker/ansible/json HTTP/1.1" 200 None
compose.cli.verbose_proxy.proxy_callable: docker inspect_image -> {'Architecture': 'amd64',
'Author': '',
'Comment': '',
'Config': {'ArgsEscaped': True,
'AttachStderr': False,
'AttachStdin': False,
'AttachStdout': False,
'Cmd': None,
'Domainname': '',
'Entrypoint': ['ansible-playbook'],
...
compose.cli.verbose_proxy.proxy_callable: docker containers <- (all=True, filters={'label': ['com.docker.compose.project=jasperreports-servers', 'com.docker.compose.service=playbook', 'com.docker.compose.oneoff=True']})
urllib3.connectionpool._make_request: http://localhost:None "GET /v1.24/containers/json?limit=-1&all=1&size=0&trunc_cmd=0&filters=%7B%22label%22%3A+%5B%22com.docker.compose.project%3Djasperreports-servers%22%2C+%22com.docker.compose.service%3Dplaybook%22%2C+%22com.docker.compose.oneoff%3DTrue%22%5D%7D HTTP/1.1" 200 None
compose.cli.verbose_proxy.proxy_callable: docker containers -> (list with 112 items)
compose.cli.verbose_proxy.proxy_callable: docker inspect_container <- ('35ef0763b3a38a776409ef8d426f9fd1eb71549cb13c4166c638d33b7c77bd34')
ERROR: compose.cli.errors.log_timeout_error: An HTTP request took too long to complete. Retry with --verbose to obtain debug information.
If you encounter this issue regularly because of slow network conditions, consider setting COMPOSE_HTTP_TIMEOUT to a higher value (current value: 60).
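The COMPOSE_HTTP_TIMEOUT mentioned in that error is an environment variable read by docker-compose; raising it only gives the daemon more time to respond and does not address the underlying hang, but for completeness, a minimal sketch (assuming a bash shell and the same compose command as above):
export COMPOSE_HTTP_TIMEOUT=300
docker-compose --verbose run playbook -i inventory/global/ playbooks/dev-jasper-reports.yml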
Can you grab a stack trace from the daemon?
curl --unix-socket /var/run/docker.sock http://./debug/pprof/goroutine?debug=2
I have the same problem, here is the stack trace https://pastebin.com/raw/pBNxVRaR
It's getting quite painful and it seems to happen more and more often (which is why I'm starting to look into this).
This is also for a container created by docker-compose; in my case the container did not get started:
docker ps -a -f name=phpunit
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
ced5781314bc        v1_phpunit          "sh"                About an hour ago   Created                                 v1_phpunit_run_1
And since docker-compose tries to reuse the existing container, the service is locked until docker is restarted (which often requires a computer reboot).
@mathroc Thanks!
Looks like this is stuck waiting on containerd for some reason. Can you pull the stack trace from containerd as well?
Should be able to get it with (assuming you are using docker to manage containerd):
docker-containerd-ctr pprof --debug-socket /run/docker/containerd/docker-containerd-debug.sock goroutines
@cpuguy83 I have one right now, but I don't have the docker-containerd-ctr binary. Is there an alternative to that?
(I installed docker-ce and containerd via portage on gentoo)
Similar problem here, but using swarm and not compose.
I have a swarm service that is not listed by docker service ls, but the container is listed by docker ps. Trying to inspect it hangs indefinitely.
4144e40b25f5 docker.example.com:5000/admin:2219 "/bin/bash /admin-do…" 6 days ago Up 6 days (unhealthy) admin-service.osn1h8ctn3xeoacs9k42dxzi7.jmdgzie4mnvy475sxytmj5fse
- Red Hat Enterprise Linux Server release 7.5 (Maipo)
- Linux [hostname] 3.10.0-862.3.3.el7.x86_64 #1 SMP Wed Jun 13 05:44:23 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
- Docker version 18.06.0-ce, build 0ffa825
$ docker info
Containers: 11
Running: 11
Paused: 0
Stopped: 0
Images: 13
Server Version: 18.06.0-ce
Storage Driver: overlay2
Backing Filesystem: xfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
NodeID: osn1h8ctn3xeoacs9k42dxzi7
Is Manager: true
ClusterID: nzrxgungmmv3qr2ftg24htxg4
Managers: 1
Nodes: 1
Orchestration:
Task History Retention Limit: 5
Raft:
Snapshot Interval: 10000
Number of Old Snapshots to Retain: 0
Heartbeat Tick: 1
Election Tick: 10
Dispatcher:
Heartbeat Period: 5 seconds
CA Configuration:
Expiry Duration: 3 months
Force Rotate: 0
Autolock Managers: false
Root Rotation In Progress: false
Node Address: 10.64.225.104
Manager Addresses:
10.64.225.104:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: d64c661f1d51c48782c9cec8fda7604785f93587
runc version: 69663f0bd4b60df09991c08812a60108003fa340
init version: fec3683
Security Options:
seccomp
Profile: default
Kernel Version: 3.10.0-862.3.3.el7.x86_64
Operating System: Red Hat Enterprise Linux Server 7.5 (Maipo)
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 15.65GiB
Name: [hostname]
ID: WPK6:BMJC:ADNC:NUNT:VBYK:2VFV:BGLR:F4QV:5RFR:BNJ6:5KLY:N2A7
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
$ curl --unix-socket /var/run/docker.sock http://./debug/pprof/goroutine?debug=2
https://gist.githubusercontent.com/halcyon22/0470948eccb9dd285098e41d1ced4e1a/raw/315dc7863f20ac04ccdce26dc41a29f93cf5a392/docker_pprof.out
$ docker-containerd-ctr pprof --debug-socket /run/docker/containerd/docker-containerd-debug.sock goroutines
https://gist.githubusercontent.com/halcyon22/b265cc89ed3fdabb11b1951169c728df/raw/90c00bc52124b7128963270f07dafd8b00a40c22/containerd_pprof.out
Hello, we have the same problem here. This is a year-old version of docker, but it might give some extra clues.
One interesting thing about my setup: dockerd has been up for 5 weeks and the containers for 3-5 weeks. We have 8 running containers, 3 of which are currently hanging on docker inspect; the others are fine. The 3 hanging containers are running python behind wsgi, and some of the fine containers have the same setup.
Here's our info: Docker version
Client:
Version: 17.12.1-ce
API version: 1.35
Go version: go1.9.4
Git commit: 7390fc6
Built: Tue Feb 27 22:17:40 2018
OS/Arch: linux/amd64
Server:
Engine:
Version: 17.12.1-ce
API version: 1.35 (minimum version 1.12)
Go version: go1.9.4
Git commit: 7390fc6
Built: Tue Feb 27 22:16:13 2018
OS/Arch: linux/amd64
Experimental: false
docker info
Containers: 9
Running: 9
Paused: 0
Stopped: 0
Images: 8
Server Version: 17.12.1-ce
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 9b55aab90508bd389d7654c4baf173a981477d55
runc version: 9f9c96235cc97674e935002fc3d78361b696a69e
init version: 949e6fa
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 4.4.0-127-generic
Operating System: Ubuntu 16.04.4 LTS
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 3.906GiB
Name: node02
ID: MLF2:BNI4:PH4C:U5K5:F3JD:SY26:QQ75:ULCL:LIVN:4WA5:ZU3U:CVZI
Docker Root Dir: /data/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
WARNING: No swap limit support
docker pprof https://gist.githubusercontent.com/lindycoder/9dd963bcaee8682d13e43544fcaaee25/raw/4799f77f203dfb13916218e049dc903c66616b75/docker_pprof.out
containerd pprof https://gist.githubusercontent.com/lindycoder/9dd963bcaee8682d13e43544fcaaee25/raw/4799f77f203dfb13916218e049dc903c66616b75/containerd_pprof.out
Since the containers hang when inspecting but are still running fine, I might leave them like this for a while, so if there is any extra information I could provide to help find a solution, please tell me!
In case it helps, here is curl --unix-socket /var/run/docker.sock http://./debug/pprof/goroutine?debug=2 > docker_pprof.out: https://pastebin.com/raw/sMN1fxZe
@halcyon22
This looks like it's stuck waiting on a kill from the containerd shim (the process that manages your container process).
goroutine 2339 [select, 9557 minutes]:
github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc.(*Client).dispatch(0xc42067a2a0, 0x5577ba8012a0, 0xc4206ebe00, 0xc420550200, 0xc420287e40, 0x0, 0x0)
/go/src/github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc/client.go:102 +0x24c
github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc.(*Client).Call(0xc42067a2a0, 0x5577ba8012a0, 0xc4206ebe00, 0x5577ba0aca84, 0x25, 0x5577ba07fb17, 0x4, 0x5577ba71b5a0, 0xc420287e00, 0x5577ba75e460, ...)
/go/src/github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc/client.go:73 +0x15d
github.com/containerd/containerd/linux/shim/v1.(*shimClient).Kill(0xc4201b4108, 0x5577ba8012a0, 0xc4206ebe00, 0xc420287e00, 0x5577ba6c5b20, 0x0, 0x5577ba8012a0)
/go/src/github.com/containerd/containerd/linux/shim/v1/shim.pb.go:1801 +0xbf
github.com/containerd/containerd/linux.(*Task).Kill(0xc42046e7e0, 0x5577ba8012a0, 0xc4206ebe00, 0x550000000009, 0xc4206ebe00, 0x5577ba80e5a0)
/go/src/github.com/containerd/containerd/linux/task.go:183 +0x97
github.com/containerd/containerd/services/tasks.(*local).Kill(0xc42020d500, 0x7ff88b93a000, 0xc4206ebe00, 0xc4206ebe90, 0x0, 0x0, 0x0, 0x5577ba80e640, 0xc4206bc958, 0x3)
/go/src/github.com/containerd/containerd/services/tasks/local.go:349 +0x137
github.com/containerd/containerd/services/tasks.(*service).Kill(0xc420393ce0, 0x7ff88b93a000, 0xc4206ebe00, 0xc4206ebe90, 0xc420393ce0, 0x5577ba0afec8, 0x4)
/go/src/github.com/containerd/containerd/services/tasks/service.go:100 +0x68
github.com/containerd/containerd/api/services/tasks/v1._Tasks_Kill_Handler.func1(0x7ff88b93a000, 0xc4206ebe00, 0x5577ba7167c0, 0xc4206ebe90, 0xc4201b3a90, 0x68, 0x5577ba78cc00, 0xc4206bca00)
/go/src/github.com/containerd/containerd/api/services/tasks/v1/tasks.pb.go:714 +0x88
github.com/containerd/containerd/vendor/github.com/grpc-ecosystem/go-grpc-prometheus.UnaryServerInterceptor(0x7ff88b93a000, 0xc4206ebe00, 0x5577ba7167c0, 0xc4206ebe90, 0xc420287ac0, 0xc420287b00, 0xc4206bca70, 0x5577b947789a, 0x70, 0x5577ba78cc00)
/go/src/github.com/containerd/containerd/vendor/github.com/grpc-ecosystem/go-grpc-prometheus/server.go:29 +0xa6
github.com/containerd/containerd/api/services/tasks/v1._Tasks_Kill_Handler(0x5577ba797d20, 0xc420393ce0, 0x7ff88b93a000, 0xc4206ebe00, 0xc4200c3960, 0x5577ba7df250, 0x0, 0x0, 0xc42023350c, 0x0)
/go/src/github.com/containerd/containerd/api/services/tasks/v1/tasks.pb.go:716 +0x169
github.com/containerd/containerd/vendor/google.golang.org/grpc.(*Server).processUnaryRPC(0xc42049c420, 0x5577ba808580, 0xc420093800, 0xc420380dc0, 0xc420440c30, 0x5577baea89b0, 0x0, 0x0, 0x0)
/go/src/github.com/containerd/containerd/vendor/google.golang.org/grpc/server.go:920 +0x84a
github.com/containerd/containerd/vendor/google.golang.org/grpc.(*Server).handleStream(0xc42049c420, 0x5577ba808580, 0xc420093800, 0xc420380dc0, 0x0)
/go/src/github.com/containerd/containerd/vendor/google.golang.org/grpc/server.go:1142 +0x131a
github.com/containerd/containerd/vendor/google.golang.org/grpc.(*Server).serveStreams.func1.1(0xc42003c070, 0xc42049c420, 0x5577ba808580, 0xc420093800, 0xc420380dc0)
/go/src/github.com/containerd/containerd/vendor/google.golang.org/grpc/server.go:637 +0xa1
created by github.com/containerd/containerd/vendor/google.golang.org/grpc.(*Server).serveStreams.func1
/go/src/github.com/containerd/containerd/vendor/google.golang.org/grpc/server.go:635 +0xa3
ping @crosbymichael @dmcgowan on this one.
@mathroc Any chance you can get the containerd output as well? I'm not sure how docker is packaged on Gentoo... maybe just a ctr binary is available, in which case sockets and such may be in different locations.
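For reference, docker-containerd-ctr is containerd's ctr binary shipped under a different name, so with a plain ctr the same goroutine dump should be obtainable with something along these lines (both socket paths are assumptions; the first applies when docker manages containerd, the second to a standalone containerd with its default debug socket enabled):
ctr pprof --debug-socket /run/docker/containerd/docker-containerd-debug.sock goroutines
ctr pprof --debug-socket /run/containerd/debug.sock goroutines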
@lindycoder There are some known deadlocks in containerd and docker in 17.12, can you upgrade to a newer version? 18.06 is the most recent release.
@cpuguy83 you're right, there's a ctr binary. Here is the output: https://pastebin.com/raw/FpGwmfq7
Thanks @cpuguy83 - with that information I stopped the Docker service, killed containerd, brought the service back up again, and the problem didn't reappear. I thought I had rebooted the server prior to adding my comment, but maybe that hadn't worked.
I'm seeing the same issue here. Right now I'm trying to update all systems to the most recent version and check whether the issue is resolved...
Same problem on an up-to-date docker version. As a possible workaround, you can locate the processes related to the problem container by its ID and kill them (a sketch follows the docker info output below).
Client:
Version: 18.06.1-ce
API version: 1.38
Go version: go1.10.3
Git commit: e68fc7a
Built: Tue Aug 21 17:24:58 2018
OS/Arch: linux/amd64
Experimental: false
Server:
Engine:
Version: 18.06.1-ce
API version: 1.38 (minimum version 1.12)
Go version: go1.10.3
Git commit: e68fc7a
Built: Tue Aug 21 17:23:24 2018
OS/Arch: linux/amd64
Experimental: false
Containers: 10
Running: 7
Paused: 0
Stopped: 3
Images: 82
Server Version: 18.06.1-ce
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: nvidia runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 468a545b9edcd5932818eb9de8e72413e616e86e
runc version: 69663f0bd4b60df09991c08812a60108003fa340
init version: fec3683
Security Options:
apparmor
Kernel Version: 4.4.0-141-generic
Operating System: Ubuntu 14.04.5 LTS
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 62.78GiB
Name: *
ID: I6UV:RJTS:L2XD:3QRG:OW7C:HVBG:34GL:IBFP:LXE7:XX4Q:3W6A:JZAT
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
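A hedged sketch of that workaround, assuming an 18.06-era install where the shim process is named docker-containerd-shim (as noted earlier in this thread, a plain kill may not be enough and the daemon may still need a restart afterwards):
ps -ef | grep docker-containerd-shim | grep <full-container-id>   # find the shim PID for the stuck container
kill -9 <shim-pid>                                                # SIGKILL; a plain kill reportedly does not work
systemctl restart docker                                          # often still required before inspect works again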
We experienced the exact same issue with certain containers of ours, across various docker versions from 18.03 to 18.09.2 and various kernels from CentOS 7.2 to 7.6. We eventually found that, for us, the issue only occurred with containers that used NFS-type docker volumes, and that our NFS daemon had gotten into a state where mounts would hang indefinitely. Restarting the NFS daemon fixed the issue and mounts succeeded again. Docker should probably have a timeout on NFS mount attempts, and allow users to delete/inspect containers and remove docker volumes backed by misbehaving NFS mounts.
Would be curious to know if other people on this thread experiencing this issue are using NFS volumes as well, or if there are separate root causes for this issue.
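One quick check for the NFS failure mode described above is to look for processes stuck in uninterruptible sleep (state D), the usual symptom of a hung NFS server, and to list any NFS mounts in use; a rough sketch, not specific to docker:
ps axo pid,stat,wchan:32,comm | awk '$2 ~ /D/'   # D-state processes are typically blocked on I/O such as a dead NFS mount
mount -t nfs,nfs4                                # list currently mounted NFS filesystems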
We got this problem on 18.06.1-ce, no NFS used, using docker-compose
We're also having this problem, 18.06.3-ce, using Nomad 0.8.7 on Ubuntu 14.04. Two instances exhibited this problem within minutes of one another a week ago, and another today.
dockerd-pprof.log docker-containerd-ctr.log
docker ps shows
root@nomad-client-blue-10-181-2-40:~# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
59ef6eb37a06 8df826da855b "/usr/bin/init.sh bu…" 29 minutes ago Up 29 minutes 10.181.2.40:22386->9292/tcp, 10.181.2.40:22386->9292/udp frontend-9fd92404-4aaf-0dd1-26e9-84c74e5e3ddd
b97162768a9f 1b6c25ab9e92 "/usr/bin/init.sh bu…" 11 hours ago Up 11 hours 10.181.2.40:25345->9292/tcp, 10.181.2.40:25345->9292/udp frontend-b094a318-4ce3-4f37-bb63-72be91cedb0f
and the process tree shows:
1055 ? Ssl 116:04 /usr/bin/dockerd --raw-logs
1978 ? Ssl 196:19 \_ docker-containerd --config /var/run/docker/containerd/containerd.toml
12136 ? Sl 0:01 | \_ docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/b97162768a9f54d0a289cd50
11069 ? Sl 0:00 | \_ docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/59ef6eb37a0661c4386f66a0
11087 ? Ssl 0:00 | \_ puma 3.12.0 (tcp://0.0.0.0:9292) [app]
11153 ? Sl 3:41 | \_ …
12117 ? Sl 0:00 \_ /usr/bin/docker-proxy -proto tcp -host-ip 10.181.2.40 -host-port 25345 -container-ip 172.17.0.2 -container-port 9292
12130 ? Sl 0:00 \_ /usr/bin/docker-proxy -proto udp -host-ip 10.181.2.40 -host-port 25345 -container-ip 172.17.0.2 -container-port 9292
11047 ? Sl 0:00 \_ /usr/bin/docker-proxy -proto tcp -host-ip 10.181.2.40 -host-port 22386 -container-ip 172.17.0.5 -container-port 9292
11061 ? Sl 0:00 \_ /usr/bin/docker-proxy -proto udp -host-ip 10.181.2.40 -host-port 22386 -container-ip 172.17.0.5 -container-port 9292
Note that docker-containerd-shim for b97162768a9f has no child processes. I can docker inspect 59ef6eb37a06, but docker inspect b97162768a9f hangs.
A simple kill 12136 (of the docker-containerd-shim process) doesn't work; I have to kill -9 it. Even afterwards, docker inspect b97162768a9f still hangs. Only after I stop and start the docker service am I able to inspect both containers.
The output from docker-containerd-ctr shows a couple of goroutines that have been alive from the time the "stuck" container was asked to stop:
goroutine 36642793 [select, 26 minutes]:
github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc.(*Client).run(0xc4205569c0)
/go/src/github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc/client.go:171 +0x309
created by github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc.NewClient
/go/src/github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc/client.go:46 +0x147
goroutine 36642794 [IO wait, 26 minutes]:
internal/poll.runtime_pollWait(0x7fb457e2c950, 0x72, 0xc42064cb80)
/usr/local/go/src/runtime/netpoll.go:173 +0x59
internal/poll.(*pollDesc).wait(0xc4200eab18, 0x72, 0xffffffffffffff00, 0x558dcff4ef20, 0x558dd0615a98)
/usr/local/go/src/internal/poll/fd_poll_runtime.go:85 +0x9d
internal/poll.(*pollDesc).waitRead(0xc4200eab18, 0xc420df3000, 0x1000, 0x1000)
/usr/local/go/src/internal/poll/fd_poll_runtime.go:90 +0x3f
internal/poll.(*FD).Read(0xc4200eab00, 0xc420df3000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
/usr/local/go/src/internal/poll/fd_unix.go:157 +0x17f
net.(*netFD).Read(0xc4200eab00, 0xc420df3000, 0x1000, 0x1000, 0xc42064cc68, 0x558dcec21010, 0xc420502a80)
/usr/local/go/src/net/fd_unix.go:202 +0x51
net.(*conn).Read(0xc4209141b0, 0xc420df3000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
/usr/local/go/src/net/net.go:176 +0x6c
bufio.(*Reader).Read(0xc420556720, 0xc4209129e0, 0xa, 0xa, 0x0, 0x1, 0x0)
/usr/local/go/src/bufio/bufio.go:216 +0x23a
io.ReadAtLeast(0x558dcff4ba60, 0xc420556720, 0xc4209129e0, 0xa, 0xa, 0xa, 0xc420425a00, 0x0, 0xc420870b40)
/usr/local/go/src/io/io.go:309 +0x88
io.ReadFull(0x558dcff4ba60, 0xc420556720, 0xc4209129e0, 0xa, 0xa, 0xc42064cf48, 0x2, 0x2)
/usr/local/go/src/io/io.go:327 +0x5a
github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc.readMessageHeader(0xc4209129e0, 0xa, 0xa, 0x558dcff4ba60, 0xc420556720, 0xc42064cf48, 0x2, 0x2, 0xc42064cfa8)
/go/src/github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc/channel.go:38 +0x62
github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc.(*channel).recv(0xc4209129c0, 0x558dcff65b40, 0xc420040080, 0x0, 0xc420590002, 0xc4206ea1c0, 0x1b8, 0x1b8, 0x0, 0x0)
/go/src/github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc/channel.go:86 +0x6f
github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc.(*Client).run.func1(0xc420870ba0, 0xc4205569c0, 0xc4208da170, 0xc420870b40)
/go/src/github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc/client.go:144 +0x90
created by github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc.(*Client).run
/go/src/github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc/client.go:139 +0x13f
goroutine 36642792 [syscall, 26 minutes]:
syscall.Syscall6(0xf7, 0x1, 0x2b3d, 0xc4202afde0, 0x1000004, 0x0, 0x0, 0x558dcf06504a, 0xc4200e22c0, 0x558dcff6cea0)
/usr/local/go/src/syscall/asm_linux_amd64.s:44 +0x5
os.(*Process).blockUntilWaitable(0xc420b08510, 0x558dcfd29be0, 0xc4208da030, 0x0)
/usr/local/go/src/os/wait_waitid.go:31 +0x9a
os.(*Process).wait(0xc420b08510, 0x558dcff65bc0, 0xc420c7b350, 0x0)
/usr/local/go/src/os/exec_unix.go:22 +0x3e
os.(*Process).Wait(0xc420b08510, 0x48, 0xc4202aff70, 0x558dcebf37fb)
/usr/local/go/src/os/exec.go:123 +0x2d
os/exec.(*Cmd).Wait(0xc4205f8840, 0x558dcec25191, 0xc4202d4210)
/usr/local/go/src/os/exec/exec.go:461 +0x5e
github.com/containerd/containerd/linux/shim/client.WithStart.func1.2(0xc4205f8840, 0xc420e2f220)
/go/src/github.com/containerd/containerd/linux/shim/client/client.go:78 +0x2d
created by github.com/containerd/containerd/linux/shim/client.WithStart.func1
/go/src/github.com/containerd/containerd/linux/shim/client/client.go:77 +0x307
goroutine 34853989 [select, 26 minutes]:
github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc.(*Client).run(0xc4204e7320)
/go/src/github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc/client.go:171 +0x309
created by github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc.NewClient
/go/src/github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc/client.go:46 +0x147
I believe we are seeing this as well. It seems to be tied to situations where containers are getting OOM-killed or are otherwise in restart loops.
I see two long-running calls to containerd:
goroutine 1374264 [select, 689 minutes]:
github.com/docker/docker/vendor/google.golang.org/grpc/transport.(*Stream).waitOnHeader(0xc42a1f42d0, 0x10, 0xc423b152f0)
/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/transport/transport.go:222 +0x101
github.com/docker/docker/vendor/google.golang.org/grpc/transport.(*Stream).RecvCompress(0xc42a1f42d0, 0x555558c3fbb0, 0xc423b153b0)
/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/transport/transport.go:233 +0x2d
github.com/docker/docker/vendor/google.golang.org/grpc.(*csAttempt).recvMsg(0xc42a1f8340, 0x555558aa9b20, 0xc4247840c8, 0x0, 0x0)
/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/stream.go:515 +0x63b
github.com/docker/docker/vendor/google.golang.org/grpc.(*clientStream).RecvMsg(0xc42a1f6280, 0x555558aa9b20, 0xc4247840c8, 0x0, 0x0)
/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/stream.go:395 +0x45
github.com/docker/docker/vendor/google.golang.org/grpc.invoke(0x555558c74680, 0xc42a5c70e0, 0x555558074062, 0x27, 0x555558ab4da0, 0xc42c811220, 0x555558aa9b20, 0xc4247840c8, 0xc420648300, 0xc4208f4900, ...)
/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/call.go:83 +0x185
github.com/docker/docker/vendor/github.com/containerd/containerd.namespaceInterceptor.unary(0x55555802962e, 0x4, 0x555558c74600, 0xc42003e018, 0x555558074062, 0x27, 0x555558ab4da0, 0xc42c811220, 0x555558aa9b20, 0xc4247840c8, ...)
/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/containerd/containerd/grpc.go:35 +0xf6
github.com/docker/docker/vendor/github.com/containerd/containerd.(namespaceInterceptor).(github.com/docker/docker/vendor/github.com/containerd/containerd.unary)-fm(0x555558c74600, 0xc42003e018, 0x555558074062, 0x27, 0x555558ab4da0, 0xc42c811220, 0x555558aa9b20, 0xc4247840c8, 0xc420648300, 0x555558c3fc20, ...)
/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/containerd/containerd/grpc.go:51 +0xf6
github.com/docker/docker/vendor/google.golang.org/grpc.(*ClientConn).Invoke(0xc420648300, 0x555558c74600, 0xc42003e018, 0x555558074062, 0x27, 0x555558ab4da0, 0xc42c811220, 0x555558aa9b20, 0xc4247840c8, 0x0, ...)
/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/call.go:35 +0x10b
github.com/docker/docker/vendor/google.golang.org/grpc.Invoke(0x555558c74600, 0xc42003e018, 0x555558074062, 0x27, 0x555558ab4da0, 0xc42c811220, 0x555558aa9b20, 0xc4247840c8, 0xc420648300, 0x0, ...)
/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/call.go:60 +0xc3
github.com/docker/docker/vendor/github.com/containerd/containerd/api/services/tasks/v1.(*tasksClient).Get(0xc4247840c0, 0x555558c74600, 0xc42003e018, 0xc42c811220, 0x0, 0x0, 0x0, 0xc426ebf7b8, 0x555556dde1ff, 0x555558c74680)
/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/containerd/containerd/api/services/tasks/v1/tasks.pb.go:448 +0xd4
github.com/docker/docker/vendor/github.com/containerd/containerd.(*process).Status(0xc42453a510, 0x555558c74600, 0xc42003e018, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x40, ...)
/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/containerd/containerd/process.go:220 +0x119
github.com/docker/docker/vendor/github.com/containerd/containerd.(*process).Delete(0xc42453a510, 0x555558c74600, 0xc42003e018, 0x0, 0x0, 0x0, 0x40, 0xc42c097680, 0x40)
/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/containerd/containerd/process.go:196 +0x119
github.com/docker/docker/libcontainerd.(*client).processEvent.func1()
/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/libcontainerd/client_daemon.go:713 +0x212
github.com/docker/docker/libcontainerd.(*queue).append.func1(0xc42c097601, 0xc42a7b0420, 0xc421f6ba00, 0xc42a6483c0, 0xc4202f46e8, 0xc42c097640, 0x40)
/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/libcontainerd/queue.go:26 +0x3a
created by github.com/docker/docker/libcontainerd.(*queue).append
/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/libcontainerd/queue.go:22 +0x196
...
goroutine 1377768 [select, 688 minutes]:
github.com/docker/docker/vendor/google.golang.org/grpc/transport.(*Stream).waitOnHeader(0xc426a2ec30, 0x10, 0xc42a3b2bd0)
/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/transport/transport.go:222 +0x101
github.com/docker/docker/vendor/google.golang.org/grpc/transport.(*Stream).RecvCompress(0xc426a2ec30, 0x555558c3fbb0, 0xc42a3b2c90)
/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/transport/transport.go:233 +0x2d
github.com/docker/docker/vendor/google.golang.org/grpc.(*csAttempt).recvMsg(0xc42247f380, 0x555558b22200, 0x55555986b588, 0x0, 0x0)
/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/stream.go:515 +0x63b
github.com/docker/docker/vendor/google.golang.org/grpc.(*clientStream).RecvMsg(0xc420408300, 0x555558b22200, 0x55555986b588, 0x0, 0x0)
/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/stream.go:395 +0x45
github.com/docker/docker/vendor/google.golang.org/grpc.invoke(0x555558c74680, 0xc426217a40, 0x555558076915, 0x28, 0x555558ab5f20, 0xc428567140, 0x555558b22200, 0x55555986b588, 0xc420648300, 0xc4208f4900, ...)
/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/call.go:83 +0x185
github.com/docker/docker/vendor/github.com/containerd/containerd.namespaceInterceptor.unary(0x55555802962e, 0x4, 0x555558c74600, 0xc42003e018, 0x555558076915, 0x28, 0x555558ab5f20, 0xc428567140, 0x555558b22200, 0x55555986b588, ...)
/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/containerd/containerd/grpc.go:35 +0xf6
github.com/docker/docker/vendor/github.com/containerd/containerd.(namespaceInterceptor).(github.com/docker/docker/vendor/github.com/containerd/containerd.unary)-fm(0x555558c74600, 0xc42003e018, 0x555558076915, 0x28, 0x555558ab5f20, 0xc428567140, 0x555558b22200, 0x55555986b588, 0xc420648300, 0x555558c3fc20, ...)
/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/containerd/containerd/grpc.go:51 +0xf6
github.com/docker/docker/vendor/google.golang.org/grpc.(*ClientConn).Invoke(0xc420648300, 0x555558c74600, 0xc42003e018, 0x555558076915, 0x28, 0x555558ab5f20, 0xc428567140, 0x555558b22200, 0x55555986b588, 0x0, ...)
/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/call.go:35 +0x10b
github.com/docker/docker/vendor/google.golang.org/grpc.Invoke(0x555558c74600, 0xc42003e018, 0x555558076915, 0x28, 0x555558ab5f20, 0xc428567140, 0x555558b22200, 0x55555986b588, 0xc420648300, 0x0, ...)
/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/call.go:60 +0xc3
github.com/docker/docker/vendor/github.com/containerd/containerd/api/services/tasks/v1.(*tasksClient).Exec(0xc4285e0ae8, 0x555558c74600, 0xc42003e018, 0xc428567140, 0x0, 0x0, 0x0, 0xa3, 0xc428566f60, 0x0)
/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/containerd/containerd/api/services/tasks/v1/tasks.pb.go:475 +0xd4
github.com/docker/docker/vendor/github.com/containerd/containerd.(*task).Exec(0xc423993e90, 0x555558c74600, 0xc42003e018, 0xc428dc2bc0, 0x40, 0xc42247e8f0, 0xc428566f60, 0x0, 0x0, 0x0, ...)
/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/containerd/containerd/task.go:333 +0x2ba
github.com/docker/docker/libcontainerd.(*client).Exec(0xc4202f4690, 0x555558c74600, 0xc42003e018, 0xc421cdef40, 0x40, 0xc428dc2bc0, 0x40, 0xc42247e8f0, 0x0, 0xc4234531f0, ...)
/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/libcontainerd/client_daemon.go:367 +0x3f1
github.com/docker/docker/daemon.(*Daemon).ContainerExecStart(0xc4204828c0, 0x555558c74600, 0xc42003e018, 0xc4228250f1, 0x40, 0x7ffff7f0f370, 0xc42093f0c0, 0x555558c510a0, 0xc42093f280, 0x555558c510a0, ...)
/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/daemon/exec.go:251 +0xb4f
github.com/docker/docker/api/server/router/container.(*containerRouter).postContainerExecStart(0xc420d71fc0, 0x555558c74680, 0xc425c77080, 0x555558c72b40, 0xc4272fc0e0, 0xc423bdad00, 0xc425c76fc0, 0x0, 0x0)
/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router/container/exec.go:125 +0x34b
github.com/docker/docker/api/server/router/container.(*containerRouter).(github.com/docker/docker/api/server/router/container.postContainerExecStart)-fm(0x555558c74680, 0xc425c77080, 0x555558c72b40, 0xc4272fc0e0, 0xc423bdad00, 0xc425c76fc0, 0x555556bcee3c, 0x555558aba440)
/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router/container/container.go:59 +0x6b
github.com/docker/docker/api/server/middleware.ExperimentalMiddleware.WrapHandler.func1(0x555558c74680, 0xc425c77080, 0x555558c72b40, 0xc4272fc0e0, 0xc423bdad00, 0xc425c76fc0, 0x555558c74680, 0xc425c77080)
/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/middleware/experimental.go:26 +0xda
github.com/docker/docker/api/server/middleware.VersionMiddleware.WrapHandler.func1(0x555558c74680, 0xc425c77050, 0x555558c72b40, 0xc4272fc0e0, 0xc423bdad00, 0xc425c76fc0, 0x0, 0xc42b14ba58)
/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/middleware/version.go:62 +0x401
github.com/docker/docker/pkg/authorization.(*Middleware).WrapHandler.func1(0x555558c74680, 0xc425c77050, 0x555558c72b40, 0xc4272fc0e0, 0xc423bdad00, 0xc425c76fc0, 0x555558c74680, 0xc425c77050)
/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/pkg/authorization/middleware.go:59 +0x7ab
github.com/docker/docker/api/server.(*Server).makeHTTPHandler.func1(0x555558c72b40, 0xc4272fc0e0, 0xc423bdad00)
/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/server.go:141 +0x19a
net/http.HandlerFunc.ServeHTTP(0xc4212fed40, 0x555558c72b40, 0xc4272fc0e0, 0xc423bdad00)
/usr/local/go/src/net/http/server.go:1947 +0x46
github.com/docker/docker/vendor/github.com/gorilla/mux.(*Router).ServeHTTP(0xc4217185f0, 0x555558c72b40, 0xc4272fc0e0, 0xc423bdad00)
/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/gorilla/mux/mux.go:103 +0x228
github.com/docker/docker/api/server.(*routerSwapper).ServeHTTP(0xc420edc910, 0x555558c72b40, 0xc4272fc0e0, 0xc423bdad00)
/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router_swapper.go:29 +0x72
net/http.serverHandler.ServeHTTP(0xc4204365b0, 0x555558c72b40, 0xc4272fc0e0, 0xc423bdad00)
/usr/local/go/src/net/http/server.go:2694 +0xbe
net/http.(*conn).serve(0xc420b67ea0, 0x555558c745c0, 0xc426d72a40)
/usr/local/go/src/net/http/server.go:1830 +0x653
created by net/http.(*Server).Serve
/usr/local/go/src/net/http/server.go:2795 +0x27d
docker info:
Containers: 206
Running: 165
Paused: 0
Stopped: 41
Images: 112
Server Version: 18.06.2-ce
Storage Driver: overlay2
Backing Filesystem: xfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: systemd
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 468a545b9edcd5932818eb9de8e72413e616e86e
runc version: 69663f0bd4b60df09991c08812a60108003fa340
init version: fec3683
Security Options:
seccomp
Profile: default
Kernel Version: 3.10.0-862.6.3.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 56
Total Memory: 503.6GiB
Docker Root Dir: /mongodb/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
Seems like the same issue, @cpuguy83? Any further guidance?
This seems closely related to https://github.com/moby/moby/issues/38064
docker 18.09-ce also has this problem
Seems like the bug is also present in 19.03.1
~# docker version
Client: Docker Engine - Community
Version: 19.03.1
API version: 1.40
Go version: go1.12.5
Git commit: 74b1e89
Built: Thu Jul 25 21:22:03 2019
OS/Arch: linux/amd64
Experimental: false
Server: Docker Engine - Community
Engine:
Version: 19.03.1
API version: 1.40 (minimum version 1.12)
Go version: go1.12.5
Git commit: 74b1e89
Built: Thu Jul 25 21:20:35 2019
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.2.6
GitCommit: 894b81a4b802e4eb2a91d1ce216b8817763c29fb
runc:
Version: 1.0.0-rc8
GitCommit: 425e105d5a03fabd737a126ad93d62a9eeede87f
docker-init:
Version: 0.18.0
GitCommit: fec3683
Stack trace: https://plik.himbeere.pw/file/yw4XSErMAEMxG9wD/wUQJrxyVYWsRGzhp/stacktrace
Example:
# time timeout 10 docker rm -f dev_admin_php-fpm_1
real 0m10.004s
user 0m0.011s
sys 0m0.024s
# time timeout 10 docker inspect dev_admin_php-fpm_1
real 0m10.008s
user 0m0.013s
sys 0m0.024s
Restarting the docker daemon fixed it.
I can confirm that this issue is present on 19.03.1 on Ubuntu 18.04.2 LTS.
Does anyone have a workaround that only impacts the offending container/service? I have only been able to fix this issue by rebooting the server, as the system hangs when trying to restart the docker daemon (sudo systemctl restart docker, sudo service docker start).
This is naturally not very good for the uptime of my services :(
The machine should have plenty of resources; memory usage is around 5-10% and system load is usually quite low as well.
Happening with extremely high frequency to me on multiple ubuntu machines running 19.03.1. docker inspect is an essential part of the scripts that manage my short-lived containers, so this is a breaking issue for me. My last heavy use of docker was on 18.09.x, and I never saw this issue across tens of thousands of containers and many machines.
No NFS, not using docker compose, and the containers that inspect hangs on were not otherwise misbehaving.
edit to add: In an attempt to work around the issue, I made a shell script with permissions to read the docker/containers folder and manually pull fields that I need from the config.v2.json file. Somewhat worried about reading it while it's in the process of being rewritten or edited, but that's the only way I could think to eliminate calls to inspect. But...it turns out that I have other breaking issues with 19.03.1 and have reverted to 18.09
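For anyone attempting the same workaround: the state that docker inspect reports largely lives on disk in /var/lib/docker/containers/<id>/config.v2.json. A minimal sketch with jq (the container name and selected fields are illustrative, and the daemon may rewrite the file at any moment):
CID=$(docker ps -q --no-trunc -f name=mycontainer)
sudo jq '{Name: .Name, Running: .State.Running, Pid: .State.Pid}' "/var/lib/docker/containers/$CID/config.v2.json"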
@luminouspanda if you can put your stack trace in a site not flagged as a security issue, such as gist, then we can review it.
Everyone else experiencing problems on 19.03: there seems to be a regression (at least empirically, based on the number of reports I've seen). Can you please post the stack trace from affected systems?
curl --unix-socket /var/run/docker.sock http://./debug/pprof/goroutine?debug=2
@cpuguy83: not sure if this will be useful since, like I said, I have other issues with 19.03 (will create a new bug report...but, in short, docker never registers that a container has exited after its referenced PID quits). But I'm including it anyway in the hope that it will be useful.
docker version (experimental is enabled because I use checkpoints):
Client: Docker Engine - Community
Version: 19.03.1
API version: 1.40
Go version: go1.12.5
Git commit: 74b1e89
Built: Thu Jul 25 21:21:05 2019
OS/Arch: linux/amd64
Experimental: false
Server: Docker Engine - Community
Engine:
Version: 19.03.1
API version: 1.40 (minimum version 1.12)
Go version: go1.12.5
Git commit: 74b1e89
Built: Thu Jul 25 21:19:41 2019
OS/Arch: linux/amd64
Experimental: true
containerd:
Version: 1.2.6
GitCommit: 894b81a4b802e4eb2a91d1ce216b8817763c29fb
runc:
Version: 1.0.0-rc8
GitCommit: 425e105d5a03fabd737a126ad93d62a9eeede87f
docker-init:
Version: 0.18.0
GitCommit: fec3683
docker info:
Client:
Debug Mode: false
Server:
Containers: 7
Running: 7
Paused: 0
Stopped: 0
Images: 10
Server Version: 19.03.1
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 894b81a4b802e4eb2a91d1ce216b8817763c29fb
runc version: 425e105d5a03fabd737a126ad93d62a9eeede87f
init version: fec3683
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 5.0.0-25-generic
Operating System: Linux Mint 19
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 15.37GiB
Name: thinkpad
ID: W7ON:HPJN:2SGV:LS7L:VBOZ:RTGK:JN46:HTF7:N4KW:H4PU:24AH:E3OK
Docker Root Dir: /var/lib/docker
Debug Mode: false
Registry: https://index.docker.io/v1/
Labels:
Experimental: true
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: true
WARNING: No swap limit support
output of curl --unix-socket /var/run/docker.sock http://./debug/pprof/goroutine?debug=2 with a hung docker container inspect:
https://gist.github.com/buck2202/8058fda098f992c4a35bc181f27bcbf3
The issue seems more correlated with the time of day than anything else; the last three days I've seen it around 3:30 pm, but that must surely be a coincidence?
Anyway, here are the output of curl --unix-socket /var/run/docker.sock http://./debug/pprof/goroutine?debug=2:
https://gist.github.com/glaux/6562f669885929dcb26ef668d2fcb71c
This time I had run composer require in a php container when I got an error about memory exhaustion (I have only a few containers running on a small dev server):
Fatal error: Allowed memory size of 1610612736 bytes exhausted (tried to allocate 4096 bytes) in phar:///usr/local/bin/composer/src/Composer/DependencyResolver/Solver.php on line 223
Check https://getcomposer.org/doc/articles/troubleshooting.md#memory-limit-errors for more info on how to handle out of memory errors.
When running docker-compose up I get the message:
ERROR: An HTTP request took too long to complete. Retry with --verbose to obtain debug information.
If you encounter this issue regularly because of slow network conditions, consider setting COMPOSE_HTTP_TIMEOUT to a higher value (current value: 60).
That particular error is discussed here: https://github.com/docker/compose/issues/5620 which links to this thread.
Docker version:
$ docker version
Client: Docker Engine - Community
Version: 19.03.1
API version: 1.40
Go version: go1.12.5
Git commit: 74b1e89
Built: Thu Jul 25 21:21:05 2019
OS/Arch: linux/amd64
Experimental: false
Server: Docker Engine - Community
Engine:
Version: 19.03.1
API version: 1.40 (minimum version 1.12)
Go version: go1.12.5
Git commit: 74b1e89
Built: Thu Jul 25 21:19:41 2019
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.2.6
GitCommit: 894b81a4b802e4eb2a91d1ce216b8817763c29fb
runc:
Version: 1.0.0-rc8
GitCommit: 425e105d5a03fabd737a126ad93d62a9eeede87f
docker-init:
Version: 0.18.0
GitCommit: fec3683
This looks like a deadlock waiting on data back from containerd. I know I've seen some things come across regarding this.
Unfortunately the debug socket for containerd is no longer available from a default docker install so tracking down the cause is a bit harder :(
goroutine 562400 [select, 3 minutes]:
github.com/docker/docker/vendor/google.golang.org/grpc/internal/transport.(*Stream).waitOnHeader(0xc001a08000, 0x8, 0xc00134b880)
/go/src/github.com/docker/docker/vendor/google.golang.org/grpc/internal/transport/transport.go:267 +0xce
github.com/docker/docker/vendor/google.golang.org/grpc/internal/transport.(*Stream).RecvCompress(...)
/go/src/github.com/docker/docker/vendor/google.golang.org/grpc/internal/transport/transport.go:278
github.com/docker/docker/vendor/google.golang.org/grpc.(*csAttempt).recvMsg(0xc00134b880, 0x55b7e354e780, 0xc0013adf40, 0x0, 0xc00258cae0, 0x0)
/go/src/github.com/docker/docker/vendor/google.golang.org/grpc/stream.go:856 +0x70e
github.com/docker/docker/vendor/google.golang.org/grpc.(*clientStream).RecvMsg.func1(0xc00134b880, 0x0, 0x0)
/go/src/github.com/docker/docker/vendor/google.golang.org/grpc/stream.go:721 +0x48
github.com/docker/docker/vendor/google.golang.org/grpc.(*clientStream).withRetry(0xc000e73320, 0xc00258cae8, 0xc00258cad8, 0xc001a87810, 0x0)
/go/src/github.com/docker/docker/vendor/google.golang.org/grpc/stream.go:577 +0x7e
github.com/docker/docker/vendor/google.golang.org/grpc.(*clientStream).RecvMsg(0xc000e73320, 0x55b7e354e780, 0xc0013adf40, 0x0, 0x0)
/go/src/github.com/docker/docker/vendor/google.golang.org/grpc/stream.go:720 +0x102
github.com/docker/docker/vendor/google.golang.org/grpc.invoke(0x55b7e3705060, 0xc002cc0030, 0x55b7e2651589, 0x2a, 0x55b7e354b960, 0xc00269df80, 0x55b7e354e780, 0xc0013adf40, 0xc000176dc0, 0xc000891520, ...)
/go/src/github.com/docker/docker/vendor/google.golang.org/grpc/call.go:73 +0x143
github.com/docker/docker/vendor/github.com/containerd/containerd.namespaceInterceptor.unary(0x55b7e25f7f49, 0x4, 0x55b7e3704fe0, 0xc000052038, 0x55b7e2651589, 0x2a, 0x55b7e354b960, 0xc00269df80, 0x55b7e354e780, 0xc0013adf40, ...)
/go/src/github.com/docker/docker/vendor/github.com/containerd/containerd/grpc.go:35 +0xf4
github.com/docker/docker/vendor/google.golang.org/grpc.(*ClientConn).Invoke(0xc000176dc0, 0x55b7e3704fe0, 0xc000052038, 0x55b7e2651589, 0x2a, 0x55b7e354b960, 0xc00269df80, 0x55b7e354e780, 0xc0013adf40, 0x0, ...)
/go/src/github.com/docker/docker/vendor/google.golang.org/grpc/call.go:35 +0x10c
github.com/docker/docker/vendor/github.com/containerd/containerd/api/services/tasks/v1.(*tasksClient).Delete(0xc000db26a8, 0x55b7e3704fe0, 0xc000052038, 0xc00269df80, 0x0, 0x0, 0x0, 0xed4e7589b, 0x0, 0x0)
/go/src/github.com/docker/docker/vendor/github.com/containerd/containerd/api/services/tasks/v1/tasks.pb.go:1315 +0xd4
github.com/docker/docker/vendor/github.com/containerd/containerd.(*task).Delete(0xc000a12570, 0x55b7e3704fe0, 0xc000052038, 0x0, 0x0, 0x0, 0x4, 0x55b7e373b2e0, 0xc000a12570)
/go/src/github.com/docker/docker/vendor/github.com/containerd/containerd/task.go:312 +0x221
github.com/docker/docker/libcontainerd/remote.(*client).DeleteTask(0xc0008888a0, 0x55b7e3704fe0, 0xc000052038, 0xc000d236c0, 0x40, 0x0, 0x55b7e32203e0, 0x55b7e0ce4275, 0x196, 0xc00048200b, ...)
/go/src/github.com/docker/docker/libcontainerd/remote/client.go:438 +0xd6
github.com/docker/docker/daemon.(*Daemon).ProcessEvent(0xc00000c5a0, 0xc001a40b80, 0x40, 0x55b7e25f7d1d, 0x4, 0xc001a40b80, 0x40, 0xc001a40bc0, 0x40, 0x3b6b, ...)
/go/src/github.com/docker/docker/daemon/monitor.go:54 +0x253
github.com/docker/docker/libcontainerd/remote.(*client).processEvent.func1()
/go/src/github.com/docker/docker/libcontainerd/remote/client.go:634 +0x109
github.com/docker/docker/libcontainerd/queue.(*Queue).Append.func1(0xc001a40b00, 0x0, 0xc001514480, 0xc0016fbc80, 0xc0008888e0, 0xc001a40b80, 0x40)
/go/src/github.com/docker/docker/libcontainerd/queue/queue.go:28 +0x3a
created by github.com/docker/docker/libcontainerd/queue.(*Queue).Append
/go/src/github.com/docker/docker/libcontainerd/queue/queue.go:24 +0x186
https://github.com/containerd/containerd/pull/3540 seems promising.
Still testing, but it seems like the combination of docker 19.03.3 and containerd.io 1.2.10-2 may have resolved this issue for me.
The PR @cpuguy83 referenced above (containerd/containerd#3540) was backported to containerd 1.2.8 by containerd/containerd#3561, but as far as I can tell, the download.docker.com repository for bionic wasn't updated to include this patch until 1.2.10-2 within the past week or so.
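To check whether a given host already has the patched containerd, the installed version can be read from the binary itself or from the package manager (the containerd.io package name applies to the download.docker.com repositories):
containerd --version
apt-cache policy containerd.io   # on Debian/Ubuntu hosts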
We also hit a similar issue on docker-ce 18.09.9 with containerd 1.2.10 and kernel 4.19.0-041900-generic. Not all containers, but some of them, hit the hang.
The container process and shim process are alive, as pstree shows, but docker inspect/exec/stop/kill all hang even though docker ps can still find the container.
cloud-user@hm6-opshub-aio:~$ docker info | grep -i "Version|runc|containerd|shim"
WARNING: No swap limit support
Server Version: 18.09.9
Runtimes: runc
Default Runtime: runc
containerd version: b34a5c8af56e510852c35414db4c1f4fa6172339
runc version: 3e425f80a8c931f88e6d94a8c831b9d5aa481657
init version: fec3683
Kernel Version: 4.19.0-041900-generic
cloud-user@hm6-opshub-aio:~$
cloud-user@hm6-opshub-aio:$ docker ps | grep zookeeper-0
7d1dbb28dac3 d43be510ec44 "/custom-entrypoint.…" 7 hours ago Up 7 hours k8s_zookeeper_zookeeper-0_opshub-data_ca906dcf-c623-460a-9271-1389b09912b0_1
fb0855d4e2e6 k8s.gcr.io/pause:3.1 "/pause" 46 hours ago Up 46 hours k8s_POD_zookeeper-0_opshub-data_ca906dcf-c623-460a-9271-1389b09912b0_0
docker inspect hangs:
cloud-user@hm6-opshub-aio:$ docker inspect 7d1dbb28dac3
^C
cloud-user@hm6-opshub-aio:$
shim and container process still alive:
cloud-user@hm6-opshub-aio:$ ps -ef | grep 7d1dbb28dac3
cloud-u+ 5797 2133 0 00:31 pts/0 00:00:00 grep --color=auto 7d1dbb28dac3
root 20154 1640 0 Nov11 ? 00:00:03 containerd-shim -namespace moby -workdir /var/lib/containerd/io.containerd.runtime.v1.linux/moby/7d1dbb28dac3216b2b5cbafa5c6b9dbb4a6565e56d9bd8c6f060ba7487b55501 -address /run/containerd/containerd.sock -containerd-binary /usr/bin/containerd -runtime-root /var/run/docker/runtime-runc
cloud-user@hm6-opshub-aio:~$ pstree -Asp 20154
systemd(1)---containerd(1640)---containerd-shim(20154)-+-custom-entrypoi(25587)---java(17248)-+-{java}(17702)
| |-{java}(18065)
| |-{java}(6038)
| |-{java}(14587)
| |-{java}(14793)
| |-{java}(15040)
| |-{java}(19281)
| |-{java}(19301)
| |-{java}(11909)
| |-{java}(12716)
| |-{java}(13990)
| |-{java}(14318)
| |-{java}(15122)
| |-{java}(15182)
| |-{java}(22772)
| |-{java}(26839)
| |-{java}(27524)
| |-{java}(18492)
| |-{java}(18010)
| |-{java}(18022)
| |-{java}(18490)
| |-{java}(20045)
| |-{java}(20113)
| |-{java}(9862)
| |-{java}(7946)
| |-{java}(18670)
| |-{java}(19502)
| |-{java}(19504)
| |-{java}(22857)
| |-{java}(22858)
| |-{java}(23127)
| |-{java}(23178)
| |-{java}(1632)
| |-{java}(1602)
| |-{java}(15553)
| |-{java}(19236)
| |-{java}(22269)
| |-{java}(503)
| |-{java}(18600)
| |-{java}(22767)
| |-{java}(20046)
| |-{java}(20061)
| |-{java}(20129)
| |-{java}(26671)
| |-{java}(17418)
| |-{java}(18604)
| |-{java}(19293)
| |-{java}(19302)
| |-{java}(19487)
| |-{java}(22024)
| |-{java}(7675)
| |-{java}(17292)
| |-{java}(32502)
| |-{java}(18581)
| |-{java}(29362)
| |-{java}(27718)
| |-{java}(29804)
| |-{java}(4248)
| |-{java}(17326)
| |-{java}(17339)
| |-{java}(17350)
| |-{java}(17449)
| |-{java}(1625)
| |-{java}(1638)
| |-{java}(22305)
| |-{java}(19299)
| |-{java}(19314)
| |-{java}(19363)
| |-{java}(13804)
| |-{java}(13848)
| |-{java}(13964)
| |-{java}(14855)
| |-{java}(7284)
| |-{java}(29541)
| `-{java}(30103)
|-{containerd-shim}(20261)
|-{containerd-shim}(20385)
|-{containerd-shim}(20438)
|-{containerd-shim}(20769)
|-{containerd-shim}(21254)
|-{containerd-shim}(21839)
|-{containerd-shim}(22173)
|-{containerd-shim}(22238)
|-{containerd-shim}(30808)
`-{containerd-shim}(3440)
cloud-user@hm6-opshub-aio:~$
@cpuguy83, I am using docker 18.09.8 with containerd 1.2.6. If I want to resolve this issue by upgrading containerd to 1.2.8, can I just replace the containerd binary on my machine?
@zionwu 18.09.8 comes with a separate containerd package. You can just download and install the latest 1.2.10 rpm/deb from https://download.docker.com/linux/.
You can also just replace the containerd binaries from the latest static package. You will need to replace all binaries except the Docker ones. https://download.docker.com/linux/static/stable/x86_64/docker-19.03.5.tgz
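A rough sketch of that manual swap, under the assumption that the binaries live in /usr/bin and that the daemon is stopped first (keep backups of the originals):
curl -fsSLO https://download.docker.com/linux/static/stable/x86_64/docker-19.03.5.tgz
tar xzf docker-19.03.5.tgz
sudo systemctl stop docker
sudo cp docker/containerd docker/containerd-shim docker/ctr docker/runc /usr/bin/
sudo systemctl start docker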