Docker inspect hangs forever

Routhinator opened this issue 7 years ago • 40 comments

  • [x] This is a bug report
  • [ ] This is a feature request
  • [x] I searched existing issues before opening this one

Expected behavior

Docker inspect returns information, allowing docker-compose to work

Actual behavior

Docker inspect hangs forever and causes an HTTP timeout

Steps to reproduce the behavior

I've been getting this off and on since January of this year, but not on Mac: on Ubuntu, first on 16.04, then on 17.04, 17.10, and now on 18.04. My verbose output hangs at exactly the same spot as the reports in the Docker for Mac issue, during container inspection. In my case this is limited to desktop Linux, and some indications suggest the Mac issue comes from the machine going to sleep, which lines up with my experience: when Ubuntu goes to sleep and comes back, the issue appears. Rebooting the OS seems to be the only workaround, as restarting the Docker daemon does not resolve the issue.

Output of docker version:

Client:
 Version:           18.06.0-ce
 API version:       1.38
 Go version:        go1.10.3
 Git commit:        0ffa825
 Built:             Wed Jul 18 19:09:54 2018
 OS/Arch:           linux/amd64
 Experimental:      false

Server:
 Engine:
  Version:          18.06.0-ce
  API version:      1.38 (minimum version 1.12)
  Go version:       go1.10.3
  Git commit:       0ffa825
  Built:            Wed Jul 18 19:07:56 2018
  OS/Arch:          linux/amd64
  Experimental:     false

Output of docker info:

Containers: 218
 Running: 0
 Paused: 0
 Stopped: 218
Images: 2
Server Version: 18.06.0-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: d64c661f1d51c48782c9cec8fda7604785f93587
runc version: 69663f0bd4b60df09991c08812a60108003fa340
init version: fec3683
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.15.0-30-generic
Operating System: Ubuntu 18.04.1 LTS
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 15.53GiB
Name: crouth-Latitude-5480
ID: PESF:MXJF:NVGR:Q25S:65JB:V4QJ:WIVC:ALQF:4YPF:MXYC:27YK:5YUU
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

WARNING: No swap limit support

Additional environment details (AWS, VirtualBox, physical, etc.)

Tested and reproduced on Ubuntu Desktop 16.04, 17.04, 17.10 and 18.04

This is the exact same issue as seen on Mac desktops: https://github.com/docker/for-mac/issues/2378

docker-compose --verbose run playbook -i inventory/global/ playbooks/dev-jasper-reports.yml
compose.config.config.find: Using configuration files: ./docker-compose.yml
docker.utils.config.find_config_file: Trying paths: ['/home/crouth/.docker/config.json', '/home/crouth/.dockercfg']
docker.utils.config.find_config_file: Found file at path: /home/crouth/.docker/config.json
docker.auth.load_config: Found 'auths' section
urllib3.connectionpool._make_request: http://localhost:None "GET /v1.24/version HTTP/1.1" 200 543
compose.cli.command.get_client: docker-compose version 1.21.2, build a133471
docker-py version: 3.3.0
CPython version: 3.6.5
OpenSSL version: OpenSSL 1.0.1t  3 May 2016
compose.cli.command.get_client: Docker base_url: http+docker://localhost
compose.cli.command.get_client: Docker version: Platform={'Name': ''}, Components=[{'Name': 'Engine', 'Version': '18.06.0-ce', 'Details': {'ApiVersion': '1.38', 'Arch': 'amd64', 'BuildTime': '2018-07-18T19:07:56.000000000+00:00', 'Experimental': 'false', 'GitCommit': '0ffa825', 'GoVersion': 'go1.10.3', 'KernelVersion': '4.15.0-30-generic', 'MinAPIVersion': '1.12', 'Os': 'linux'}}], Version=18.06.0-ce, ApiVersion=1.38, MinAPIVersion=1.12, GitCommit=0ffa825, GoVersion=go1.10.3, Os=linux, Arch=amd64, KernelVersion=4.15.0-30-generic, BuildTime=2018-07-18T19:07:56.000000000+00:00
compose.cli.verbose_proxy.proxy_callable: docker inspect_network <- ('jasperreportsservers_default')
urllib3.connectionpool._make_request: http://localhost:None "GET /v1.24/networks/jasperreportsservers_default HTTP/1.1" 404 61
compose.cli.verbose_proxy.proxy_callable: docker inspect_network <- ('jasperreports-servers_default')
urllib3.connectionpool._make_request: http://localhost:None "GET /v1.24/networks/jasperreports-servers_default HTTP/1.1" 200 862
compose.cli.verbose_proxy.proxy_callable: docker inspect_network -> {'Attachable': True,
 'ConfigFrom': {'Network': ''},
 'ConfigOnly': False,
 'Containers': {'35ef0763b3a38a776409ef8d426f9fd1eb71549cb13c4166c638d33b7c77bd34': {'EndpointID': '46aaa4306520341b9d1f3f238ceebca2d1aa9f9b00bf9c2b221fa6313439ebc7',
                                                                                     'IPv4Address': '172.18.0.2/16',
                                                                                     'IPv6Address': '',
                                                                                     'MacAddress': '02:42:ac:12:00:02',
                                                                                     'Name': 'jasperreports-servers_playbook_run_112'}},
 'Created': '2018-06-27T13:27:42.910819198-07:00',
 'Driver': 'bridge',
...
compose.cli.verbose_proxy.proxy_callable: docker inspect_image <- ('dockerhub.alea.ca/docker/ansible')
urllib3.connectionpool._make_request: http://localhost:None "GET /v1.24/images/dockerhub.alea.ca/docker/ansible/json HTTP/1.1" 200 None
compose.cli.verbose_proxy.proxy_callable: docker inspect_image -> {'Architecture': 'amd64',
 'Author': '',
 'Comment': '',
 'Config': {'ArgsEscaped': True,
            'AttachStderr': False,
            'AttachStdin': False,
            'AttachStdout': False,
            'Cmd': None,
            'Domainname': '',
            'Entrypoint': ['ansible-playbook'],
...
compose.cli.verbose_proxy.proxy_callable: docker containers <- (all=True, filters={'label': ['com.docker.compose.project=jasperreports-servers', 'com.docker.compose.service=playbook', 'com.docker.compose.oneoff=True']})
urllib3.connectionpool._make_request: http://localhost:None "GET /v1.24/containers/json?limit=-1&all=1&size=0&trunc_cmd=0&filters=%7B%22label%22%3A+%5B%22com.docker.compose.project%3Djasperreports-servers%22%2C+%22com.docker.compose.service%3Dplaybook%22%2C+%22com.docker.compose.oneoff%3DTrue%22%5D%7D HTTP/1.1" 200 None
compose.cli.verbose_proxy.proxy_callable: docker containers -> (list with 112 items)
compose.cli.verbose_proxy.proxy_callable: docker inspect_container <- ('35ef0763b3a38a776409ef8d426f9fd1eb71549cb13c4166c638d33b7c77bd34')
ERROR: compose.cli.errors.log_timeout_error: An HTTP request took too long to complete. Retry with --verbose to obtain debug information.
If you encounter this issue regularly because of slow network conditions, consider setting COMPOSE_HTTP_TIMEOUT to a higher value (current value: 60).
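(For what it's worth, the timeout that message refers to can be raised, e.g. COMPOSE_HTTP_TIMEOUT=300 docker-compose --verbose run ..., but since the inspect call never returns, a larger value only delays the error.)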

Routhinator avatar Aug 13 '18 15:08 Routhinator

Can you grab a stack trace from the daemon?

curl --unix-socket /var/run/docker.sock http://./debug/pprof/goroutine?debug=2

cpuguy83 avatar Aug 13 '18 16:08 cpuguy83

I have the same problem, here is the stack trace https://pastebin.com/raw/pBNxVRaR

it's getting quite painful and it seems to happen more and more often (hence why I'm starting to look into this)

This is also for a container created by docker-compose; in my case the container did not get started:

$ docker ps -a -f name=phpunit
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
ced5781314bc        v1_phpunit          "sh"                About an hour ago   Created                                 v1_phpunit_run_1

And since docker-compose tries to reuse the existing container, this service is locked until Docker is restarted (which often requires a computer reboot).

mathroc avatar Aug 28 '18 13:08 mathroc

@mathroc Thanks!

Looks like this is stuck waiting on containerd for some reason. Can you pull the stack trace from containerd as well?

Should be able to get it with (assuming you are using docker to manage containerd):

docker-containerd-ctr pprof --debug-socket /run/docker/containerd/docker-containerd-debug.sock goroutines

cpuguy83 avatar Sep 14 '18 05:09 cpuguy83

@cpuguy83 I have one right now, but I don't have the docker-containerd-ctr binary. Is there an alternative to that?

(I installed docker-ce and containerd via portage on gentoo)

mathroc avatar Sep 20 '18 11:09 mathroc

Similar problem here, but using swarm and not compose.

I have a swarm service that is not listed by docker service ls, but the container is listed by docker ps. Trying to inspect it hangs indefinitely.

4144e40b25f5        docker.example.com:5000/admin:2219              "/bin/bash /admin-do…"   6 days ago          Up 6 days (unhealthy)                       admin-service.osn1h8ctn3xeoacs9k42dxzi7.jmdgzie4mnvy475sxytmj5fse
  • Red Hat Enterprise Linux Server release 7.5 (Maipo)
  • Linux [hostname] 3.10.0-862.3.3.el7.x86_64 #1 SMP Wed Jun 13 05:44:23 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
  • Docker version 18.06.0-ce, build 0ffa825

$ docker info

Containers: 11
 Running: 11
 Paused: 0
 Stopped: 0
Images: 13
Server Version: 18.06.0-ce
Storage Driver: overlay2
 Backing Filesystem: xfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
 NodeID: osn1h8ctn3xeoacs9k42dxzi7
 Is Manager: true
 ClusterID: nzrxgungmmv3qr2ftg24htxg4
 Managers: 1
 Nodes: 1
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 10
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
  Force Rotate: 0
 Autolock Managers: false
 Root Rotation In Progress: false
 Node Address: 10.64.225.104
 Manager Addresses:
  10.64.225.104:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: d64c661f1d51c48782c9cec8fda7604785f93587
runc version: 69663f0bd4b60df09991c08812a60108003fa340
init version: fec3683
Security Options:
 seccomp
  Profile: default
Kernel Version: 3.10.0-862.3.3.el7.x86_64
Operating System: Red Hat Enterprise Linux Server 7.5 (Maipo)
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 15.65GiB
Name: [hostname]
ID: WPK6:BMJC:ADNC:NUNT:VBYK:2VFV:BGLR:F4QV:5RFR:BNJ6:5KLY:N2A7
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

$ curl --unix-socket /var/run/docker.sock http://./debug/pprof/goroutine?debug=2
Output: https://gist.githubusercontent.com/halcyon22/0470948eccb9dd285098e41d1ced4e1a/raw/315dc7863f20ac04ccdce26dc41a29f93cf5a392/docker_pprof.out

$ docker-containerd-ctr pprof --debug-socket /run/docker/containerd/docker-containerd-debug.sock goroutines
Output: https://gist.githubusercontent.com/halcyon22/b265cc89ed3fdabb11b1951169c728df/raw/90c00bc52124b7128963270f07dafd8b00a40c22/containerd_pprof.out

halcyon22 avatar Sep 20 '18 12:09 halcyon22

Hello, we have the same problem here. This is a year-old version of Docker, but it might give some extra clues.

One interesting thing in my setup: dockerd has been up for 5 weeks and the containers for 3-5 weeks. We have 8 running containers, 3 of which are currently hanging on docker inspect; the others are fine. The 3 hanging containers are running Python behind WSGI, and some of the fine containers have the same setup.

Here's our info. Docker version:

Client:
Version:    17.12.1-ce
API version:    1.35
Go version:    go1.9.4
Git commit:    7390fc6
Built:    Tue Feb 27 22:17:40 2018
OS/Arch:    linux/amd64

Server:
Engine:
 Version:    17.12.1-ce
 API version:    1.35 (minimum version 1.12)
 Go version:    go1.9.4
 Git commit:    7390fc6
 Built:    Tue Feb 27 22:16:13 2018
 OS/Arch:    linux/amd64
 Experimental:    false

docker info

Containers: 9
 Running: 9
 Paused: 0
 Stopped: 0
Images: 8
Server Version: 17.12.1-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 9b55aab90508bd389d7654c4baf173a981477d55
runc version: 9f9c96235cc97674e935002fc3d78361b696a69e
init version: 949e6fa
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.4.0-127-generic
Operating System: Ubuntu 16.04.4 LTS
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 3.906GiB
Name: node02
ID: MLF2:BNI4:PH4C:U5K5:F3JD:SY26:QQ75:ULCL:LIVN:4WA5:ZU3U:CVZI
Docker Root Dir: /data/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

WARNING: No swap limit support

docker pprof: https://gist.githubusercontent.com/lindycoder/9dd963bcaee8682d13e43544fcaaee25/raw/4799f77f203dfb13916218e049dc903c66616b75/docker_pprof.out

containerd pprof: https://gist.githubusercontent.com/lindycoder/9dd963bcaee8682d13e43544fcaaee25/raw/4799f77f203dfb13916218e049dc903c66616b75/containerd_pprof.out

Since the containers hang when inspecting but are still running fine, I might leave them like this for a while, so if there is any extra information I could provide to help find a solution, please tell me!

lindycoder avatar Sep 20 '18 20:09 lindycoder

In case it helps, here is the output of curl --unix-socket /var/run/docker.sock http://./debug/pprof/goroutine?debug=2 > docker_pprof.out: https://pastebin.com/raw/sMN1fxZe

mathroc avatar Sep 20 '18 21:09 mathroc

@halcyon22

This looks like it's stuck waiting on a kill from the containerd shim (the process that manages your container process).

goroutine 2339 [select, 9557 minutes]:
github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc.(*Client).dispatch(0xc42067a2a0, 0x5577ba8012a0, 0xc4206ebe00, 0xc420550200, 0xc420287e40, 0x0, 0x0)
	/go/src/github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc/client.go:102 +0x24c
github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc.(*Client).Call(0xc42067a2a0, 0x5577ba8012a0, 0xc4206ebe00, 0x5577ba0aca84, 0x25, 0x5577ba07fb17, 0x4, 0x5577ba71b5a0, 0xc420287e00, 0x5577ba75e460, ...)
	/go/src/github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc/client.go:73 +0x15d
github.com/containerd/containerd/linux/shim/v1.(*shimClient).Kill(0xc4201b4108, 0x5577ba8012a0, 0xc4206ebe00, 0xc420287e00, 0x5577ba6c5b20, 0x0, 0x5577ba8012a0)
	/go/src/github.com/containerd/containerd/linux/shim/v1/shim.pb.go:1801 +0xbf
github.com/containerd/containerd/linux.(*Task).Kill(0xc42046e7e0, 0x5577ba8012a0, 0xc4206ebe00, 0x550000000009, 0xc4206ebe00, 0x5577ba80e5a0)
	/go/src/github.com/containerd/containerd/linux/task.go:183 +0x97
github.com/containerd/containerd/services/tasks.(*local).Kill(0xc42020d500, 0x7ff88b93a000, 0xc4206ebe00, 0xc4206ebe90, 0x0, 0x0, 0x0, 0x5577ba80e640, 0xc4206bc958, 0x3)
	/go/src/github.com/containerd/containerd/services/tasks/local.go:349 +0x137
github.com/containerd/containerd/services/tasks.(*service).Kill(0xc420393ce0, 0x7ff88b93a000, 0xc4206ebe00, 0xc4206ebe90, 0xc420393ce0, 0x5577ba0afec8, 0x4)
	/go/src/github.com/containerd/containerd/services/tasks/service.go:100 +0x68
github.com/containerd/containerd/api/services/tasks/v1._Tasks_Kill_Handler.func1(0x7ff88b93a000, 0xc4206ebe00, 0x5577ba7167c0, 0xc4206ebe90, 0xc4201b3a90, 0x68, 0x5577ba78cc00, 0xc4206bca00)
	/go/src/github.com/containerd/containerd/api/services/tasks/v1/tasks.pb.go:714 +0x88
github.com/containerd/containerd/vendor/github.com/grpc-ecosystem/go-grpc-prometheus.UnaryServerInterceptor(0x7ff88b93a000, 0xc4206ebe00, 0x5577ba7167c0, 0xc4206ebe90, 0xc420287ac0, 0xc420287b00, 0xc4206bca70, 0x5577b947789a, 0x70, 0x5577ba78cc00)
	/go/src/github.com/containerd/containerd/vendor/github.com/grpc-ecosystem/go-grpc-prometheus/server.go:29 +0xa6
github.com/containerd/containerd/api/services/tasks/v1._Tasks_Kill_Handler(0x5577ba797d20, 0xc420393ce0, 0x7ff88b93a000, 0xc4206ebe00, 0xc4200c3960, 0x5577ba7df250, 0x0, 0x0, 0xc42023350c, 0x0)
	/go/src/github.com/containerd/containerd/api/services/tasks/v1/tasks.pb.go:716 +0x169
github.com/containerd/containerd/vendor/google.golang.org/grpc.(*Server).processUnaryRPC(0xc42049c420, 0x5577ba808580, 0xc420093800, 0xc420380dc0, 0xc420440c30, 0x5577baea89b0, 0x0, 0x0, 0x0)
	/go/src/github.com/containerd/containerd/vendor/google.golang.org/grpc/server.go:920 +0x84a
github.com/containerd/containerd/vendor/google.golang.org/grpc.(*Server).handleStream(0xc42049c420, 0x5577ba808580, 0xc420093800, 0xc420380dc0, 0x0)
	/go/src/github.com/containerd/containerd/vendor/google.golang.org/grpc/server.go:1142 +0x131a
github.com/containerd/containerd/vendor/google.golang.org/grpc.(*Server).serveStreams.func1.1(0xc42003c070, 0xc42049c420, 0x5577ba808580, 0xc420093800, 0xc420380dc0)
	/go/src/github.com/containerd/containerd/vendor/google.golang.org/grpc/server.go:637 +0xa1
created by github.com/containerd/containerd/vendor/google.golang.org/grpc.(*Server).serveStreams.func1
	/go/src/github.com/containerd/containerd/vendor/google.golang.org/grpc/server.go:635 +0xa3

ping @crosbymichael @dmcgowan on this one.

@mathroc Any chance you can get the containerd output as well? I'm not sure how docker is packaged on Gentoo... maybe just a ctr binary is available, in which case sockets and such may be in different locations.
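If only a plain ctr binary is packaged, something along these lines may work, assuming containerd's debug socket is enabled and sitting at its default path (adjust the socket location to your install):

ctr pprof --debug-socket /run/containerd/debug.sock goroutines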

@lindycoder There are some known deadlocks in containerd and docker in 17.12; can you upgrade to a newer version? 18.06 is the most recent release.

cpuguy83 avatar Sep 20 '18 21:09 cpuguy83

@cpuguy83 you're right, there's a ctr binary; here is the output: https://pastebin.com/raw/FpGwmfq7

mathroc avatar Sep 21 '18 07:09 mathroc

Thanks @cpuguy83 - with that information I stopped the Docker service, killed containerd, brought the service back up again, and the problem didn't reappear. I thought I had rebooted the server prior to adding my comment, but maybe that hadn't worked.
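For anyone else attempting the same recovery, the rough sequence on a systemd-based host is something like this (unit and process names assumed; on 18.06 the daemon-managed containerd shows up as docker-containerd):

sudo systemctl stop docker
sudo pkill -9 docker-containerd
sudo systemctl start docker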

halcyon22 avatar Sep 27 '18 20:09 halcyon22

I'm seeing the same issue here. Right now I'm trying to update all systems to the most recent version and check whether the issue is resolved ...

mhubig avatar Nov 13 '18 16:11 mhubig

Same problem on an up-to-date Docker version. As a possible workaround you can locate the processes related to the problem container by its ID and kill them.
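A rough sketch of that workaround (the container name here is hypothetical, and the shim may need SIGKILL, as noted further down in this thread):

CID=$(docker ps --no-trunc -qf name=my-stuck-container)    # full container ID
ps -ef | grep "$CID"                                       # the (docker-)containerd-shim line carries this ID
sudo kill -9 <PID of that shim process>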

Client:
 Version:           18.06.1-ce
 API version:       1.38
 Go version:        go1.10.3
 Git commit:        e68fc7a
 Built:             Tue Aug 21 17:24:58 2018
 OS/Arch:           linux/amd64
 Experimental:      false

Server:
 Engine:
  Version:          18.06.1-ce
  API version:      1.38 (minimum version 1.12)
  Go version:       go1.10.3
  Git commit:       e68fc7a
  Built:            Tue Aug 21 17:23:24 2018
  OS/Arch:          linux/amd64
  Experimental:     false
Containers: 10
 Running: 7
 Paused: 0
 Stopped: 3
Images: 82
Server Version: 18.06.1-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: nvidia runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 468a545b9edcd5932818eb9de8e72413e616e86e
runc version: 69663f0bd4b60df09991c08812a60108003fa340
init version: fec3683
Security Options:
 apparmor
Kernel Version: 4.4.0-141-generic
Operating System: Ubuntu 14.04.5 LTS
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 62.78GiB
Name: *
ID: I6UV:RJTS:L2XD:3QRG:OW7C:HVBG:34GL:IBFP:LXE7:XX4Q:3W6A:JZAT
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

whalebot-helmsman avatar Feb 07 '19 13:02 whalebot-helmsman

We experienced the exact same issue with certain containers of ours. Same issue with various docker versions from version 18.03 to 18.09.2 and various kernels from CentOS 7.2 to 7.6. We ended up finding out that the issue for us only occurred with containers that utilized NFS type docker volumes, and our NFS daemon somehow got into a funky state where mounts would hang indefinitely. Restarting the NFS daemon fixed our issue and mounts succeeded again. Probably docker needs to have a timeout on NFS mount attempts and allow users to do things like delete/inspect containers and remove docker volumes for misbehaving NFS mounts.
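If you suspect the same thing, a quick check (paths and the NFS service name are assumptions; adjust to your setup) is to see whether touching a volume's mountpoint hangs:

grep nfs /proc/mounts                                                             # list NFS mounts, including docker volumes
timeout -s KILL 5 stat /var/lib/docker/volumes/<volume-name>/_data || echo stuck  # hangs or times out if the mount is dead
sudo systemctl restart nfs-server                                                 # on the NFS server, this is what cleared it for us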

Would be curious to know if other people on this thread experiencing this issue are using NFS volumes as well, or if there are separate root causes for this issue.

calebmayeux avatar Feb 12 '19 19:02 calebmayeux

We got this problem on 18.06.1-ce, no NFS used, using docker-compose

wimfabri avatar May 02 '19 09:05 wimfabri

We're also having this problem, 18.06.3-ce, using Nomad 0.8.7 on Ubuntu 14.04. Two instances exhibited this problem within minutes of one another a week ago, and another today.

dockerd-pprof.log docker-containerd-ctr.log

docker ps shows

root@nomad-client-blue-10-181-2-40:~# docker ps
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS                                                      NAMES
59ef6eb37a06        8df826da855b        "/usr/bin/init.sh bu…"   29 minutes ago      Up 29 minutes       10.181.2.40:22386->9292/tcp, 10.181.2.40:22386->9292/udp   frontend-9fd92404-4aaf-0dd1-26e9-84c74e5e3ddd
b97162768a9f        1b6c25ab9e92        "/usr/bin/init.sh bu…"   11 hours ago        Up 11 hours         10.181.2.40:25345->9292/tcp, 10.181.2.40:25345->9292/udp   frontend-b094a318-4ce3-4f37-bb63-72be91cedb0f

and the process tree shows:

 1055 ?        Ssl  116:04 /usr/bin/dockerd --raw-logs
 1978 ?        Ssl  196:19  \_ docker-containerd --config /var/run/docker/containerd/containerd.toml
12136 ?        Sl     0:01  |   \_ docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/b97162768a9f54d0a289cd50
11069 ?        Sl     0:00  |   \_ docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/59ef6eb37a0661c4386f66a0
11087 ?        Ssl    0:00  |       \_ puma 3.12.0 (tcp://0.0.0.0:9292) [app]
11153 ?        Sl     3:41  |           \_ …
12117 ?        Sl     0:00  \_ /usr/bin/docker-proxy -proto tcp -host-ip 10.181.2.40 -host-port 25345 -container-ip 172.17.0.2 -container-port 9292
12130 ?        Sl     0:00  \_ /usr/bin/docker-proxy -proto udp -host-ip 10.181.2.40 -host-port 25345 -container-ip 172.17.0.2 -container-port 9292
11047 ?        Sl     0:00  \_ /usr/bin/docker-proxy -proto tcp -host-ip 10.181.2.40 -host-port 22386 -container-ip 172.17.0.5 -container-port 9292
11061 ?        Sl     0:00  \_ /usr/bin/docker-proxy -proto udp -host-ip 10.181.2.40 -host-port 22386 -container-ip 172.17.0.5 -container-port 9292

Note that docker-containerd-shim for b97162768a9f has no child processes. I can docker inspect 59ef6eb37a06, but docker inspect b97162768a9f hangs.

A simple kill 12136 (of the docker-containerd-shim process) doesn't work; I have to kill -9 it. Even afterwards, docker inspect b97162768a9f still hangs. Only after I stop and start the docker service am I able to inspect both containers.

The output from docker-containerd-ctr shows a couple of goroutines that have been alive from the time the "stuck" container was asked to stop:

goroutine 36642793 [select, 26 minutes]:
github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc.(*Client).run(0xc4205569c0)
    /go/src/github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc/client.go:171 +0x309
created by github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc.NewClient
    /go/src/github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc/client.go:46 +0x147

goroutine 36642794 [IO wait, 26 minutes]:
internal/poll.runtime_pollWait(0x7fb457e2c950, 0x72, 0xc42064cb80)
    /usr/local/go/src/runtime/netpoll.go:173 +0x59
internal/poll.(*pollDesc).wait(0xc4200eab18, 0x72, 0xffffffffffffff00, 0x558dcff4ef20, 0x558dd0615a98)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:85 +0x9d
internal/poll.(*pollDesc).waitRead(0xc4200eab18, 0xc420df3000, 0x1000, 0x1000)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:90 +0x3f
internal/poll.(*FD).Read(0xc4200eab00, 0xc420df3000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
    /usr/local/go/src/internal/poll/fd_unix.go:157 +0x17f
net.(*netFD).Read(0xc4200eab00, 0xc420df3000, 0x1000, 0x1000, 0xc42064cc68, 0x558dcec21010, 0xc420502a80)
    /usr/local/go/src/net/fd_unix.go:202 +0x51
net.(*conn).Read(0xc4209141b0, 0xc420df3000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
    /usr/local/go/src/net/net.go:176 +0x6c
bufio.(*Reader).Read(0xc420556720, 0xc4209129e0, 0xa, 0xa, 0x0, 0x1, 0x0)
    /usr/local/go/src/bufio/bufio.go:216 +0x23a
io.ReadAtLeast(0x558dcff4ba60, 0xc420556720, 0xc4209129e0, 0xa, 0xa, 0xa, 0xc420425a00, 0x0, 0xc420870b40)
    /usr/local/go/src/io/io.go:309 +0x88
io.ReadFull(0x558dcff4ba60, 0xc420556720, 0xc4209129e0, 0xa, 0xa, 0xc42064cf48, 0x2, 0x2)
    /usr/local/go/src/io/io.go:327 +0x5a
github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc.readMessageHeader(0xc4209129e0, 0xa, 0xa, 0x558dcff4ba60, 0xc420556720, 0xc42064cf48, 0x2, 0x2, 0xc42064cfa8)
    /go/src/github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc/channel.go:38 +0x62
github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc.(*channel).recv(0xc4209129c0, 0x558dcff65b40, 0xc420040080, 0x0, 0xc420590002, 0xc4206ea1c0, 0x1b8, 0x1b8, 0x0, 0x0)
    /go/src/github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc/channel.go:86 +0x6f
github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc.(*Client).run.func1(0xc420870ba0, 0xc4205569c0, 0xc4208da170, 0xc420870b40)
    /go/src/github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc/client.go:144 +0x90
created by github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc.(*Client).run
    /go/src/github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc/client.go:139 +0x13f

goroutine 36642792 [syscall, 26 minutes]:
syscall.Syscall6(0xf7, 0x1, 0x2b3d, 0xc4202afde0, 0x1000004, 0x0, 0x0, 0x558dcf06504a, 0xc4200e22c0, 0x558dcff6cea0)
    /usr/local/go/src/syscall/asm_linux_amd64.s:44 +0x5
os.(*Process).blockUntilWaitable(0xc420b08510, 0x558dcfd29be0, 0xc4208da030, 0x0)
    /usr/local/go/src/os/wait_waitid.go:31 +0x9a
os.(*Process).wait(0xc420b08510, 0x558dcff65bc0, 0xc420c7b350, 0x0)
    /usr/local/go/src/os/exec_unix.go:22 +0x3e
os.(*Process).Wait(0xc420b08510, 0x48, 0xc4202aff70, 0x558dcebf37fb)
    /usr/local/go/src/os/exec.go:123 +0x2d
os/exec.(*Cmd).Wait(0xc4205f8840, 0x558dcec25191, 0xc4202d4210)
    /usr/local/go/src/os/exec/exec.go:461 +0x5e
github.com/containerd/containerd/linux/shim/client.WithStart.func1.2(0xc4205f8840, 0xc420e2f220)
    /go/src/github.com/containerd/containerd/linux/shim/client/client.go:78 +0x2d
created by github.com/containerd/containerd/linux/shim/client.WithStart.func1
    /go/src/github.com/containerd/containerd/linux/shim/client/client.go:77 +0x307

goroutine 34853989 [select, 26 minutes]:
github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc.(*Client).run(0xc4204e7320)
    /go/src/github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc/client.go:171 +0x309
created by github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc.NewClient
    /go/src/github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc/client.go:46 +0x147

blalor avatar May 03 '19 15:05 blalor

I believe we are seeing this as well. It seems to be tied to situations where containers are getting OOM-killed or are otherwise in restart loops.

I see two long-running calls to containerd:

goroutine 1374264 [select, 689 minutes]:
github.com/docker/docker/vendor/google.golang.org/grpc/transport.(*Stream).waitOnHeader(0xc42a1f42d0, 0x10, 0xc423b152f0)
    /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/transport/transport.go:222 +0x101
github.com/docker/docker/vendor/google.golang.org/grpc/transport.(*Stream).RecvCompress(0xc42a1f42d0, 0x555558c3fbb0, 0xc423b153b0)
    /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/transport/transport.go:233 +0x2d
github.com/docker/docker/vendor/google.golang.org/grpc.(*csAttempt).recvMsg(0xc42a1f8340, 0x555558aa9b20, 0xc4247840c8, 0x0, 0x0)
    /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/stream.go:515 +0x63b
github.com/docker/docker/vendor/google.golang.org/grpc.(*clientStream).RecvMsg(0xc42a1f6280, 0x555558aa9b20, 0xc4247840c8, 0x0, 0x0)
    /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/stream.go:395 +0x45
github.com/docker/docker/vendor/google.golang.org/grpc.invoke(0x555558c74680, 0xc42a5c70e0, 0x555558074062, 0x27, 0x555558ab4da0, 0xc42c811220, 0x555558aa9b20, 0xc4247840c8, 0xc420648300, 0xc4208f4900, ...)
    /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/call.go:83 +0x185
github.com/docker/docker/vendor/github.com/containerd/containerd.namespaceInterceptor.unary(0x55555802962e, 0x4, 0x555558c74600, 0xc42003e018, 0x555558074062, 0x27, 0x555558ab4da0, 0xc42c811220, 0x555558aa9b20, 0xc4247840c8, ...)
    /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/containerd/containerd/grpc.go:35 +0xf6
github.com/docker/docker/vendor/github.com/containerd/containerd.(namespaceInterceptor).(github.com/docker/docker/vendor/github.com/containerd/containerd.unary)-fm(0x555558c74600, 0xc42003e018, 0x555558074062, 0x27, 0x555558ab4da0, 0xc42c811220, 0x555558aa9b20, 0xc4247840c8, 0xc420648300, 0x555558c3fc20, ...)
    /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/containerd/containerd/grpc.go:51 +0xf6
github.com/docker/docker/vendor/google.golang.org/grpc.(*ClientConn).Invoke(0xc420648300, 0x555558c74600, 0xc42003e018, 0x555558074062, 0x27, 0x555558ab4da0, 0xc42c811220, 0x555558aa9b20, 0xc4247840c8, 0x0, ...)
    /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/call.go:35 +0x10b
github.com/docker/docker/vendor/google.golang.org/grpc.Invoke(0x555558c74600, 0xc42003e018, 0x555558074062, 0x27, 0x555558ab4da0, 0xc42c811220, 0x555558aa9b20, 0xc4247840c8, 0xc420648300, 0x0, ...)
    /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/call.go:60 +0xc3
github.com/docker/docker/vendor/github.com/containerd/containerd/api/services/tasks/v1.(*tasksClient).Get(0xc4247840c0, 0x555558c74600, 0xc42003e018, 0xc42c811220, 0x0, 0x0, 0x0, 0xc426ebf7b8, 0x555556dde1ff, 0x555558c74680)
    /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/containerd/containerd/api/services/tasks/v1/tasks.pb.go:448 +0xd4
github.com/docker/docker/vendor/github.com/containerd/containerd.(*process).Status(0xc42453a510, 0x555558c74600, 0xc42003e018, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x40, ...)
    /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/containerd/containerd/process.go:220 +0x119
github.com/docker/docker/vendor/github.com/containerd/containerd.(*process).Delete(0xc42453a510, 0x555558c74600, 0xc42003e018, 0x0, 0x0, 0x0, 0x40, 0xc42c097680, 0x40)
    /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/containerd/containerd/process.go:196 +0x119
github.com/docker/docker/libcontainerd.(*client).processEvent.func1()
    /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/libcontainerd/client_daemon.go:713 +0x212
github.com/docker/docker/libcontainerd.(*queue).append.func1(0xc42c097601, 0xc42a7b0420, 0xc421f6ba00, 0xc42a6483c0, 0xc4202f46e8, 0xc42c097640, 0x40)
    /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/libcontainerd/queue.go:26 +0x3a
created by github.com/docker/docker/libcontainerd.(*queue).append
    /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/libcontainerd/queue.go:22 +0x196
...
goroutine 1377768 [select, 688 minutes]:
github.com/docker/docker/vendor/google.golang.org/grpc/transport.(*Stream).waitOnHeader(0xc426a2ec30, 0x10, 0xc42a3b2bd0)
    /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/transport/transport.go:222 +0x101
github.com/docker/docker/vendor/google.golang.org/grpc/transport.(*Stream).RecvCompress(0xc426a2ec30, 0x555558c3fbb0, 0xc42a3b2c90)
    /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/transport/transport.go:233 +0x2d
github.com/docker/docker/vendor/google.golang.org/grpc.(*csAttempt).recvMsg(0xc42247f380, 0x555558b22200, 0x55555986b588, 0x0, 0x0)
    /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/stream.go:515 +0x63b
github.com/docker/docker/vendor/google.golang.org/grpc.(*clientStream).RecvMsg(0xc420408300, 0x555558b22200, 0x55555986b588, 0x0, 0x0)
    /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/stream.go:395 +0x45
github.com/docker/docker/vendor/google.golang.org/grpc.invoke(0x555558c74680, 0xc426217a40, 0x555558076915, 0x28, 0x555558ab5f20, 0xc428567140, 0x555558b22200, 0x55555986b588, 0xc420648300, 0xc4208f4900, ...)
    /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/call.go:83 +0x185
github.com/docker/docker/vendor/github.com/containerd/containerd.namespaceInterceptor.unary(0x55555802962e, 0x4, 0x555558c74600, 0xc42003e018, 0x555558076915, 0x28, 0x555558ab5f20, 0xc428567140, 0x555558b22200, 0x55555986b588, ...)
    /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/containerd/containerd/grpc.go:35 +0xf6
github.com/docker/docker/vendor/github.com/containerd/containerd.(namespaceInterceptor).(github.com/docker/docker/vendor/github.com/containerd/containerd.unary)-fm(0x555558c74600, 0xc42003e018, 0x555558076915, 0x28, 0x555558ab5f20, 0xc428567140, 0x555558b22200, 0x55555986b588, 0xc420648300, 0x555558c3fc20, ...)
    /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/containerd/containerd/grpc.go:51 +0xf6
github.com/docker/docker/vendor/google.golang.org/grpc.(*ClientConn).Invoke(0xc420648300, 0x555558c74600, 0xc42003e018, 0x555558076915, 0x28, 0x555558ab5f20, 0xc428567140, 0x555558b22200, 0x55555986b588, 0x0, ...)
    /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/call.go:35 +0x10b
github.com/docker/docker/vendor/google.golang.org/grpc.Invoke(0x555558c74600, 0xc42003e018, 0x555558076915, 0x28, 0x555558ab5f20, 0xc428567140, 0x555558b22200, 0x55555986b588, 0xc420648300, 0x0, ...)
    /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/call.go:60 +0xc3
github.com/docker/docker/vendor/github.com/containerd/containerd/api/services/tasks/v1.(*tasksClient).Exec(0xc4285e0ae8, 0x555558c74600, 0xc42003e018, 0xc428567140, 0x0, 0x0, 0x0, 0xa3, 0xc428566f60, 0x0)
    /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/containerd/containerd/api/services/tasks/v1/tasks.pb.go:475 +0xd4
github.com/docker/docker/vendor/github.com/containerd/containerd.(*task).Exec(0xc423993e90, 0x555558c74600, 0xc42003e018, 0xc428dc2bc0, 0x40, 0xc42247e8f0, 0xc428566f60, 0x0, 0x0, 0x0, ...)
    /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/containerd/containerd/task.go:333 +0x2ba
github.com/docker/docker/libcontainerd.(*client).Exec(0xc4202f4690, 0x555558c74600, 0xc42003e018, 0xc421cdef40, 0x40, 0xc428dc2bc0, 0x40, 0xc42247e8f0, 0x0, 0xc4234531f0, ...)
    /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/libcontainerd/client_daemon.go:367 +0x3f1
github.com/docker/docker/daemon.(*Daemon).ContainerExecStart(0xc4204828c0, 0x555558c74600, 0xc42003e018, 0xc4228250f1, 0x40, 0x7ffff7f0f370, 0xc42093f0c0, 0x555558c510a0, 0xc42093f280, 0x555558c510a0, ...)
    /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/daemon/exec.go:251 +0xb4f
github.com/docker/docker/api/server/router/container.(*containerRouter).postContainerExecStart(0xc420d71fc0, 0x555558c74680, 0xc425c77080, 0x555558c72b40, 0xc4272fc0e0, 0xc423bdad00, 0xc425c76fc0, 0x0, 0x0)
    /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router/container/exec.go:125 +0x34b
github.com/docker/docker/api/server/router/container.(*containerRouter).(github.com/docker/docker/api/server/router/container.postContainerExecStart)-fm(0x555558c74680, 0xc425c77080, 0x555558c72b40, 0xc4272fc0e0, 0xc423bdad00, 0xc425c76fc0, 0x555556bcee3c, 0x555558aba440)
    /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router/container/container.go:59 +0x6b
github.com/docker/docker/api/server/middleware.ExperimentalMiddleware.WrapHandler.func1(0x555558c74680, 0xc425c77080, 0x555558c72b40, 0xc4272fc0e0, 0xc423bdad00, 0xc425c76fc0, 0x555558c74680, 0xc425c77080)
    /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/middleware/experimental.go:26 +0xda
github.com/docker/docker/api/server/middleware.VersionMiddleware.WrapHandler.func1(0x555558c74680, 0xc425c77050, 0x555558c72b40, 0xc4272fc0e0, 0xc423bdad00, 0xc425c76fc0, 0x0, 0xc42b14ba58)
    /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/middleware/version.go:62 +0x401
github.com/docker/docker/pkg/authorization.(*Middleware).WrapHandler.func1(0x555558c74680, 0xc425c77050, 0x555558c72b40, 0xc4272fc0e0, 0xc423bdad00, 0xc425c76fc0, 0x555558c74680, 0xc425c77050)
    /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/pkg/authorization/middleware.go:59 +0x7ab
github.com/docker/docker/api/server.(*Server).makeHTTPHandler.func1(0x555558c72b40, 0xc4272fc0e0, 0xc423bdad00)
    /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/server.go:141 +0x19a
net/http.HandlerFunc.ServeHTTP(0xc4212fed40, 0x555558c72b40, 0xc4272fc0e0, 0xc423bdad00)
    /usr/local/go/src/net/http/server.go:1947 +0x46
github.com/docker/docker/vendor/github.com/gorilla/mux.(*Router).ServeHTTP(0xc4217185f0, 0x555558c72b40, 0xc4272fc0e0, 0xc423bdad00)
    /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/gorilla/mux/mux.go:103 +0x228
github.com/docker/docker/api/server.(*routerSwapper).ServeHTTP(0xc420edc910, 0x555558c72b40, 0xc4272fc0e0, 0xc423bdad00)
    /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router_swapper.go:29 +0x72
net/http.serverHandler.ServeHTTP(0xc4204365b0, 0x555558c72b40, 0xc4272fc0e0, 0xc423bdad00)
    /usr/local/go/src/net/http/server.go:2694 +0xbe
net/http.(*conn).serve(0xc420b67ea0, 0x555558c745c0, 0xc426d72a40)
    /usr/local/go/src/net/http/server.go:1830 +0x653
created by net/http.(*Server).Serve
    /usr/local/go/src/net/http/server.go:2795 +0x27d
docker info:

Containers: 206
 Running: 165
 Paused: 0
 Stopped: 41
Images: 112
Server Version: 18.06.2-ce
Storage Driver: overlay2
 Backing Filesystem: xfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: systemd
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 468a545b9edcd5932818eb9de8e72413e616e86e
runc version: 69663f0bd4b60df09991c08812a60108003fa340
init version: fec3683
Security Options:
 seccomp
  Profile: default
Kernel Version: 3.10.0-862.6.3.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 56
Total Memory: 503.6GiB
Docker Root Dir: /mongodb/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

Seems like the same issue, @cpuguy83? Any further guidance?

cjellick avatar May 21 '19 01:05 cjellick

This seems closely related to https://github.com/moby/moby/issues/38064

cjellick avatar May 21 '19 02:05 cjellick

Docker 18.09-ce also has this problem.

timchenxiaoyu avatar Jun 26 '19 03:06 timchenxiaoyu

Seems like the bug is also present in 19.03.1

~# docker version
Client: Docker Engine - Community
 Version:           19.03.1
 API version:       1.40
 Go version:        go1.12.5
 Git commit:        74b1e89
 Built:             Thu Jul 25 21:22:03 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.1
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.5
  Git commit:       74b1e89
  Built:            Thu Jul 25 21:20:35 2019
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.2.6
  GitCommit:        894b81a4b802e4eb2a91d1ce216b8817763c29fb
 runc:
  Version:          1.0.0-rc8
  GitCommit:        425e105d5a03fabd737a126ad93d62a9eeede87f
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

Stack trace: https://plik.himbeere.pw/file/yw4XSErMAEMxG9wD/wUQJrxyVYWsRGzhp/stacktrace

Example:

# time timeout 10 docker rm -f dev_admin_php-fpm_1

real    0m10.004s
user    0m0.011s
sys     0m0.024s
# time timeout 10 docker inspect dev_admin_php-fpm_1

real    0m10.008s
user    0m0.013s
sys     0m0.024s

Restarting the docker daemon fixed it.

ghost avatar Jul 26 '19 12:07 ghost

I can confirm that this issue is present on 19.03.1 on Ubuntu 18.04.2 LTS.

Does anyone have a workaround that only impacts the offending container/service? I have only been able to fix this issue by rebooting the server, as the system hangs when trying to restart the docker daemon (sudo systemctl restart docker, sudo service docker start).

This is naturally not very good for the uptime of my services :(

The machine should have plenty of resources; memory usage is around 5-10% and system load is usually quite low as well.

glaux avatar Aug 09 '19 13:08 glaux

Happening with extremely high frequency to me on multiple Ubuntu machines running 19.03.1. docker inspect is an essential part of the scripts that manage my short-lived containers, so this is a breaking issue for me. My last heavy use of Docker was on 18.09.x, and I never saw this issue across tens of thousands of containers and many machines.

No NFS, not using docker compose, and the containers that inspect hangs on were not otherwise misbehaving.

Edit to add: in an attempt to work around the issue, I made a shell script with permissions to read the docker/containers folder and manually pull the fields I need from each container's config.v2.json file. I'm somewhat worried about reading it while it's in the process of being rewritten or edited, but that's the only way I could think of to eliminate calls to inspect. But... it turns out that I have other breaking issues with 19.03.1 and have reverted to 18.09.
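For reference, the fallback looks roughly like this (the path and field names are assumptions, and reading the file is racy as noted above):

sudo jq -r '.State.Pid, .Config.Image' /var/lib/docker/containers/<full-container-id>/config.v2.json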

buck2202 avatar Aug 14 '19 05:08 buck2202

@luminouspanda if you can put your stack trace in a site not flagged as a security issue, such as gist, then we can review it.

Everyone else experiencing problems on 19.03: there seems to be a regression (at least empirically, based on the number of reports I've seen). Can you please post the stack trace from affected systems?

curl --unix-socket /var/run/docker.sock http://./debug/pprof/goroutine?debug=2

cpuguy83 avatar Aug 14 '19 18:08 cpuguy83

@cpuguy83: not sure if this will be useful since, like I said, I have other issues with 19.03 (I will create a new bug report... but, in short, docker never registers that a container has exited after its referenced PID quits). But I'm including this anyway in the hope that it will be useful.

docker version (experimental is enabled because I use checkpoints):

Client: Docker Engine - Community
 Version:           19.03.1
 API version:       1.40
 Go version:        go1.12.5
 Git commit:        74b1e89
 Built:             Thu Jul 25 21:21:05 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.1
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.5
  Git commit:       74b1e89
  Built:            Thu Jul 25 21:19:41 2019
  OS/Arch:          linux/amd64
  Experimental:     true
 containerd:
  Version:          1.2.6
  GitCommit:        894b81a4b802e4eb2a91d1ce216b8817763c29fb
 runc:
  Version:          1.0.0-rc8
  GitCommit:        425e105d5a03fabd737a126ad93d62a9eeede87f
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

docker info:

Client:
 Debug Mode: false

Server:
 Containers: 7
  Running: 7
  Paused: 0
  Stopped: 0
 Images: 10
 Server Version: 19.03.1
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 894b81a4b802e4eb2a91d1ce216b8817763c29fb
 runc version: 425e105d5a03fabd737a126ad93d62a9eeede87f
 init version: fec3683
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 5.0.0-25-generic
 Operating System: Linux Mint 19
 OSType: linux
 Architecture: x86_64
 CPUs: 8
 Total Memory: 15.37GiB
 Name: thinkpad
 ID: W7ON:HPJN:2SGV:LS7L:VBOZ:RTGK:JN46:HTF7:N4KW:H4PU:24AH:E3OK
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: true
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: true

WARNING: No swap limit support

output of curl --unix-socket /var/run/docker.sock http://./debug/pprof/goroutine?debug=2 with a hung docker container inspect: https://gist.github.com/buck2202/8058fda098f992c4a35bc181f27bcbf3

buck2202 avatar Aug 14 '19 23:08 buck2202

The issue seems more correlated with the time of day than anything else; the last three days I've seen it around 3:30 pm, but that must surely be a coincidence?

Anyway, here is the output of curl --unix-socket /var/run/docker.sock http://./debug/pprof/goroutine?debug=2:

https://gist.github.com/glaux/6562f669885929dcb26ef668d2fcb71c

This time I had run composer require in a php container when I got an error about memory exhaustion (I have only a few containers running on a small dev server):

Fatal error: Allowed memory size of 1610612736 bytes exhausted (tried to allocate 4096 bytes) in phar:///usr/local/bin/composer/src/Composer/DependencyResolver/Solver.php on line 223

Check https://getcomposer.org/doc/articles/troubleshooting.md#memory-limit-errors for more info on how to handle out of memory errors.

When running docker-compose up I get the message:

ERROR: An HTTP request took too long to complete. Retry with --verbose to obtain debug information.
If you encounter this issue regularly because of slow network conditions, consider setting COMPOSE_HTTP_TIMEOUT to a higher value (current value: 60).

That particular error is discussed here: https://github.com/docker/compose/issues/5620, which links to this thread.

Docker version:

$ docker version
Client: Docker Engine - Community
 Version:           19.03.1
 API version:       1.40
 Go version:        go1.12.5
 Git commit:        74b1e89
 Built:             Thu Jul 25 21:21:05 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.1
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.5
  Git commit:       74b1e89
  Built:            Thu Jul 25 21:19:41 2019
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.2.6
  GitCommit:        894b81a4b802e4eb2a91d1ce216b8817763c29fb
 runc:
  Version:          1.0.0-rc8
  GitCommit:        425e105d5a03fabd737a126ad93d62a9eeede87f
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

glaux avatar Aug 15 '19 14:08 glaux

This looks like a deadlock waiting on data back from containerd. I know I've seen some things come across regarding this.

Unfortunately the debug socket for containerd is no longer available from a default docker install, so tracking down the cause is a bit harder :(

goroutine 562400 [select, 3 minutes]:
github.com/docker/docker/vendor/google.golang.org/grpc/internal/transport.(*Stream).waitOnHeader(0xc001a08000, 0x8, 0xc00134b880)
	/go/src/github.com/docker/docker/vendor/google.golang.org/grpc/internal/transport/transport.go:267 +0xce
github.com/docker/docker/vendor/google.golang.org/grpc/internal/transport.(*Stream).RecvCompress(...)
	/go/src/github.com/docker/docker/vendor/google.golang.org/grpc/internal/transport/transport.go:278
github.com/docker/docker/vendor/google.golang.org/grpc.(*csAttempt).recvMsg(0xc00134b880, 0x55b7e354e780, 0xc0013adf40, 0x0, 0xc00258cae0, 0x0)
	/go/src/github.com/docker/docker/vendor/google.golang.org/grpc/stream.go:856 +0x70e
github.com/docker/docker/vendor/google.golang.org/grpc.(*clientStream).RecvMsg.func1(0xc00134b880, 0x0, 0x0)
	/go/src/github.com/docker/docker/vendor/google.golang.org/grpc/stream.go:721 +0x48
github.com/docker/docker/vendor/google.golang.org/grpc.(*clientStream).withRetry(0xc000e73320, 0xc00258cae8, 0xc00258cad8, 0xc001a87810, 0x0)
	/go/src/github.com/docker/docker/vendor/google.golang.org/grpc/stream.go:577 +0x7e
github.com/docker/docker/vendor/google.golang.org/grpc.(*clientStream).RecvMsg(0xc000e73320, 0x55b7e354e780, 0xc0013adf40, 0x0, 0x0)
	/go/src/github.com/docker/docker/vendor/google.golang.org/grpc/stream.go:720 +0x102
github.com/docker/docker/vendor/google.golang.org/grpc.invoke(0x55b7e3705060, 0xc002cc0030, 0x55b7e2651589, 0x2a, 0x55b7e354b960, 0xc00269df80, 0x55b7e354e780, 0xc0013adf40, 0xc000176dc0, 0xc000891520, ...)
	/go/src/github.com/docker/docker/vendor/google.golang.org/grpc/call.go:73 +0x143
github.com/docker/docker/vendor/github.com/containerd/containerd.namespaceInterceptor.unary(0x55b7e25f7f49, 0x4, 0x55b7e3704fe0, 0xc000052038, 0x55b7e2651589, 0x2a, 0x55b7e354b960, 0xc00269df80, 0x55b7e354e780, 0xc0013adf40, ...)
	/go/src/github.com/docker/docker/vendor/github.com/containerd/containerd/grpc.go:35 +0xf4
github.com/docker/docker/vendor/google.golang.org/grpc.(*ClientConn).Invoke(0xc000176dc0, 0x55b7e3704fe0, 0xc000052038, 0x55b7e2651589, 0x2a, 0x55b7e354b960, 0xc00269df80, 0x55b7e354e780, 0xc0013adf40, 0x0, ...)
	/go/src/github.com/docker/docker/vendor/google.golang.org/grpc/call.go:35 +0x10c
github.com/docker/docker/vendor/github.com/containerd/containerd/api/services/tasks/v1.(*tasksClient).Delete(0xc000db26a8, 0x55b7e3704fe0, 0xc000052038, 0xc00269df80, 0x0, 0x0, 0x0, 0xed4e7589b, 0x0, 0x0)
	/go/src/github.com/docker/docker/vendor/github.com/containerd/containerd/api/services/tasks/v1/tasks.pb.go:1315 +0xd4
github.com/docker/docker/vendor/github.com/containerd/containerd.(*task).Delete(0xc000a12570, 0x55b7e3704fe0, 0xc000052038, 0x0, 0x0, 0x0, 0x4, 0x55b7e373b2e0, 0xc000a12570)
	/go/src/github.com/docker/docker/vendor/github.com/containerd/containerd/task.go:312 +0x221
github.com/docker/docker/libcontainerd/remote.(*client).DeleteTask(0xc0008888a0, 0x55b7e3704fe0, 0xc000052038, 0xc000d236c0, 0x40, 0x0, 0x55b7e32203e0, 0x55b7e0ce4275, 0x196, 0xc00048200b, ...)
	/go/src/github.com/docker/docker/libcontainerd/remote/client.go:438 +0xd6
github.com/docker/docker/daemon.(*Daemon).ProcessEvent(0xc00000c5a0, 0xc001a40b80, 0x40, 0x55b7e25f7d1d, 0x4, 0xc001a40b80, 0x40, 0xc001a40bc0, 0x40, 0x3b6b, ...)
	/go/src/github.com/docker/docker/daemon/monitor.go:54 +0x253
github.com/docker/docker/libcontainerd/remote.(*client).processEvent.func1()
	/go/src/github.com/docker/docker/libcontainerd/remote/client.go:634 +0x109
github.com/docker/docker/libcontainerd/queue.(*Queue).Append.func1(0xc001a40b00, 0x0, 0xc001514480, 0xc0016fbc80, 0xc0008888e0, 0xc001a40b80, 0x40)
	/go/src/github.com/docker/docker/libcontainerd/queue/queue.go:28 +0x3a
created by github.com/docker/docker/libcontainerd/queue.(*Queue).Append
	/go/src/github.com/docker/docker/libcontainerd/queue/queue.go:24 +0x186
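If anyone hitting this can also grab containerd's own goroutine dump, that would help. Enabling its debug socket looks roughly like this (file location, and whether the change is needed at all, depend on how containerd is launched on your install; the address shown is containerd's default):

# /etc/containerd/config.toml
[debug]
  address = "/run/containerd/debug.sock"
  level = "debug"

# then restart containerd and dump the goroutines:
ctr pprof --debug-socket /run/containerd/debug.sock goroutines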

cpuguy83 avatar Aug 15 '19 17:08 cpuguy83

https://github.com/containerd/containerd/pull/3540 seems promising.

cpuguy83 avatar Aug 15 '19 21:08 cpuguy83

Still testing, but it seems like the combination of docker 19.03.3 and containerd.io 1.2.10-2 may have resolved this issue for me.

The PR @cpuguy83 referenced above (containerd/containerd#3540) was backported to containerd 1.2.8 by containerd/containerd#3561, but as far as I can tell, the download.docker.com repository for bionic wasn't updated to include this patch until 1.2.10-2 within the past week or so.

buck2202 avatar Oct 17 '19 20:10 buck2202

We also hit a similar issue on 18.09.9 docker-ce with containerd 1.2.10 and kernel 4.19.0-041900-generic. Not all containers hit the hang, but some do.

The container process and shim process are alive, as pstree shows, but docker inspect/exec/stop/kill hang even though docker ps can still find the container.

docker ps can still find the container

cloud-user@hm6-opshub-aio:~$ docker info | grep -i "Version|runc|containerd|shim"
WARNING: No swap limit support
Server Version: 18.09.9
Runtimes: runc
Default Runtime: runc
containerd version: b34a5c8af56e510852c35414db4c1f4fa6172339
runc version: 3e425f80a8c931f88e6d94a8c831b9d5aa481657
init version: fec3683
Kernel Version: 4.19.0-041900-generic
cloud-user@hm6-opshub-aio:~$
cloud-user@hm6-opshub-aio:$ docker ps | grep zookeeper-0
7d1dbb28dac3 d43be510ec44 "/custom-entrypoint.…" 7 hours ago Up 7 hours k8s_zookeeper_zookeeper-0_opshub-data_ca906dcf-c623-460a-9271-1389b09912b0_1
fb0855d4e2e6 k8s.gcr.io/pause:3.1 "/pause" 46 hours ago Up 46 hours k8s_POD_zookeeper-0_opshub-data_ca906dcf-c623-460a-9271-1389b09912b0_0

docker inspect hangs:

cloud-user@hm6-opshub-aio:$ docker inspect 7d1dbb28dac3
^C
cloud-user@hm6-opshub-aio:$

shim and container process still alive:

cloud-user@hm6-opshub-aio:$ ps -ef | grep 7d1dbb28dac3
cloud-u+ 5797 2133 0 00:31 pts/0 00:00:00 grep --color=auto 7d1dbb28dac3
root 20154 1640 0 Nov11 ? 00:00:03 containerd-shim -namespace moby -workdir /var/lib/containerd/io.containerd.runtime.v1.linux/moby/7d1dbb28dac3216b2b5cbafa5c6b9dbb4a6565e56d9bd8c6f060ba7487b55501 -address /run/containerd/containerd.sock -containerd-binary /usr/bin/containerd -runtime-root /var/run/docker/runtime-runc
cloud-user@hm6-opshub-aio:~$ pstree -Asp 20154
systemd(1)---containerd(1640)---containerd-shim(20154)-+-custom-entrypoi(25587)---java(17248)-+-{java}(17702)
                                                       |                                      |-{java}(18065)
                                                       |                                      |-{java}(6038)
                                                       |                                      |-{java}(14587)
                                                       |                                      |-{java}(14793)
                                                       |                                      |-{java}(15040)
                                                       |                                      |-{java}(19281)
                                                       |                                      |-{java}(19301)
                                                       |                                      |-{java}(11909)
                                                       |                                      |-{java}(12716)
                                                       |                                      |-{java}(13990)
                                                       |                                      |-{java}(14318)
                                                       |                                      |-{java}(15122)
                                                       |                                      |-{java}(15182)
                                                       |                                      |-{java}(22772)
                                                       |                                      |-{java}(26839)
                                                       |                                      |-{java}(27524)
                                                       |                                      |-{java}(18492)
                                                       |                                      |-{java}(18010)
                                                       |                                      |-{java}(18022)
                                                       |                                      |-{java}(18490)
                                                       |                                      |-{java}(20045)
                                                       |                                      |-{java}(20113)
                                                       |                                      |-{java}(9862)
                                                       |                                      |-{java}(7946)
                                                       |                                      |-{java}(18670)
                                                       |                                      |-{java}(19502)
                                                       |                                      |-{java}(19504)
                                                       |                                      |-{java}(22857)
                                                       |                                      |-{java}(22858)
                                                       |                                      |-{java}(23127)
                                                       |                                      |-{java}(23178)
                                                       |                                      |-{java}(1632)
                                                       |                                      |-{java}(1602)
                                                       |                                      |-{java}(15553)
                                                       |                                      |-{java}(19236)
                                                       |                                      |-{java}(22269)
                                                       |                                      |-{java}(503)
                                                       |                                      |-{java}(18600)
                                                       |                                      |-{java}(22767)
                                                       |                                      |-{java}(20046)
                                                       |                                      |-{java}(20061)
                                                       |                                      |-{java}(20129)
                                                       |                                      |-{java}(26671)
                                                       |                                      |-{java}(17418)
                                                       |                                      |-{java}(18604)
                                                       |                                      |-{java}(19293)
                                                       |                                      |-{java}(19302)
                                                       |                                      |-{java}(19487)
                                                       |                                      |-{java}(22024)
                                                       |                                      |-{java}(7675)
                                                       |                                      |-{java}(17292)
                                                       |                                      |-{java}(32502)
                                                       |                                      |-{java}(18581)
                                                       |                                      |-{java}(29362)
                                                       |                                      |-{java}(27718)
                                                       |                                      |-{java}(29804)
                                                       |                                      |-{java}(4248)
                                                       |                                      |-{java}(17326)
                                                       |                                      |-{java}(17339)
                                                       |                                      |-{java}(17350)
                                                       |                                      |-{java}(17449)
                                                       |                                      |-{java}(1625)
                                                       |                                      |-{java}(1638)
                                                       |                                      |-{java}(22305)
                                                       |                                      |-{java}(19299)
                                                       |                                      |-{java}(19314)
                                                       |                                      |-{java}(19363)
                                                       |                                      |-{java}(13804)
                                                       |                                      |-{java}(13848)
                                                       |                                      |-{java}(13964)
                                                       |                                      |-{java}(14855)
                                                       |                                      |-{java}(7284)
                                                       |                                      |-{java}(29541)
                                                       |                                      `-{java}(30103)
                                                       |-{containerd-shim}(20261)
                                                       |-{containerd-shim}(20385)
                                                       |-{containerd-shim}(20438)
                                                       |-{containerd-shim}(20769)
                                                       |-{containerd-shim}(21254)
                                                       |-{containerd-shim}(21839)
                                                       |-{containerd-shim}(22173)
                                                       |-{containerd-shim}(22238)
                                                       |-{containerd-shim}(30808)
                                                       `-{containerd-shim}(3440)
cloud-user@hm6-opshub-aio:~$ 
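
If it helps to see where the daemon itself is blocked (similar to the goroutine traces posted earlier in this thread), dockerd dumps its goroutine stacks when it receives SIGUSR1. A rough sketch, assuming a systemd install with daemon logs in journald; the exact dump location can vary by release:

sudo kill -SIGUSR1 $(pidof dockerd)                              # non-disruptive; the daemon keeps running
sudo journalctl -u docker --since "-5 min" | grep -i goroutine   # the log line names the dump file
sudo ls /var/run/docker/goroutine-stacks-*.log                   # typical location of the dump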

frankui avatar Nov 13 '19 00:11 frankui

@cpuguy83, I am using docker 18.09.8 with containerd 1.2.6. If I want to resolve this issue by upgrading containerd to 1.2.8, can I just replace the containerd binary on my machine?

zionwu avatar Dec 05 '19 08:12 zionwu

@zionwu 18.09.8 comes with a separate containerd package. You can just download and install the latest 1.2.10 rpm/deb from https://download.docker.com/linux/.

You can also just replace the containerd binaries with the ones from the latest static package; you will need to replace all binaries except the Docker ones: https://download.docker.com/linux/static/stable/x86_64/docker-19.03.5.tgz
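
Roughly how that binary swap could look, as a sketch only (assuming the 19.03.5 static bundle; the exact binary list and destination paths may differ between releases and packages, so check yours with `which containerd runc` first):

curl -fsSLO https://download.docker.com/linux/static/stable/x86_64/docker-19.03.5.tgz
tar xzf docker-19.03.5.tgz
sudo systemctl stop docker containerd
# copy only the containerd/runc binaries, leaving dockerd/docker and friends untouched
sudo cp docker/containerd docker/containerd-shim docker/ctr docker/runc /usr/bin/
sudo systemctl start containerd docker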

hakman avatar Dec 05 '19 08:12 hakman