
docker ps shows containers which are dead already

Open harshal-shah opened this issue 6 years ago • 25 comments

  • [x] This is a bug report
  • [ ] This is a feature request
  • [ ] I searched existing issues before opening this one

Expected behavior

docker ps should not show containers which have already been killed.

Actual behavior

docker ps shows containers whose process (PID) no longer exists. Restarting the docker service resolves it.

Steps to reproduce the behavior

We are still not sure when/how this starts to happen but what we see is as follows:

docker ps -a | grep masked-name-import-at-6b65b6ddbd-9nknw
148002af7455        7081d715e0ad                                                                                                                          "/usr/local/bin/ph..."   4 hours ago         Up 4 hours                                      k8s_masked-name-import_masked-name-import-at-6b65b6ddbd-9nknw_default_7b042ec0-1424-11e9-ae7a-0a2f2f061794_39

As we can see, the docker daemon reports the container as running. When we run docker inspect, we see the following:

# docker inspect 148002af7455
[
    {
        "Id": "148002af7455e538a4b33fd40445f8579df673063b16111547cc300ad3de1242",
        "Created": "2019-01-11T05:51:02.880386898Z",
        "Path": "/usr/local/bin/php",
        "Args": [
            "/server/http/cli/index.php",
            "--env=staging",
            "--module=experiment",
            "--controller=cli",
            "--action=experiment-import-consumer"
        ],
        "State": {
            "Status": "running",
            "Running": true,
            "Paused": false,
            "Restarting": false,
            "OOMKilled": false,
            "Dead": false,
            "Pid": 15552,
            "ExitCode": 0,
            "Error": "",
            "StartedAt": "2019-01-11T05:51:03.218747444Z",
            "FinishedAt": "0001-01-01T00:00:00Z"
        },

The container PID is 15552.

But this PID no longer exists:

# ps -efa | grep 15552
root     14276  3946  0 10:25 pts/1    00:00:00 grep --color=auto 15552
root@ip-172-23-105-205:/proc# ls -la | grep 15552
root@ip-172-23-105-205:/proc#

So the docker daemon is reporting an incorrect status. Once the daemon is restarted, this behaviour is not seen any more.
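
The manual check above (compare the PID from docker inspect against /proc) can be automated. The following is a minimal sketch, not an official tool: the function names `pid_exists`, `find_stale` and `inspect_running` are made up for illustration, and it assumes a Linux host with /proc mounted and the docker CLI on the PATH. It only relies on `docker ps -q` and the `State.Running` / `State.Pid` fields shown in the inspect output above.

```python
import json
import os
import subprocess


def pid_exists(pid):
    """Return True if /proc/<pid> exists (Linux only)."""
    return pid > 0 and os.path.isdir(f"/proc/{pid}")


def find_stale(records, exists=pid_exists):
    """Return IDs of containers reported as running whose PID is gone.

    `records` is the parsed JSON list produced by `docker inspect`.
    """
    stale = []
    for rec in records:
        state = rec.get("State", {})
        if state.get("Running") and not exists(state.get("Pid", 0)):
            stale.append(rec["Id"])
    return stale


def inspect_running():
    """Run `docker ps -q` and `docker inspect`, returning the parsed JSON."""
    ids = subprocess.run(
        ["docker", "ps", "-q"], capture_output=True, text=True, check=True
    ).stdout.split()
    if not ids:
        return []
    out = subprocess.run(
        ["docker", "inspect", *ids], capture_output=True, text=True, check=True
    ).stdout
    return json.loads(out)
```

Calling `find_stale(inspect_running())` on an affected host should list containers like the one above. A container that exits between the two docker calls can show up as a false positive, so a result is worth re-checking before acting on it.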

Output of docker version:

Client:
 Version:      17.03.2-ce
 API version:  1.27
 Go version:   go1.7.5
 Git commit:   f5ec1e2
 Built:        Tue Jun 27 03:35:14 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.03.2-ce
 API version:  1.27 (minimum version 1.12)
 Go version:   go1.7.5
 Git commit:   f5ec1e2
 Built:        Tue Jun 27 03:35:14 2017
 OS/Arch:      linux/amd64
 Experimental: false

Output of docker info:

Containers: 68
 Running: 64
 Paused: 0
 Stopped: 4
Images: 69
Server Version: 17.03.2-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 4ab9917febca54791c5f071a9d1f404867857fcc
runc version: 54296cf40ad8143b62dbcaa1d90e520a2136ddfe
init version: 949e6fa
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.4.0-1065-aws
Operating System: Ubuntu 16.04.5 LTS
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 15.67 GiB
Name: ip-172-23-105-205
ID: 4LRV:ZCHB:VDDU:TSGP:2P4Q:M7XE:QUY5:MHOA:NCMO:OIJ4:SPQP:7LQM
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

WARNING: No swap limit support

Additional environment details (AWS, VirtualBox, physical, etc.)

AWS

OS :

NAME="Ubuntu"
VERSION="16.04.5 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.5 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial

harshal-shah avatar Jan 11 '19 10:01 harshal-shah

An additional bit of information: looking for the container ID in the dockerd logs, we see the following messages:

Jan 11 06:51:04 ip-172-23-105-205 dockerd[22916]: time="2019-01-11T06:51:04.084376905Z" level=error msg="containerd: get exit status" error="containerd: process has not exited" id=148002af7455e538a4b33fd40445f8579df673063b16111547cc300ad3de1242 pid=6e9b64203251666eee89c25d1322c810e483366b4777ed4f23b36a80f57eca42 systemPid=11646
Jan 11 06:51:10 ip-172-23-105-205 dockerd[22916]: time="2019-01-11T06:51:10.499071121Z" level=warning msg="container kill failed because of 'container not found' or 'no such process': Cannot kill container 148002af7455e538a4b33fd40445f8579df673063b16111547cc300ad3de1242: rpc error: code = 2 desc = containerd: container not found"
Jan 11 06:51:40 ip-172-23-105-205 dockerd[22916]: time="2019-01-11T06:51:40.500079952Z" level=warning msg="container kill failed because of 'container not found' or 'no such process': Cannot kill container 148002af7455e538a4b33fd40445f8579df673063b16111547cc300ad3de1242: rpc error: code = 2 desc = containerd: container not found"
Jan 11 09:08:28 ip-172-23-105-205 dockerd[22916]: time="2019-01-11T09:08:28.483353612Z" level=warning msg="container kill failed because of 'container not found' or 'no such process': Cannot kill container 148002af7455e538a4b33fd40445f8579df673063b16111547cc300ad3de1242: rpc error: code = 2 desc = containerd: container not found"
Jan 11 09:08:58 ip-172-23-105-205 dockerd[22916]: time="2019-01-11T09:08:58.484795898Z" level=warning msg="container kill failed because of 'container not found' or 'no such process': Cannot kill container 148002af7455e538a4b33fd40445f8579df673063b16111547cc300ad3de1242: rpc error: code = 2 desc = containerd: container not found"
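
To see which containers are affected, the repeated warnings can be counted per container ID. This is a rough sketch under the assumption that the log lines look like the 17.03 messages quoted above ("Cannot kill container <64-character ID>"); later versions in this thread log the ID in a different format, so the pattern would need adjusting there. The names `KILL_FAILED` and `count_failed_kills` are hypothetical.

```python
import re
from collections import Counter

# Matches the 17.03-style warning quoted above and captures the full container ID.
KILL_FAILED = re.compile(
    r"container kill failed because of .*?Cannot kill container ([0-9a-f]{64})"
)


def count_failed_kills(lines):
    """Count 'container kill failed' warnings per container ID."""
    counts = Counter()
    for line in lines:
        match = KILL_FAILED.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

The input can be any iterable of log lines, for example a saved copy of the daemon log opened with `open(path)`. A container ID with a steadily growing count is a likely candidate for the stale state described in this issue.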

harshal-shah avatar Jan 11 '19 10:01 harshal-shah

I have the same issue, how did you resolve it?

VoidNakamura-zz avatar Feb 02 '19 01:02 VoidNakamura-zz

@nerdherdx since we were using kops, we changed our ec2 images from ubuntu xenial to ubuntu bionic and our docker version from 17.03 to 18.06

harshal-shah avatar Feb 02 '19 11:02 harshal-shah

We have experienced the same thing on 17.03.2-ce.

honnix avatar Feb 18 '19 09:02 honnix

@honnix, can you restart the docker daemon? It seems the daemon is not able to update the state.

VinayKumarKnol avatar May 04 '19 12:05 VinayKumarKnol

@VinayKumarKnol Yeah, that did solve the problem, but it happens quite often. Do we know whether 18.x fixed this?

honnix avatar May 07 '19 08:05 honnix

@honnix We have not faced this problem on docker 18.x

harshal-shah avatar May 07 '19 08:05 harshal-shah

@harshal-shah Good to know. Thanks for the confirmation!

honnix avatar May 07 '19 08:05 honnix

Hi @honnix, we experienced a similar issue, but our docker version is 18.09.3.

hsinhoyeh avatar Jul 30 '19 09:07 hsinhoyeh

Still having the exact described problem on docker 18.09.9-ce on Amazon Linux.

Some more details: The application crashes due to some errors. Docker stays alive, appearing in docker ps and similar commands. Running docker kill or similar returns no error, but container remains in docker ps etc. Running docker restart has no effect.

The only way I've found to solve it is to restart the host (I guess restarting dockerd might have done the job as well).

This is critical: we rely on docker's restart functionality to ensure availability, and since docker doesn't detect that the service crashed, no restart occurs.

svarogg avatar Mar 22 '20 11:03 svarogg

I'm facing this problem again in docker 19.03

Client:
 Version:           19.03.6-ce
 API version:       1.40
 Go version:        go1.13.4
 Git commit:        369ce74
 Built:             Fri Mar  6 23:25:53 2020
 OS/Arch:           linux/amd64
 Experimental:      false

Server:
 Engine:
  Version:          19.03.6-ce
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.13.4
  Git commit:       369ce74
  Built:            Fri Mar  6 23:26:25 2020
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.3.2
  GitCommit:        ff48f57fc83a8c44cf4ad5d672424a98ba37ded6
 runc:
  Version:          1.0.0-rc10
  GitCommit:        dc9208a3303feef5b3839f4323d9beb36df0a9dd
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

harshal-shah avatar Apr 03 '20 09:04 harshal-shah

I am also facing the same problem

Client:
 Debug Mode: false

Server:
 Containers: 5
  Running: 5
  Paused: 0
  Stopped: 0
 Images: 15
 Server Version: 19.03.8
 Storage Driver: overlay2
  Backing Filesystem: <unknown>
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: nvidia runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 7ad184331fa3e55e52b890ea95e65ba581ae3429
 runc version: dc9208a3303feef5b3839f4323d9beb36df0a9dd
 init version: fec3683
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 5.3.0-42-generic
 Operating System: Ubuntu 18.04.4 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 20
 Total Memory: 125.5GiB
 Name: 3XS-POC10900X
 ID: ZG3T:WERA:JK57:Z5J6:45RP:XYGH:TRKU:BHVJ:UDYZ:BYEC:DYZ2:FQ4R
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

karlem avatar May 13 '20 13:05 karlem

We experience the same problem with Azure, containerd version v1.2.6.

Client:
 Version:           3.0.7
 API version:       1.40
 Go version:        go1.12.8
 Git commit:        578ab52e
 Built:             Wed Oct 2 20:59:32 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server:
 Engine:
  Version:          3.0.7
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.8
  Git commit:       ed20165
  Built:            Wed Oct 2 18:42:30 2019
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          v1.2.6
  GitCommit:        894b81a4b802e4eb2a91d1ce216b8817763c29fb
 runc:
  Version:          1.0.0-rc8
  GitCommit:        425e105d5a03fabd737a126ad93d62a9eeede87f
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

SlashKirill avatar Jun 15 '20 05:06 SlashKirill

We are seeing it too, on our local CentOS 7 VM.

Client: Docker Engine - Community
 Version:           19.03.12
 API version:       1.40
 Go version:        go1.13.10
 Git commit:        48a66213fe
 Built:             Mon Jun 22 15:46:54 2020
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.12
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.13.10
  Git commit:       48a66213fe
  Built:            Mon Jun 22 15:45:28 2020
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.2.13
  GitCommit:        7ad184331fa3e55e52b890ea95e65ba581ae3429
 runc:
  Version:          1.0.0-rc10
  GitCommit:        dc9208a3303feef5b3839f4323d9beb36df0a9dd
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

cjjb avatar Jul 10 '20 20:07 cjjb

I've also encountered this problem.

docker ps
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS               NAMES
4c2199f77395        dcc6f1e61537        "/home/bin/start"        About an hour ago   Up About an hour                        nginx-le
405b41e6833f        mysql:5.7           "docker-entrypoint.s…"   20 hours ago        Up 20 hours                             mysql

docker kill 405b41e6833f
Error response from daemon: Cannot kill container: 405b41e6833f: Container 405b41e6833f85964da6e6265c50755f952240edb4c106cf1b9386889b180080 is not running

sudo service docker restart

docker ps

CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS               NAMES
405b41e6833f        mysql:5.7           "docker-entrypoint.s…"   20 hours ago        Up 3 seconds                            mysql
docker kill 405b41e6833f
405b41e6833f

docker -v
Docker version 19.03.8, build afacb8b7f0

Running on ubuntu 20.04.

bsutton avatar Aug 04 '20 02:08 bsutton

Same problem with Docker version 19.03.12-ce, build 48a66213fe on Archlinux (Kernel 5.8.3).

fafische avatar Aug 28 '20 13:08 fafische

Same here. Is it a containerd bug? https://github.com/containerd/containerd/issues/4547

tiny1990 avatar Sep 10 '20 08:09 tiny1990

With loglevel=debug, these lines show up every minute:

dockerd[656]: time="2020-09-10T22:34:47.190957445+02:00" level=debug msg="Calling POST /v1.40/containers/0e9ed449ca51959615a2d74e9c20951b5f769f783f8ad0257e5fdd2450e438ee/kill?signal=URG"
dockerd[656]: time="2020-09-10T22:34:47.191068491+02:00" level=debug msg="Sending kill signal 23 to container 0e9ed449ca51959615a2d74e9c20951b5f769f783f8ad0257e5fdd2450e438ee"
dockerd[656]: time="2020-09-10T22:34:48.198137278+02:00" level=debug msg="container kill failed because of 'container not found' or 'no such process'" action=kill container=0e9ed449ca51959615a2d74e9c20951b5f769f783f8ad0257e5fdd2450e438ee error="process already finished: not found"

fafische avatar Sep 10 '20 20:09 fafische

Same problem with Docker version 18.09.2, build 6247962. I restarted dockerd and containerd, but it happens again about 5~6 hours later.

root@support:~# docker inspect swinfostatistics | grep Pid
            "Pid": 17271,
            "PidMode": "",
            "PidsLimit": 0,
root@support:~# ps -ef | grep 17271
root 59643 89870 0 07:52 pts/1 00:00:00 grep --color=auto 17271
root@support:~# docker -v
Docker version 18.09.2, build 6247962

This is on Ubuntu Linux, running in VMware:

Linux support 4.4.0-116-generic #140-Ubuntu SMP Mon Feb 12 21:23:04 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

xjplke avatar Sep 18 '20 00:09 xjplke

Bump

rep-movsd avatar Oct 28 '20 19:10 rep-movsd

Similar issue to those reported above. Docker version 19.03.13, build 4484c46d9d

docker rm "$(docker ps --all --quiet)"

results in (occasionally, not sure under what context):

Docker Error: No such container

The expectation is ps -a should show all existing containers

ghost avatar Nov 14 '20 23:11 ghost

@zoombinis we are seeing the same issue as well, with the following docker version:

docker version
Client:
 Version:           19.03.6-ce
 API version:       1.40
 Go version:        go1.13.4
 Git commit:        369ce74
 Built:             Fri May 29 04:01:26 2020
 OS/Arch:           linux/amd64
 Experimental:      false

Server:
 Engine:
  Version:          19.03.6-ce
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.13.4
  Git commit:       369ce74
  Built:            Fri May 29 04:01:57 2020
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.0
  GitCommit:        09814d48d50816305a8e6c1a4ae3e2bcc4ba725a
 runc:
  Version:          1.0.0-rc92
  GitCommit:        ff819c7e9184c13b7c2607fe6c30ae19403a7aff
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

harshal-shah avatar Nov 16 '20 10:11 harshal-shah

I'm not sure why this issue keeps getting closed; I am still seeing this in docker 20.10.14. I suspect this is a zombie process issue and docker is not properly keeping track of all the containers. I expect you'll see more of this kind of issue if the host is busy enough to reuse PIDs.
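
If PIDs are being reused, checking only whether the recorded PID exists can give a false "alive" result, because an unrelated process may now hold that number. One way to guard against this, sketched below, is to check whether the process's cgroup path still mentions the container ID. This is a heuristic and an assumption, not something confirmed in this thread: the exact cgroup path layout depends on the cgroup driver and on any orchestrator in use, and the function names here are hypothetical.

```python
def cgroup_mentions_container(cgroup_text, container_id):
    """True if any line of a /proc/<pid>/cgroup dump contains the container ID."""
    return any(container_id in line for line in cgroup_text.splitlines())


def pid_belongs_to_container(pid, container_id):
    """Read /proc/<pid>/cgroup on the host and look for the full container ID.

    Returns False if the process does not exist (or vanished while reading).
    """
    try:
        with open(f"/proc/{pid}/cgroup") as fh:
            return cgroup_mentions_container(fh.read(), container_id)
    except (FileNotFoundError, ProcessLookupError):
        return False
```

Here `pid` would be the `State.Pid` value from docker inspect and `container_id` the full 64-character `Id`. A False result for a container that docker reports as running suggests the daemon's state is stale, whether or not the PID number is currently in use.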

edmundadjei avatar Oct 13 '22 16:10 edmundadjei

I am also having this problem.

centerboy88 avatar Nov 07 '22 10:11 centerboy88

I am experiencing the same issue.

Here is the version of Docker I am currently using:

Client:
 Version:           20.10.23
 API version:       1.41
 Go version:        go1.18.9
 Git commit:        7155243
 Built:             Tue Apr 11 22:56:36 2023
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server:
 Engine:
  Version:          20.10.23
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.18.9
  Git commit:       6051f14
  Built:            Tue Apr 11 22:57:17 2023
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.19
  GitCommit:        1e1ea6e986c6c86565bc33d52e34b81b3e2bc71f
 runc:
  Version:          1.1.7
  GitCommit:        f19387a6bec4944c770f7668ab51c4348d9c2f38
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

gilbertwong96 avatar Nov 07 '23 06:11 gilbertwong96