dcos-e2e
There was an unknown error when performing a doctor check.
After I installed minidcos on my up-to-date CentOS 7.5
(uname -a: Linux <server> 3.10.0-862.14.4.el7.x86_64 #1 SMP Wed Sep 26 15:12:11 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux)
with the command for the Linux package:
sudo curl --fail -L https://github.com/dcos/dcos-e2e/releases/download/2018.12.10.0/minidcos -o /usr/local/bin/minidcos && sudo chmod +x /usr/local/bin/minidcos
I also changed the owner of the directory and the file with:
sudo chown myuser:myuser
... as I have to use sudo on my systems to install packages.
After the installation finished, I executed the command minidcos docker doctor -v
All checks went well until the last one (13/13), which gave me the following error:
Note: Docker has approximately 31.5 GB of memory available. The amount of memory required depends on the workload. For example, creating large clusters or multiple clusters requires a lot of memory.
A four node cluster seems to work well on a machine with 9 GB of memory available to Docker.
12/13 checks complete:
2018-12-23 17:54:22 ERROR dcos_e2e._common | docker: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?.
ERROR:dcos_e2e._common:docker: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?.
2018-12-23 17:54:22 ERROR dcos_e2e._common | See 'docker run --help'.
ERROR:dcos_e2e._common:See 'docker run --help'.
Error: There was an unknown error when performing a doctor check.
The doctor function was "_check_can_mount_in_docker".
The error was: "Command '['docker', 'exec', '--user', 'root', '--interactive', '53adb21fa73f33c64b2d63ab63b28e97096a251f2c25d719cf4944b44d54979b', 'docker', 'run', '-v', '/foo', 'alpine']' returned non-zero exit status 125.".
12/13 checks complete: Exception ignored in: <bound method tqdm.__del__ of 12/13 checks complete: ▏ >
Traceback (most recent call last):
File "site-packages/tqdm/_tqdm.py", line 931, in __del__
File "site-packages/tqdm/_tqdm.py", line 1133, in close
File "site-packages/tqdm/_tqdm.py", line 496, in _decr_instances
File "site-packages/tqdm/_monitor.py", line 52, in exit
File "threading.py", line 1053, in join
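For context, the failing "_check_can_mount_in_docker" step boils down to asking the Docker daemon inside a cluster node to start a container with a volume mount. A sketch of reproducing it by hand (NODE is a placeholder copied from the error message above; this assumes a node container with that ID is still running):

```shell
# Hypothetical manual reproduction of the "_check_can_mount_in_docker" step.
# NODE is a placeholder: the ID of a running minidcos node container,
# taken from the error message above.
NODE=53adb21fa73f33c64b2d63ab63b28e97096a251f2c25d719cf4944b44d54979b
if command -v docker >/dev/null 2>&1; then
  # Ask the node's inner Docker daemon to run a container with a volume;
  # exit status 125 here would match the doctor failure above.
  docker exec --user root --interactive "$NODE" docker run -v /foo alpine \
    || echo "inner docker run failed with exit status $?"
else
  echo "docker CLI not found; run this on the affected host"
fi
```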
However, the Docker daemon is actually up and running:
$ sudo systemctl status docker
docker.service - Docker Application Container Engine
Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
Active: active (running) since So 2018-12-23 17:32:45 CET; 18min ago
Docs: https://docs.docker.com
Main PID: 15404 (dockerd)
CGroup: /system.slice/docker.service
├─15404 /usr/bin/dockerd -H unix://
├─15427 containerd --config /var/run/docker/containerd/containerd....
├─16403 containerd-shim -namespace moby -workdir /var/lib/docker/c...
├─24416 containerd-shim -namespace moby -workdir /var/lib/docker/c...
├─24515 containerd-shim -namespace moby -workdir /var/lib/docker/c...
├─25318 containerd-shim -namespace moby -workdir /var/lib/docker/c...
├─26123 containerd-shim -namespace moby -workdir /var/lib/docker/c...
└─26898 runc --root /var/run/docker/runtime-runc/moby --log /run/d...
Dez 23 17:50:25 little dockerd[15404]: time="2018-12-23T17:50:25.352687757+...1d
Dez 23 17:50:25 little dockerd[15404]: time="2018-12-23T17:50:25.362184302+...e"
Dez 23 17:50:26 little dockerd[15404]: time="2018-12-23T17:50:26.522587501+...58
Dez 23 17:50:37 little dockerd[15404]: time="2018-12-23T17:50:37.994556673+...e"
Dez 23 17:50:38 little dockerd[15404]: time="2018-12-23T17:50:38.413301781+...7b
Dez 23 17:50:38 little dockerd[15404]: time="2018-12-23T17:50:38.422892823+...e"
Dez 23 17:50:39 little dockerd[15404]: time="2018-12-23T17:50:39.389365804+...16
Dez 23 17:50:42 little dockerd[15404]: time="2018-12-23T17:50:42.570627795+...15
Dez 23 17:50:50 little dockerd[15404]: time="2018-12-23T17:50:50.697512712+...18
Dez 23 17:50:59 little dockerd[15404]: time="2018-12-23T17:50:58.995491606+...23
Hint: Some lines were ellipsized, use -l to show in full.
I tried the same after starting the Docker daemon manually:
$ sudo dockerd
INFO[2018-12-23T19:16:39.403460151+01:00] parsed scheme: "unix" module=grpc
INFO[2018-12-23T19:16:39.404589503+01:00] scheme "unix" not registered, fallback to default scheme module=grpc
INFO[2018-12-23T19:16:39.404830250+01:00] parsed scheme: "unix" module=grpc
INFO[2018-12-23T19:16:39.404916296+01:00] scheme "unix" not registered, fallback to default scheme module=grpc
INFO[2018-12-23T19:16:39.405424749+01:00] ccResolverWrapper: sending new addresses to cc: [{unix:///run/containerd/containerd.sock 0 <nil>}] module=grpc
INFO[2018-12-23T19:16:39.405605361+01:00] ClientConn switching balancer to "pick_first" module=grpc
INFO[2018-12-23T19:16:39.405818241+01:00] pickfirstBalancer: HandleSubConnStateChange: 0xc42076cad0, CONNECTING module=grpc
INFO[2018-12-23T19:16:39.407404153+01:00] ccResolverWrapper: sending new addresses to cc: [{unix:///run/containerd/containerd.sock 0 <nil>}] module=grpc
INFO[2018-12-23T19:16:39.410009554+01:00] ClientConn switching balancer to "pick_first" module=grpc
INFO[2018-12-23T19:16:39.410287038+01:00] pickfirstBalancer: HandleSubConnStateChange: 0xc420886190, CONNECTING module=grpc
INFO[2018-12-23T19:16:39.408256998+01:00] pickfirstBalancer: HandleSubConnStateChange: 0xc42076cad0, READY module=grpc
INFO[2018-12-23T19:16:39.411416250+01:00] pickfirstBalancer: HandleSubConnStateChange: 0xc420886190, READY module=grpc
INFO[2018-12-23T19:16:39.462749589+01:00] [graphdriver] using prior storage driver: overlay2
INFO[2018-12-23T19:16:39.545900130+01:00] Graph migration to content-addressability took 0.00 seconds
INFO[2018-12-23T19:16:39.549423401+01:00] Loading containers: start.
INFO[2018-12-23T19:16:41.842242207+01:00] Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address
INFO[2018-12-23T19:16:42.713012048+01:00] Loading containers: done.
INFO[2018-12-23T19:16:42.831660976+01:00] Docker daemon commit=4d60db4 graphdriver(s)=overlay2 version=18.09.0
INFO[2018-12-23T19:16:42.832064665+01:00] Daemon has completed initialization
INFO[2018-12-23T19:16:42.886521566+01:00] API listen on /var/run/docker.sock
Then I retried:
$ pwd
/usr/local/bin
[myuser@server bin]$ ./minidcos docker doctor
In the "dockerd" terminal it said:
INFO[2018-12-23T19:17:47.437329516+01:00] Container 6faf2e38300319c5fe2c18ef8758d771ad496ac1930e96fbaa32870894bfadf0 failed to exit within 10 seconds of signal 15 - using the force
INFO[2018-12-23T19:17:47.899840140+01:00] ignoring event module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
INFO[2018-12-23T19:18:00.165255179+01:00] Container 6171b37f290a7265cbae3e1cf388da5aa88269b5cd8bb58df867a05f5af8203f failed to exit within 10 seconds of signal 15 - using the force
INFO[2018-12-23T19:18:00.614194441+01:00] ignoring event module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
INFO[2018-12-23T19:18:12.738779801+01:00] Container faccf28f1779cfb1bc814dd836b58c238e44228458d6f9a40d1c746734009c2f failed to exit within 10 seconds of signal 15 - using the force
INFO[2018-12-23T19:18:13.189271409+01:00] ignoring event module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
INFO[2018-12-23T19:18:25.333069615+01:00] Container c85462287ccb5f9063ff351d26043cd8b9c6d7037583f7c2359bfb8c76b1742a failed to exit within 10 seconds of signal 15 - using the force
INFO[2018-12-23T19:18:25.782169752+01:00] ignoring event module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
INFO[2018-12-23T19:18:38.035307969+01:00] Container 0831d004a297ef2377b6272b199bd98114db1ee157dd98c671d153b14c93b5e9 failed to exit within 10 seconds of signal 15 - using the force
INFO[2018-12-23T19:18:38.498353545+01:00] ignoring event module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
INFO[2018-12-23T19:19:07.051312557+01:00] ignoring event module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
INFO[2018-12-23T19:19:08.542774254+01:00] ignoring event module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
INFO[2018-12-23T19:19:09.774780657+01:00] ignoring event module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
INFO[2018-12-23T19:19:36.829187350+01:00] ignoring event module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
INFO[2018-12-23T19:19:37.939741998+01:00] ignoring event module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
INFO[2018-12-23T19:19:39.307944482+01:00] ignoring event module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
and the output in the "minidcos" terminal still was:
Note: Docker has approximately 31.5 GB of memory available. The amount of memory required depends on the workload. For example, creating large clusters or multiple clusters requires a lot of memory.
A four node cluster seems to work well on a machine with 9 GB of memory available to Docker.
12/13 checks complete:
2018-12-23 19:19:36 ERROR dcos_e2e._common | docker: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?.
ERROR:dcos_e2e._common:docker: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?.
2018-12-23 19:19:36 ERROR dcos_e2e._common | See 'docker run --help'.
ERROR:dcos_e2e._common:See 'docker run --help'.
Error: There was an unknown error when performing a doctor check.
The doctor function was "_check_can_mount_in_docker".
The error was: "Command '['docker', 'exec', '--user', 'root', '--interactive', '20425e8f106830d9b7b4157b434fe698af42ac67a9f7a6d95a704fb924a71fe5', 'docker', 'run', '-v', '/foo', 'alpine']' returned non-zero exit status 125.".
12/13 checks complete: Exception ignored in: <bound method tqdm.__del__ of 12/13 checks complete: >
Traceback (most recent call last):
File "site-packages/tqdm/_tqdm.py", line 931, in __del__
File "site-packages/tqdm/_tqdm.py", line 1133, in close
File "site-packages/tqdm/_tqdm.py", line 496, in _decr_instances
File "site-packages/tqdm/_monitor.py", line 52, in exit
File "threading.py", line 1053, in join
But a:
$ ls -l /var/run/docker.sock
srw-rw----. 1 root docker 0 23. Dez 19:16 /var/run/docker.sock
seems to prove that the Docker socket is available.
The current user is a member of the "docker" group, so I can execute any docker command without "sudo". I tried the same commands as root, but nothing changed.
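As a quick sanity check (a sketch; this only inspects the local host, not the Docker daemon inside the minidcos nodes), group membership and socket permissions can be verified with:

```shell
# Check whether the current user is in the "docker" group, which is what
# grants access to /var/run/docker.sock without sudo.
if id -nG | tr ' ' '\n' | grep -qx docker; then
  echo "current user is in the docker group"
else
  echo "current user is NOT in the docker group"
fi
# Show the socket's owner, group and permissions (if it exists).
ls -l /var/run/docker.sock 2>/dev/null || echo "no docker socket present"
```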
P.S.:
$ docker version
Client:
Version: 18.09.0
API version: 1.39
Go version: go1.10.4
Git commit: 4d60db4
Built: Wed Nov 7 00:48:22 2018
OS/Arch: linux/amd64
Experimental: false
Server: Docker Engine - Community
Engine:
Version: 18.09.0
API version: 1.39 (minimum version 1.12)
Go version: go1.10.4
Git commit: 4d60db4
Built: Wed Nov 7 00:19:08 2018
OS/Arch: linux/amd64
Experimental: false
Are there any updates on this? I ran into this issue on CentOS 7 too: I could run docker commands directly, but the minidcos docker doctor command failed. Any help would be appreciated, thanks.
I could successfully run the create action by specifying the Docker version, with the command:
minidcos docker create --docker-version 17.12.1-ce ./dcos_generate_config.sh --agents 0
ref to https://github.com/dcos/dcos-e2e/issues/1252#issuecomment-409343923
Although minidcos docker doctor still failed, the cluster seems to work well.
Full workaround to successfully reach the Mesosphere web UI page (the commands should be run as a non-root user):
minidcos docker create --docker-version 17.12.1-ce ./dcos_generate_config.sh --agents 0
minidcos docker wait --cluster-id default
minidcos docker web
Exact same issue with me too.
Host OS: CentOS Linux release 7.6.1810 (Core)
Docker Version: 18.09.1
$ ls -l /var/run/docker.sock
srw-rw---- 1 root docker 0 ජන 12 10:07 /var/run/docker.sock
minidcos --version
minidcos, version 2019.01.10.0
Just like @dennyx said, I can also create and set up the cluster, but minidcos docker doctor still fails.
It would be great if I could get some help with this.
Update: Tried downgrading Docker to 17.12.1-ce, but no joy!
I cannot get miniDC/OS release 2019.05.03.0 to work on CentOS 7 in Docker. However, I can successfully create a cluster with release 2019.05.23.1, on CentOS Linux release 7.6.1810 with Docker version 18.09.6. I do need to specify the Docker version, though, and minidcos docker doctor still fails.
To create the cluster:
$ minidcos docker create --variant oss --agents 1 --cluster-id default --docker-version 17.12.1-ce dcos_generate_config.sh
$ minidcos docker wait --cluster-id default
Thanks all for your contributions to this thread. It is interesting and it includes multiple issues:
- Ugly traceback in an error message
Traceback (most recent call last):
File "site-packages/tqdm/_tqdm.py", line 931, in __del__
File "site-packages/tqdm/_tqdm.py", line 1133, in close
File "site-packages/tqdm/_tqdm.py", line 496, in _decr_instances
File "site-packages/tqdm/_monitor.py", line 52, in exit
File "threading.py", line 1053, in join
This traceback should no longer be shown, thanks to an update to tqdm a while back.
- What the error means
2018-12-23 19:19:36 ERROR dcos_e2e._common | docker: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?.
ERROR:dcos_e2e._common:docker: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?.
2018-12-23 19:19:36 ERROR dcos_e2e._common | See 'docker run --help'.
ERROR:dcos_e2e._common:See 'docker run --help'.
While folks here have been inspecting their local Docker instances, the error refers to Docker on the minidcos nodes (Docker in Docker).
- The requirement to specify --docker-version
In #1574 (released in 2019.06.07.0), I made the change "Changed the default version of Docker installed on minidcos docker clusters to 18.06.3-ce.".
This should mean that folks no longer have to specify --docker-version by default in the newest minidcos.
- Next steps
What seems clear is that using Docker version 1.13.1 on nodes is problematic on some machines.
The main pain should be taken away with the update that makes 18.06.3-ce the default version.
However, the doctor command will still fail.
Ideally we can narrow down exactly what the problem is, and make that clear in the doctor error.
An intermediate step might look for the given error and just move on with a warning, potentially one which links to this issue.
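Such an intermediate step could look roughly like the following wrapper (a sketch only, not actual minidcos code; the function name is illustrative, and the marker string is taken from the error messages in this thread):

```shell
# Sketch: run a doctor-style command; if it fails with the known
# mount-check error, downgrade the failure to a warning and carry on.
run_doctor_leniently() {
  output=$("$@" 2>&1)
  status=$?
  if [ "$status" -eq 0 ]; then
    echo "doctor: all checks passed"
  elif printf '%s' "$output" | grep -q '_check_can_mount_in_docker'; then
    # Known failure from this issue: warn instead of aborting.
    echo "doctor: WARNING: known mount-check failure ignored (see this issue)"
  else
    printf '%s\n' "$output" >&2
    return "$status"
  fi
}

# Usage on an affected host:
#   run_doctor_leniently minidcos docker doctor
```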