for-linux
for-linux copied to clipboard
Processes are running slow in docker container on ubuntu 18.04
- [x] This is a bug report
- [ ] This is a feature request
- [x] I searched existing issues before opening this one
Processes are taking time when running them in a docker container on ubuntu 18 machine. But the same process with the same docker version is running fine on ubuntu 16 machine.
I have a node application listening on some port. Accepting get requests on the path "/" and "/docker" which simply runs a command "whoami" in the host machine and in docker container respectively and returns the result. The same node application with the same docker container is running on both the machines (ubuntu16 and ubuntu18).
Now, First I tried sending 20 concurrent get request with path "/" to both the machines. And both the machines executed the command in avg of 35-40ms. With this, I concluded that the spawning process without docker has the same behaviour.
Now, Second I tried sending 20 concurrent get request with path "/docker" to both the machines. Here, ubuntu16 machine took Max 4.3 seconds and Avg 3 seconds. But ubuntu18 machine taking Max 10seconds and Avg 9 seconds.
I tried the above test multiple times. and concluded when running processes concurrently inside docker the time taken to execute is almost double in ubuntu18 machine as compare to ubuntu16.
Output of docker version
:
Client:
Version: 18.09.8
API version: 1.39
Go version: go1.10.8
Git commit: 0dd43dd87f
Built: Wed Jul 17 17:40:56 2019
OS/Arch: linux/amd64
Experimental: false
Server: Docker Engine - Community
Engine:
Version: 18.09.8
API version: 1.39 (minimum version 1.12)
Go version: go1.10.8
Git commit: 0dd43dd
Built: Wed Jul 17 17:07:25 2019
OS/Arch: linux/amd64
Experimental: false
Output of docker info
which are common to both machine:
Containers: 1
Running: 1
Paused: 0
Stopped: 0
Images: 2
Server Version: 18.09.8
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 894b81a4b802e4eb2a91d1ce216b8817763c29fb
runc version: 425e105d5a03fabd737a126ad93d62a9eeede87f
init version: fec3683
Security Options:
apparmor
seccomp
Profile: default
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 7.296GiB
Name: myhostname
ID: LLLO:OMTS:PNNM:T3MP:AD2F:UMDG:IIZK:OGBO:3ZLL:YDBX:ONAO:AY5G
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
File Descriptors: 27
Goroutines: 42
System Time: 2019-07-25T15:25:54.991694211+05:30
EventsListeners: 0
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
Product License: Community Engine
WARNING: No swap limit support
Output of docker info
which specific to ubuntu16:
Kernel Version: 4.4.0-112-generic
Operating System: Ubuntu 16.04.3 LTS
Total Memory: 7.303GiB
ID: FOFI:RW7N:RZSP:HHKH:BKS3:LMWL:TC2J:W7V2:222Y:Q2AU:XMU3:KLU7
Output of docker info
which specific to ubuntu18:
Kernel Version: 4.15.0-1040-aws
Operating System: Ubuntu 18.04.2 LTS
Total Memory: 7.296GiB
ID: LLLO:OMTS:PNNM:T3MP:AD2F:UMDG:IIZK:OGBO:3ZLL:YDBX:ONAO:AY5G
Additional environment details (AWS, VirtualBox, physical, etc.)
Machine1: Giving better performance.
node v10.16.0
docker 18.09.8
ubuntu 16.04.3 LTS, xenial
AWS instance type c4.xlarge
Machine2: Giving poor performance.
node v10.16.0
docker 18.09.8
ubuntu 18.04.2 LTS, bionic
AWS instance type c4.xlarge
docker image
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
aa0a31cd86a8 ubuntu:18.04 "bash" 2 days ago Up 25 hours ubuntu
The command I am running in docker
docker exec ubuntu whoami
ubuntu18 machine Data:
1. Data of time taken in execution
2019-07-25 14:07:32.559 INFO uid: 715a6af0-aeb7-11e9-a5a9-2fffd4e800d1 time: 1008 result: {"success":true,"data":"root"}
2019-07-25 14:07:32.941 INFO uid: 7178c860-aeb7-11e9-a5a9-2fffd4e800d1 time: 1191 result: {"success":true,"data":"root"}
2019-07-25 14:07:40.363 INFO uid: 71767e70-aeb7-11e9-a5a9-2fffd4e800d1 time: 8628 result: {"success":true,"data":"root"}
.
.
.
2019-07-25 14:07:41.970 INFO uid: 718af0d0-aeb7-11e9-a5a9-2fffd4e800d1 time: 10101 result: {"success":true,"data":"root"}
2. logs of command strace -t docker exec ubuntu whoami
3. result of perf top --sort comm,dso
So, I need help in debugging what is wrong with docker on the ubuntu18 machine. Or if there is any limitation with docker on ubuntu18 or maybe some machine or docker configuration issue.
Aayush did you find anything? We are starting to see something similar on latest Ubuntu 16 update with some 4.15 kernel and updated packages (could be same as ubuntu 18)
Seems to be something in the docker engine causing this, not runc or containerd.
Test with Docker:
(for i in `seq 1 10`; do (time sudo docker exec test true &); done)
michael|~/development/gocode/src/github.com/containerd/containerd >
real 0m0.241s
user 0m0.031s
sys 0m0.008s
real 0m0.435s
user 0m0.026s
sys 0m0.012s
========> LONG PAUSE HERE BEFORE REST OF OUTPUT
real 0m1.887s
user 0m0.034s
sys 0m0.011s
real 0m1.907s
user 0m0.030s
sys 0m0.013s
real 0m1.923s
user 0m0.030s
sys 0m0.014s
real 0m1.940s
user 0m0.031s
sys 0m0.008s
real 0m1.961s
user 0m0.026s
sys 0m0.015s
real 0m1.987s
user 0m0.029s
sys 0m0.011s
real 0m2.003s
user 0m0.032s
sys 0m0.009s
real 0m2.025s
user 0m0.030s
sys 0m0.006s
With Containerd:
(for i in `seq 1 10`; do (time sudo ctr t exec --exec-id $i sh true & ); done)
michael|~/development/gocode/src/github.com/containerd/containerd >
real 0m0.329s
user 0m0.015s
sys 0m0.019s
real 0m0.327s
user 0m0.011s
sys 0m0.018s
real 0m0.336s
user 0m0.011s
sys 0m0.017s
real 0m0.337s
user 0m0.012s
sys 0m0.018s
real 0m0.350s
user 0m0.019s
sys 0m0.011s
real 0m0.354s
user 0m0.010s
sys 0m0.017s
real 0m0.357s
user 0m0.014s
sys 0m0.014s
real 0m0.365s
user 0m0.012s
sys 0m0.015s
real 0m0.377s
user 0m0.018s
sys 0m0.015s
real 0m0.372s
user 0m0.015s
sys 0m0.012s
Hello @mbdas, I found that when I installed docker via binary. That is if I install docker from the static binaries as mentioned in this link I started getting better performance as one in ubuntu 16.
But when I am installing the docker through apt-get
command as mentioned here, then I am getting poor performance. I tried this test with docker 18 and 19 both but found the same behaviour.
Still, I did not get any clue for whats is the difference between installation via apt-get vs installation via static binaries.
An engineer in the team found the issue. libseccomp2 was causing this. Once we downgraded the lib, it was back to normal. docker needs to look at what changed in this lib.
I do have similar issue. Ubuntu 19.04. Docker version 19.03.2. WordPress container runs slow. Found out that disabling WordFence helped. Yet still funny, cause on production it works good. Damn strange :-)
@mbdas What is the version you have installed?
2.2.3-3ubuntu3 . We have pinned this version for now.
I now see a case when a simple "docker build" with a Dockerfile containing "apt-get install" for a dozen packages runs for 20 minutes before hanging completely. Just reducing that to:
FROM ubuntu:18.04
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update
RUN apt-get -y install shared-mime-info
Leads to:
Setting up libicu60:amd64 (60.2-3ubuntu3.1) ...
Setting up libglib2.0-0:amd64 (2.56.4-0ubuntu0.18.04.6) ...
No schema files found: doing nothing.
Setting up libxml2:amd64 (2.9.4+dfsg1-6.1ubuntu1.3) ...
Setting up libglib2.0-data (2.56.4-0ubuntu0.18.04.6) ...
Setting up shared-mime-info (1.9-2) ...
^C
real 4m39.560s
user 0m0.136s
sys 0m0.065s
Same problem with Ubuntu 18 LTS (docker 19.03.8 bionic) and also after it's been updated to 20 LTS (docker 19.03.8 eoan). Simply launching mysql:8 takes ~40 seconds whereas the machine is a monster and totally not loaded.
This only specifics about that server is it's got a mirror raid with mechanical disks. I didn't have this issue on the same machine one year ago.
I tried to tweak the scheduler, without success.
After I found this issue and this thread, I tried with seccomp:unconfined
to no avail. Not sure what to do now.
I'm seeing the same thing in PopOS 20.04 as well - building C++ project with ninja doesn't utilize all the cores like it did when running in Ubuntu 18.04. My current docker version is 19.03.11, build 42e35e61f3
I now see a case when a simple "docker build" with a Dockerfile containing "apt-get install" for a dozen packages runs for 20 minutes before hanging completely.
For reference, this slowdown in my case vanished in the same non-deterministic way as they appeared. My take-away is that Docker performance re: image building can just be non-deterministic, and not every time it's due to something was done on your side (or rather, it's not the case that you can just do something specific on your side to make it go away).
fwiw, I don't think it's libseccomp.so
. The ubuntu host that runs 18.04 has libseccomp.so.2.4.1
, but it runs fine. My PopOS 20.04 has libseccomp.so.2.4.3
, but is very slow.
Has anyone found any solutions? I'm running PopOS 20.04 and docker containers run 30% slower than they should. When running a python app test suite in docker my machine is 30% slower than my colleague with same CPU running Windows/Ubuntu 18.04 in WSL. I could accept a small difference, actually I'd expect PopOS to be faster, but 30% difference doesnt quite make sense.
I gave up and changed servers
i can confirm that running docker on PopOS 20.04 is around 30% slower than it should. @alexphelps did you found a solution?
I am fairly certain this is related to libseccomp. Can you try downgrading to libseccomp 2.3?
Relevant seccomp issue: https://github.com/seccomp/libseccomp/issues/153
You could also try running with --security-opt seccomp:unconfined
Can confirm the massive loss of performance:
# time docker run -it --rm perl /usr/local/bin/perl -e 'for($|++,$_++,$a++;;$,+=$a/$_,$_++,$a*=-1,$_++){if($i++>50000000){printf"%.16f\n",$,*4;exit;}}'
3.1415926735902504
real 0m14,809s
user 0m0,019s
sys 0m0,009s
With --security-opt seccomp:unconfined
:
# time docker run -it --rm --security-opt seccomp:unconfined perl /usr/local/bin/perl -e 'for($|++,$_++,$a++;;$,+=$a/$_,$_++,$a*=-1,$_++){if($i++>50000000){printf"%.16f\n",$,*4;exit;}}'
3.1415926735902504
real 0m5,746s
user 0m0,015s
sys 0m0,015s
On the Host itself:
# time perl -e 'for($|++,$_++,$a++;;$,+=$a/$_,$_++,$a*=-1,$_++){if($i++>50000000){printf"%.16f\n",$,*4;exit;}}'
3.1415926735902504
real 0m5,327s
user 0m5,322s
sys 0m0,001s
Seeing the same ~3X speedup with seccomp:unconfined..
As noted before in the thread, only way out is downgrade libseccomp to a pretty old version and make it work and ensure it remains pinned and no OS update replaces it. I am curious if containerd (now that K8s directly uses it) or podman (from redhat end) which supports seccomp, whether they use a different lib to implement seccomp profile and hence no issue in those setups. Clearly no one in docker community is interested to resolve this.
@mbdas As I understand it libseccomp 2.5 addresses these performance issues.
I'd be happy to try upgrading to confirm the fix. Any guidance on how? (apt-get? fresh docker install at some version? basic googling did not make this obvious)
I use libseccomp2 (2.5.1-1ubuntu1~18.04.1)
and see the same ratio when comparing the two commands suggested above.
I just updated our CI server to ubuntu 20.04.2 and have noticed the same problem. A lot of our CI tests start to fail due to timeout. I checked the libseccomps version is 2.5.1-1ubuntu1~20.04.1.
I'm also able to reproduce the problem illustrated by the two commands above. Here's my setup: libseccomp2 (2.5.1-1ubuntu1~20.04.1 amd64) Ubuntu 20.04.2 LTS Docker version 20.10.6, build 370c289
Please make sure you are using runc version rc92 and not rc93 (latest).
I'm not sure how I'd be able to do that considering rc92 is not an available version in apt for me, but it looks like rc94 is coming out very soon with the following, so thanks for the hint.
Having upgraded to the latest version of runc, I'm still seeing a big discrepancy in performance between the two commands @alexgit2k gave us. The ratio of the time taken for the two commands is smaller but the time taken to run without the --security-opt seccomp:unconfined
is still more than double the time it takes to run with it.
# runc --version
runc version 1.0.0-rc94
spec: 1.0.2-dev
go: go1.14.15
libseccomp: 2.5.1
# time docker run -it --rm perl /usr/local/bin/perl -e 'for($|++,$_++,$a++;;$,+=$a/$_,$_++,$a*=-1,$_++){if($i++>50000000){printf"%.16f\n",$,*4;exit;}}'
3.1415926735902504
docker run -it --rm perl /usr/local/bin/perl -e 0.02s user 0.02s system 0% cpu 18.732 total
# time docker run -it --rm --security-opt seccomp:unconfined perl /usr/local/bin/perl -e 'for($|++,$_++,$a++;;$,+=$a/$_,$_++,$a*=-1,$_++){if($i++>50000000){printf"%.16f\n",$,*4;exit;}}'
3.1415926735902504
docker run -it --rm --security-opt seccomp:unconfined perl /usr/local/bin/per 0.03s user 0.01s system 0% cpu 7.732 total
This isn't limited to docker and I'm seeing it with the lastest containerd (1.4.6 and 1.5.2) and libseccomp (2.5.1):
https://github.com/moby/moby/issues/41389
I've done some rough testing on a few different systems and it appears there is a 15%-50% hit in performance prior to kernel 5.11.
Upgrading your kernel may help if that is an option for you.