for-linux icon indicating copy to clipboard operation
for-linux copied to clipboard

Processes are running slow in docker container on ubuntu 18.04

Open Aayush04 opened this issue 5 years ago • 36 comments

  • [x] This is a bug report
  • [ ] This is a feature request
  • [x] I searched existing issues before opening this one

Processes are taking time when running them in a docker container on ubuntu 18 machine. But the same process with the same docker version is running fine on ubuntu 16 machine.

I have a node application listening on some port. Accepting get requests on the path "/" and "/docker" which simply runs a command "whoami" in the host machine and in docker container respectively and returns the result. The same node application with the same docker container is running on both the machines (ubuntu16 and ubuntu18).

Now, First I tried sending 20 concurrent get request with path "/" to both the machines. And both the machines executed the command in avg of 35-40ms. With this, I concluded that the spawning process without docker has the same behaviour.

Now, Second I tried sending 20 concurrent get request with path "/docker" to both the machines. Here, ubuntu16 machine took Max 4.3 seconds and Avg 3 seconds. But ubuntu18 machine taking Max 10seconds and Avg 9 seconds.

I tried the above test multiple times. and concluded when running processes concurrently inside docker the time taken to execute is almost double in ubuntu18 machine as compare to ubuntu16.

Output of docker version:

Client:
 Version:           18.09.8
 API version:       1.39
 Go version:        go1.10.8
 Git commit:        0dd43dd87f
 Built:             Wed Jul 17 17:40:56 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          18.09.8
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.10.8
  Git commit:       0dd43dd
  Built:            Wed Jul 17 17:07:25 2019
  OS/Arch:          linux/amd64
  Experimental:     false

Output of docker info which are common to both machine:

Containers: 1
 Running: 1
 Paused: 0
 Stopped: 0
Images: 2
Server Version: 18.09.8
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 894b81a4b802e4eb2a91d1ce216b8817763c29fb
runc version: 425e105d5a03fabd737a126ad93d62a9eeede87f
init version: fec3683
Security Options:
 apparmor
 seccomp
  Profile: default
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 7.296GiB
Name: myhostname
ID: LLLO:OMTS:PNNM:T3MP:AD2F:UMDG:IIZK:OGBO:3ZLL:YDBX:ONAO:AY5G
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
 File Descriptors: 27
 Goroutines: 42
 System Time: 2019-07-25T15:25:54.991694211+05:30
 EventsListeners: 0
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false
Product License: Community Engine

WARNING: No swap limit support

Output of docker info which specific to ubuntu16:

Kernel Version: 4.4.0-112-generic
Operating System: Ubuntu 16.04.3 LTS
Total Memory: 7.303GiB
ID: FOFI:RW7N:RZSP:HHKH:BKS3:LMWL:TC2J:W7V2:222Y:Q2AU:XMU3:KLU7

Output of docker info which specific to ubuntu18:

Kernel Version: 4.15.0-1040-aws
Operating System: Ubuntu 18.04.2 LTS
Total Memory: 7.296GiB
ID: LLLO:OMTS:PNNM:T3MP:AD2F:UMDG:IIZK:OGBO:3ZLL:YDBX:ONAO:AY5G

Additional environment details (AWS, VirtualBox, physical, etc.)

Machine1: Giving better performance.
node    v10.16.0
docker  18.09.8
ubuntu  16.04.3 LTS, xenial
AWS instance type c4.xlarge

Machine2: Giving poor performance.
node    v10.16.0
docker  18.09.8
ubuntu  18.04.2 LTS, bionic
AWS instance type c4.xlarge

docker image

CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
aa0a31cd86a8        ubuntu:18.04        "bash"              2 days ago          Up 25 hours                             ubuntu

The command I am running in docker

docker exec ubuntu whoami

ubuntu18 machine Data:

1. Data of time taken in execution

2019-07-25 14:07:32.559 INFO  uid: 715a6af0-aeb7-11e9-a5a9-2fffd4e800d1 time: 1008 result: {"success":true,"data":"root"}
2019-07-25 14:07:32.941 INFO  uid: 7178c860-aeb7-11e9-a5a9-2fffd4e800d1 time: 1191 result: {"success":true,"data":"root"}
2019-07-25 14:07:40.363 INFO  uid: 71767e70-aeb7-11e9-a5a9-2fffd4e800d1 time: 8628 result: {"success":true,"data":"root"}
.
.
.
2019-07-25 14:07:41.970 INFO  uid: 718af0d0-aeb7-11e9-a5a9-2fffd4e800d1 time: 10101 result: {"success":true,"data":"root"}

2. logs of command strace -t docker exec ubuntu whoami Screen Shot 2019-07-25 at 3 14 42 PM

3. result of perf top --sort comm,dso Screen Shot 2019-07-25 at 3 19 47 PM

So, I need help in debugging what is wrong with docker on the ubuntu18 machine. Or if there is any limitation with docker on ubuntu18 or maybe some machine or docker configuration issue.

Aayush04 avatar Jul 25 '19 11:07 Aayush04

Aayush did you find anything? We are starting to see something similar on latest Ubuntu 16 update with some 4.15 kernel and updated packages (could be same as ubuntu 18)

mbdas avatar Aug 19 '19 22:08 mbdas

Seems to be something in the docker engine causing this, not runc or containerd.

Test with Docker:

(for i in `seq 1 10`; do (time sudo docker exec test true &); done)
michael|~/development/gocode/src/github.com/containerd/containerd >
real    0m0.241s
user    0m0.031s
sys     0m0.008s

real    0m0.435s
user    0m0.026s
sys     0m0.012s

========> LONG PAUSE HERE BEFORE REST OF OUTPUT

real    0m1.887s
user    0m0.034s
sys     0m0.011s

real    0m1.907s
user    0m0.030s
sys     0m0.013s

real    0m1.923s
user    0m0.030s
sys     0m0.014s

real    0m1.940s
user    0m0.031s
sys     0m0.008s

real    0m1.961s
user    0m0.026s
sys     0m0.015s

real    0m1.987s
user    0m0.029s
sys     0m0.011s

real    0m2.003s
user    0m0.032s
sys     0m0.009s

real    0m2.025s
user    0m0.030s
sys     0m0.006s

With Containerd:

(for i in `seq 1 10`; do (time sudo ctr t  exec --exec-id $i sh true & ); done)
michael|~/development/gocode/src/github.com/containerd/containerd >
real    0m0.329s
user    0m0.015s
sys     0m0.019s

real    0m0.327s
user    0m0.011s
sys     0m0.018s

real    0m0.336s
user    0m0.011s
sys     0m0.017s

real    0m0.337s
user    0m0.012s
sys     0m0.018s

real    0m0.350s
user    0m0.019s
sys     0m0.011s

real    0m0.354s
user    0m0.010s
sys     0m0.017s

real    0m0.357s
user    0m0.014s
sys     0m0.014s

real    0m0.365s
user    0m0.012s
sys     0m0.015s

real    0m0.377s
user    0m0.018s
sys     0m0.015s

real    0m0.372s
user    0m0.015s
sys     0m0.012s

crosbymichael avatar Aug 20 '19 17:08 crosbymichael

Hello @mbdas, I found that when I installed docker via binary. That is if I install docker from the static binaries as mentioned in this link I started getting better performance as one in ubuntu 16.

But when I am installing the docker through apt-get command as mentioned here, then I am getting poor performance. I tried this test with docker 18 and 19 both but found the same behaviour.

Still, I did not get any clue for whats is the difference between installation via apt-get vs installation via static binaries.

Aayush04 avatar Aug 20 '19 21:08 Aayush04

An engineer in the team found the issue. libseccomp2 was causing this. Once we downgraded the lib, it was back to normal. docker needs to look at what changed in this lib.

mbdas avatar Aug 22 '19 00:08 mbdas

I do have similar issue. Ubuntu 19.04. Docker version 19.03.2. WordPress container runs slow. Found out that disabling WordFence helped. Yet still funny, cause on production it works good. Damn strange :-)

severfire avatar Sep 07 '19 11:09 severfire

@mbdas What is the version you have installed?

alcaamado avatar Nov 13 '19 21:11 alcaamado

2.2.3-3ubuntu3 . We have pinned this version for now.

mbdas avatar Nov 14 '19 01:11 mbdas

I now see a case when a simple "docker build" with a Dockerfile containing "apt-get install" for a dozen packages runs for 20 minutes before hanging completely. Just reducing that to:

FROM ubuntu:18.04
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update
RUN apt-get -y install shared-mime-info

Leads to:

Setting up libicu60:amd64 (60.2-3ubuntu3.1) ...
Setting up libglib2.0-0:amd64 (2.56.4-0ubuntu0.18.04.6) ...
No schema files found: doing nothing.
Setting up libxml2:amd64 (2.9.4+dfsg1-6.1ubuntu1.3) ...
Setting up libglib2.0-data (2.56.4-0ubuntu0.18.04.6) ...
Setting up shared-mime-info (1.9-2) ...
^C
real	4m39.560s
user	0m0.136s
sys	0m0.065s

pfalcon avatar Apr 02 '20 15:04 pfalcon

Same problem with Ubuntu 18 LTS (docker 19.03.8 bionic) and also after it's been updated to 20 LTS (docker 19.03.8 eoan). Simply launching mysql:8 takes ~40 seconds whereas the machine is a monster and totally not loaded.

This only specifics about that server is it's got a mirror raid with mechanical disks. I didn't have this issue on the same machine one year ago.

I tried to tweak the scheduler, without success.

After I found this issue and this thread, I tried with seccomp:unconfined to no avail. Not sure what to do now.

fabmars avatar Apr 29 '20 14:04 fabmars

I'm seeing the same thing in PopOS 20.04 as well - building C++ project with ninja doesn't utilize all the cores like it did when running in Ubuntu 18.04. My current docker version is 19.03.11, build 42e35e61f3

hicder avatar Jun 07 '20 12:06 hicder

I now see a case when a simple "docker build" with a Dockerfile containing "apt-get install" for a dozen packages runs for 20 minutes before hanging completely.

For reference, this slowdown in my case vanished in the same non-deterministic way as they appeared. My take-away is that Docker performance re: image building can just be non-deterministic, and not every time it's due to something was done on your side (or rather, it's not the case that you can just do something specific on your side to make it go away).

pfalcon avatar Jun 07 '20 12:06 pfalcon

fwiw, I don't think it's libseccomp.so. The ubuntu host that runs 18.04 has libseccomp.so.2.4.1, but it runs fine. My PopOS 20.04 has libseccomp.so.2.4.3, but is very slow.

hicder avatar Jun 07 '20 13:06 hicder

Has anyone found any solutions? I'm running PopOS 20.04 and docker containers run 30% slower than they should. When running a python app test suite in docker my machine is 30% slower than my colleague with same CPU running Windows/Ubuntu 18.04 in WSL. I could accept a small difference, actually I'd expect PopOS to be faster, but 30% difference doesnt quite make sense.

alexphelps avatar Aug 04 '20 08:08 alexphelps

I gave up and changed servers

fabmars avatar Aug 04 '20 09:08 fabmars

i can confirm that running docker on PopOS 20.04 is around 30% slower than it should. @alexphelps did you found a solution?

timglabisch avatar Oct 16 '20 05:10 timglabisch

I am fairly certain this is related to libseccomp. Can you try downgrading to libseccomp 2.3?

cpuguy83 avatar Oct 16 '20 18:10 cpuguy83

Relevant seccomp issue: https://github.com/seccomp/libseccomp/issues/153

cpuguy83 avatar Oct 16 '20 18:10 cpuguy83

You could also try running with --security-opt seccomp:unconfined

cpuguy83 avatar Oct 16 '20 18:10 cpuguy83

Can confirm the massive loss of performance:

# time docker run -it --rm perl /usr/local/bin/perl -e 'for($|++,$_++,$a++;;$,+=$a/$_,$_++,$a*=-1,$_++){if($i++>50000000){printf"%.16f\n",$,*4;exit;}}'
3.1415926735902504

real    0m14,809s
user    0m0,019s
sys     0m0,009s

With --security-opt seccomp:unconfined:

# time docker run -it --rm --security-opt seccomp:unconfined perl /usr/local/bin/perl -e 'for($|++,$_++,$a++;;$,+=$a/$_,$_++,$a*=-1,$_++){if($i++>50000000){printf"%.16f\n",$,*4;exit;}}'
3.1415926735902504

real    0m5,746s
user    0m0,015s
sys     0m0,015s

On the Host itself:

# time perl -e 'for($|++,$_++,$a++;;$,+=$a/$_,$_++,$a*=-1,$_++){if($i++>50000000){printf"%.16f\n",$,*4;exit;}}'
3.1415926735902504

real    0m5,327s
user    0m5,322s
sys     0m0,001s

alexgit2k avatar Mar 04 '21 09:03 alexgit2k

Seeing the same ~3X speedup with seccomp:unconfined..

lmeyerov avatar Mar 24 '21 08:03 lmeyerov

As noted before in the thread, only way out is downgrade libseccomp to a pretty old version and make it work and ensure it remains pinned and no OS update replaces it. I am curious if containerd (now that K8s directly uses it) or podman (from redhat end) which supports seccomp, whether they use a different lib to implement seccomp profile and hence no issue in those setups. Clearly no one in docker community is interested to resolve this.

mbdas avatar Mar 24 '21 17:03 mbdas

@mbdas As I understand it libseccomp 2.5 addresses these performance issues.

cpuguy83 avatar Mar 24 '21 18:03 cpuguy83

I'd be happy to try upgrading to confirm the fix. Any guidance on how? (apt-get? fresh docker install at some version? basic googling did not make this obvious)

lmeyerov avatar Mar 24 '21 19:03 lmeyerov

I use libseccomp2 (2.5.1-1ubuntu1~18.04.1) and see the same ratio when comparing the two commands suggested above.

maxime-michel avatar Apr 23 '21 14:04 maxime-michel

I just updated our CI server to ubuntu 20.04.2 and have noticed the same problem. A lot of our CI tests start to fail due to timeout. I checked the libseccomps version is 2.5.1-1ubuntu1~20.04.1.

ediezh avatar Apr 28 '21 04:04 ediezh

I'm also able to reproduce the problem illustrated by the two commands above. Here's my setup: libseccomp2 (2.5.1-1ubuntu1~20.04.1 amd64) Ubuntu 20.04.2 LTS Docker version 20.10.6, build 370c289

edwardcrompton avatar Apr 29 '21 08:04 edwardcrompton

Please make sure you are using runc version rc92 and not rc93 (latest).

cpuguy83 avatar Apr 29 '21 14:04 cpuguy83

I'm not sure how I'd be able to do that considering rc92 is not an available version in apt for me, but it looks like rc94 is coming out very soon with the following, so thanks for the hint.

maxime-michel avatar Apr 30 '21 06:04 maxime-michel

Having upgraded to the latest version of runc, I'm still seeing a big discrepancy in performance between the two commands @alexgit2k gave us. The ratio of the time taken for the two commands is smaller but the time taken to run without the --security-opt seccomp:unconfined is still more than double the time it takes to run with it.

# runc --version
runc version 1.0.0-rc94
spec: 1.0.2-dev
go: go1.14.15
libseccomp: 2.5.1
# time docker run -it --rm perl /usr/local/bin/perl -e 'for($|++,$_++,$a++;;$,+=$a/$_,$_++,$a*=-1,$_++){if($i++>50000000){printf"%.16f\n",$,*4;exit;}}'
3.1415926735902504
docker run -it --rm perl /usr/local/bin/perl -e   0.02s user 0.02s system 0% cpu 18.732 total
# time docker run -it --rm --security-opt seccomp:unconfined perl /usr/local/bin/perl -e 'for($|++,$_++,$a++;;$,+=$a/$_,$_++,$a*=-1,$_++){if($i++>50000000){printf"%.16f\n",$,*4;exit;}}'
3.1415926735902504
docker run -it --rm --security-opt seccomp:unconfined perl /usr/local/bin/per  0.03s user 0.01s system 0% cpu 7.732 total

edwardcrompton avatar May 19 '21 07:05 edwardcrompton

This isn't limited to docker and I'm seeing it with the lastest containerd (1.4.6 and 1.5.2) and libseccomp (2.5.1):

https://github.com/moby/moby/issues/41389

I've done some rough testing on a few different systems and it appears there is a 15%-50% hit in performance prior to kernel 5.11.

Upgrading your kernel may help if that is an option for you.

vaskozl avatar Jun 24 '21 15:06 vaskozl