
Fedora Docker-CE-Engine 20.10.13 consumes all available system memory (kernel 5.16.13)

Open kevin0x90 opened this issue 2 years ago • 22 comments

Description: The issue occurred on my Fedora installation. Version: Fedora release 35. Kernel information: Linux 5.16.12-200.fc35.x86_64 #1 SMP PREEMPT Wed Mar 2 19:06:17 UTC 2022

When starting a docker-compose project that includes mysql with Docker-CE-Engine version 20.10.13, it consumes all available system memory. With version 20.10.10 the issue does not occur and the docker-compose project requires only ~2GB of RAM.

Steps to reproduce the issue:

  1. set up a docker-compose project that includes mysql:5.6
  2. run the project with docker-compose
  3. monitor memory usage, for example with an activity monitor
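For step 3, a minimal sketch of a terminal-based alternative to a graphical monitor, assuming a Linux host where /proc/meminfo is available. The function names are made up for illustration and are not part of any tool mentioned in this issue.

```shell
# Print the MemAvailable value (in KiB) from a meminfo-style file.
# Defaults to /proc/meminfo; the optional argument only exists so the
# function can be tried against a saved copy of the file.
mem_available_kib() {
    awk '$1 == "MemAvailable:" { print $2 }' "${1:-/proc/meminfo}"
}

# Print a timestamped sample every N seconds (default 2) until interrupted.
watch_mem_available() {
    local interval="${1:-2}"
    while true; do
        printf '%s MemAvailable=%s KiB\n' "$(date +%T)" "$(mem_available_kib)"
        sleep "$interval"
    done
}
```

Running watch_mem_available 1 in one terminal while the project starts in another should show whether available memory keeps dropping.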

Describe the results you received: All available system memory is consumed and the system stops working at some point.

Describe the results you expected: I would expect roughly the same memory consumption as with the old working version 20.10.10.

Additional information you deem important (e.g. issue happens only occasionally): The issue was reproducible and only a downgrade to 20.10.10 could solve the issue.

Version info where the issue occurred:

Client: Docker Engine - Community
 Version:           20.10.13
 API version:       1.41
 Go version:        go1.16.15
 Git commit:        a224086
 Built:             Thu Mar 10 14:08:18 2022
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.13
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.16.15
  Git commit:       906f57f
  Built:            Thu Mar 10 14:06:06 2022
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.5.10
  GitCommit:        2a1d4dbdb2a1030dc5b01e96fb110a9d9f150ecc
 runc:
  Version:          1.0.3
  GitCommit:        v1.0.3-0-gf46b6ba
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

Additional environment details (AWS, VirtualBox, physical, etc.): physical hardware.

kevin0x90 avatar Mar 11 '22 15:03 kevin0x90

I got the same issue with the mysql:5.7 image and version 20.10.13: memory consumption is so high that the system starts swapping, and the startup sequence is extremely slow. It can easily be reproduced with docker run -i mysql:5.7

The problem doesn't seem to exist with mysql:8.0 and, as a matter of fact, everything worked with the previous docker version.

vaceletm avatar Mar 14 '22 09:03 vaceletm

I downgraded to 20.10.12, 20.10.11 and 20.10.10 (the last 3 versions available in the official repo) and I still hit the same issue. It may be a kernel issue.

vaceletm avatar Mar 14 '22 11:03 vaceletm

Thanks for reporting; so to reproduce the issue, just a docker run -i mysql:5.7 (no other options) is sufficient?

If that's the case, that's odd indeed. As a workaround to prevent the system from running out of memory, you could of course add memory constraints to the container itself (but that wouldn't fix the underlying issue, just possibly prevent it from consuming all memory).
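A minimal sketch of such a constraint, assuming a 1 GiB cap is acceptable for the workload. The helper names are hypothetical. --memory and --memory-swap are existing docker run options, and setting both to the same value keeps the container from using swap beyond the cap.

```shell
# Convert whole GiB to bytes, the unit --memory uses for a bare number.
gib_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

# Start an image with a hard memory cap (illustrative helper, not part of Docker).
# $1: image reference, $2: cap in GiB (default 1)
run_capped() {
    local image="$1"
    local gib="${2:-1}"
    local bytes
    bytes="$(gib_to_bytes "$gib")"
    docker run --rm -i --memory "$bytes" --memory-swap "$bytes" "$image"
}
```

For example, run_capped mysql:5.7 1 would start the container with a 1 GiB limit. This only bounds the damage; it does not address whatever makes the process allocate that much.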

thaJeztah avatar Mar 14 '22 12:03 thaJeztah

Could you perhaps also add the output of docker info? It contains additional information, such as kernel version, storage driver, etc. Of course, feel free to redact information where needed.

thaJeztah avatar Mar 14 '22 12:03 thaJeztah

> Thanks for reporting; so to reproduce the issue, just a docker run -i mysql:5.7 (no other options) is sufficient?

Yes, it's as simple as that. The output will be the following for a while (while eating all the RAM) and eventually the init will continue.

2022-03-14 12:18:46+00:00 [Note] [Entrypoint]: Entrypoint script for MySQL Server 5.7.37-1debian10 started.

> As a workaround to prevent the system from running out of memory, you could of course add memory constraints to the container itself

Actually, setting a memory constraint makes mysql init fail:

docker run --memory 1073741824 -i mysql:5.7
2022-03-14 12:25:12+00:00 [Note] [Entrypoint]: Entrypoint script for MySQL Server 5.7.37-1debian10 started.
2022-03-14 12:25:16+00:00 [ERROR] [Entrypoint]: mysqld failed while attempting to check config
	command was: mysqld --verbose --help --log-bin-index=/tmp/tmp.sz9LdwWe78

(Same command works with mysql:8.0)

docker info, here it is:

Client:
 Context:    default
 Debug Mode: false
 Plugins:
  app: Docker App (Docker Inc., v0.9.1-beta3)
  buildx: Docker Buildx (Docker Inc., v0.8.0-docker)
  scan: Docker Scan (Docker Inc., v0.17.0)

Server:
 Containers: 12
  Running: 0
  Paused: 0
  Stopped: 12
 Images: 80
 Server Version: 20.10.13
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runtime.v1.linux runc io.containerd.runc.v2
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 2a1d4dbdb2a1030dc5b01e96fb110a9d9f150ecc
 runc version: v1.0.3-0-gf46b6ba
 init version: de40ad0
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 5.14.10-300.fc35.x86_64
 Operating System: Fedora Linux 35 (Workstation Edition)
 OSType: linux
 Architecture: x86_64
 CPUs: 8
 Total Memory: 15.39GiB
 Name: localhost.localdomain
 ID: BVZQ:2MR3:XMZ6:OCVR:RHF2:SLKM:UIVC:KELR:PYSI:PW7R:2GX5:D3FB
 Docker Root Dir: /home/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

I switched back to an older kernel version (5.14.10 here; I had identified the issue with 5.16.13).

vaceletm avatar Mar 14 '22 12:03 vaceletm

FTR, @LeSuisse identified that a rollback to containerd.io-1.4.13-3.1.fc35 solves the problem

vaceletm avatar Mar 14 '22 14:03 vaceletm

Using different kernels (5.14.18-300.fc35, 5.16.14-200.fc35) and Docker CE Engine versions (20.10.12, 20.10.11, 20.10.10) did not resolve the issue for me. Only downgrading to containerd.io-1.4.13-3.1.fc35 resolved the memory leak.

Here's my docker info output of the stable setup:

Client:
 Context:    default
 Debug Mode: false
 Plugins:
  app: Docker App (Docker Inc., v0.9.1-beta3)
  buildx: Docker Buildx (Docker Inc., v0.8.0-docker)
  scan: Docker Scan (Docker Inc., v0.17.0)

Server:
 Containers: 12
  Running: 12
  Paused: 0
  Stopped: 0
 Images: 146
 Server Version: 20.10.13
 Storage Driver: btrfs
  Build Version: Btrfs v5.16.2 
  Library Version: 102
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 9cc61520f4cd876b86e77edfeb88fbcd536d1f9d
 runc version: v1.0.3-0-gf46b6ba
 init version: de40ad0
 Security Options:
  seccomp
   Profile: default
  cgroupns
 Kernel Version: 5.16.13-200.fc35.x86_64
 Operating System: Fedora Linux 35 (Workstation Edition)
 OSType: linux
 Architecture: x86_64
 CPUs: 8
 Total Memory: 23.23GiB
 Name: localhost.localdomain
 ID: [REDACTED]
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

kaittodesk avatar Mar 16 '22 08:03 kaittodesk

Are you seeing the same happening if you run the container through containerd?

Something like;

ctr image pull docker.io/library/mysql:5.7

ctr run --env MYSQL_ALLOW_EMPTY_PASSWORD=1 -t docker.io/library/mysql:5.7 mycontainer

thaJeztah avatar Mar 17 '22 17:03 thaJeztah

Running container through containerd (both 1.4.13-3.1.fc35 and 1.5.10-3.1.fc35) does not create the memory leak.

However, in order to run the container I had to do some mounting trickery (hopefully this does not turn it into an apples-to-oranges comparison):

cd /home/kait
mkdir run
chmod 777 run
ctr run --rm --mount "type=bind,src=/home/kait/run,dst=/var/run/mysqld,options=rbind:rw" --env MYSQL_ALLOW_EMPTY_PASSWORD=1 docker.io/library/mysql:5.7 mycontainer

Otherwise the container initialization would fail with error:

2022-03-18T07:28:20.058885Z 0 [ERROR] Could not create unix socket lock file /var/run/mysqld/mysqld.sock.lock.
2022-03-18T07:28:20.058893Z 0 [ERROR] Unable to setup unix socket lock file.
2022-03-18T07:28:20.058897Z 0 [ERROR] Aborting

And the server would shut down.

kaittodesk avatar Mar 18 '22 07:03 kaittodesk

Is there anything we can do here to move this forward? Should we report the issue to Fedora as well?

vaceletm avatar Mar 28 '22 09:03 vaceletm

Is there any update about this?

kevin0x90 avatar May 07 '22 10:05 kevin0x90

Small update for people using Fedora: after the upgrade to Fedora 36 there is no way to downgrade containerd. I just learned this the hard way after upgrading, and the bug still exists 😅.

kevin0x90 avatar May 10 '22 17:05 kevin0x90

This may be interesting for those who have also already upgraded to Fedora 36: I found a way to still downgrade to the working versions by specifying the Fedora release version in dnf:

#!/bin/bash
sudo dnf --releasever=35 downgrade docker-ce-3:20.10.10 docker-ce-cli-3:20.10.10 containerd.io-1.4.13

kevin0x90 avatar May 10 '22 18:05 kevin0x90

Hello,

I've upgraded my workstation to Fedora 36 with the last versions of containerd.io and docker-ce and the issue is still here. Only the downgrade suggested by @kevin0x90 seems to provide a running MySQL container without consuming all the memory.

How can we help you to solve this issue?

yannis-rossetto avatar Aug 09 '22 08:08 yannis-rossetto

> Hello,
>
> I've upgraded my workstation to Fedora 36 with the last versions of containerd.io and docker-ce and the issue is still here. Only the downgrade suggested by @kevin0x90 seems to provide a running MySQL container without consuming all the memory.
>
> How can we help you to solve this issue?

This is how it works for me on Fedora 36:

Downgrade containerd.io as @kevin0x90 wrote:

sudo dnf --releasever=35 downgrade docker-ce-3:20.10.10 docker-ce-cli-3:20.10.10 containerd.io-1.4.13

Then freeze containerd.io version to prevent further upgrading:

sudo dnf install 'dnf-command(versionlock)'
sudo dnf versionlock containerd.io-1.4.13

pprishchepa avatar Aug 10 '22 08:08 pprishchepa

A good-to-know addition to the versionlock: if you use GNOME Software for updates, it will ignore the versionlock set in dnf, see https://bugzilla.redhat.com/show_bug.cgi?id=1671489. I just stumbled upon this recently.

kevin0x90 avatar Aug 21 '22 09:08 kevin0x90

For the record, switching from docker-ce to moby, with everything provided by Fedora, solved the issue for me.

vaceletm avatar Aug 23 '22 07:08 vaceletm

@vaceletm could you point me in the right direction for switching from docker-ce to moby?

pprishchepa avatar Aug 23 '22 07:08 pprishchepa

Here is the full script of what I had to do. Some of the changes might be related to the switch to Compose v2 (BuildKit by default, but I didn't track down everything):

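$> sudo dnf install moby-engine --allowerasing
$> sudo systemctl edit docker
# add the following lines to the drop-in file that opens:
[Service]
LimitNOFILE=1024
$> sudo systemctl daemon-reload
# setenforce only accepts Enforcing, Permissive, 1 or 0
$> sudo setenforce Permissive
$> sudo vim /etc/selinux/config
# set the following line so the mode persists across reboots:
SELINUX=permissive
$> sudo systemctl restart docker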

Be careful: with this approach SELinux runs in permissive mode and no longer enforces its policy on your platform, so you might be at risk. Evaluate the consequences beforehand.

vaceletm avatar Aug 23 '22 07:08 vaceletm

Just came across this issue yesterday. Figured I'd provide additional info and which solution worked best for me.

Kernel: 5.18.18-200.fc36.x86_64

Docker version:

Client: Docker Engine - Community
 Version:           20.10.17
 API version:       1.41
 Go version:        go1.17.11
 Git commit:        100c701
 Built:             Mon Jun  6 23:03:59 2022
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true
Server: Docker Engine - Community
 Engine:
  Version:          20.10.17
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.17.11
  Git commit:       a89b842
  Built:            Mon Jun  6 23:01:39 2022
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.7
  GitCommit:        0197261a30bf81f1ee8e6a4dd2dea0ef95d67ccb
 runc:
  Version:          1.1.3
  GitCommit:        v1.1.3-0-g6724737
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

docker run --rm apache/airflow:2.3.3-python3.9 scheduler worked fine. docker run --rm apache/airflow:2.3.4-python3.9 scheduler ate up memory.

I tried uninstalling docker-ce's docker-engine and installing Fedora's moby-engine, which worked, but ran into SELinux issues as mentioned above.

What works decently well for me is Docker Desktop for Linux. I just enable "Start Docker Desktop when you log in" (and change other settings...), and then change the CLI's context via:

docker context ls
docker context use desktop-linux

What's nice is that you can run docker commands without sudo.

Other than running into UID-related issues, things seem to be working fine.

Client: Docker Engine - Community
 Cloud integration: v1.0.28
 Version:           20.10.17
 API version:       1.41
 Go version:        go1.17.11
 Git commit:        100c701
 Built:             Mon Jun  6 23:03:59 2022
 OS/Arch:           linux/amd64
 Context:           desktop-linux
 Experimental:      true

Server: Docker Desktop 4.11.1 (84025)
 Engine:
  Version:          20.10.17
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.17.11
  Git commit:       a89b842
  Built:            Mon Jun  6 23:01:23 2022
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.6
  GitCommit:        10c12954828e7c7c9b6e0ea9b0c02b01407d3ae1
 runc:
  Version:          1.1.2
  GitCommit:        v1.1.2-0-ga916309
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

krnhotwings avatar Aug 25 '22 18:08 krnhotwings

See: https://github.com/containerd/containerd/pull/7566#issuecomment-1285417325

sam-thibault avatar Dec 08 '22 11:12 sam-thibault

Hi. I have Fedora 36 and solved this issue by changing LimitNOFILE=infinity to LimitNOFILE=1048576 in /usr/lib/systemd/system/containerd.service. After a reboot everything works.
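A minimal sketch of the same change written as a systemd drop-in rather than an edit to the packaged unit file, which a package update may overwrite. It assumes the default drop-in location that systemctl edit containerd uses. The helper name is hypothetical, and the limit value is simply the one reported above.

```shell
# Write a systemd drop-in that overrides LimitNOFILE for containerd.
# $1: drop-in directory (default: the one `systemctl edit containerd` uses)
# $2: limit to apply (default: the value reported in this thread)
write_nofile_dropin() {
    local dir="${1:-/etc/systemd/system/containerd.service.d}"
    local limit="${2:-1048576}"
    mkdir -p "$dir" || return 1
    printf '[Service]\nLimitNOFILE=%s\n' "$limit" > "$dir/override.conf"
}

# To apply and activate (needs root):
#   write_nofile_dropin
#   systemctl daemon-reload
#   systemctl restart containerd docker
```

To see which limit a container ends up with, something like docker run --rm --entrypoint sh mysql:5.7 -c 'ulimit -n' can be compared before and after the change.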

vyeve avatar Mar 03 '23 21:03 vyeve

Looks like this is effectively a duplicate of / covered by https://github.com/moby/moby/issues/38814, and will be addressed by https://github.com/moby/moby/pull/45534

thaJeztah avatar Jun 08 '23 14:06 thaJeztah