[BUG] ALL containers which use NFS volumes do not start after reboot
Description
This bug report is ported from: https://github.com/docker/compose/issues/11354
I am currently facing an issue with my Intel NUC running Debian SID that has persisted for about a year. Despite trying various solutions, I have been unable to resolve it satisfactorily.
My configuration is as follows:
NUC ==(NFS - docker-compose volume)==> SYNO
I run numerous containers within my docker-compose stack, all of which use the `restart: unless-stopped` policy. However, upon system reboot, all containers with NFS volumes mapped fail to start automatically. They remain inactive unless started manually. Interestingly, a manual start or restart at any other time works seamlessly, and everything functions as expected.
I expect all containers to start during a system reboot, and suspect there may be an underlying hidden race condition that eludes my detection.
It also seems to be the very same issue as the ones mentioned here:
Reproduce
- use docker-compose (or the docker CLI, as proven in the old bug report: https://github.com/docker/compose/issues/11354#issuecomment-1902235822)
- configure any container
- configure an NFS volume like this:

```yaml
volumes:
  share:
    name: share
    driver_opts:
      type: "nfs"
      o: "addr=192.168.178.2,nfsvers=4"
      device: ":/volume1/NFS_SHARE/"
```

- use the named NFS volume in the configured container (a minimal compose sketch follows this list)
- do the restart test (`docker restart container_name`), then check with `docker ps`
- do the reboot test (`reboot`), then check with `docker ps` after the reboot
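For reference, a minimal compose file tying these steps together could look like the sketch below (the service name, image, and mount path are illustrative, not taken from the original report):

```yaml
services:
  app:
    image: alpine:latest        # any image works to reproduce the issue
    command: sleep infinity
    restart: unless-stopped
    volumes:
      - share:/data             # the named NFS volume defined below

volumes:
  share:
    name: share
    driver_opts:
      type: "nfs"
      o: "addr=192.168.178.2,nfsvers=4"
      device: ":/volume1/NFS_SHARE/"
```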
Expected behavior
There should be no race condition, and all containers (including those with NFS-based volumes mounted) should restart.
docker version
```text
Client: Docker Engine - Community
 Version:           25.0.0
 API version:       1.44
 Go version:        go1.21.6
 Git commit:        e758fe5
 Built:             Thu Jan 18 17:09:59 2024
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          25.0.0
  API version:      1.44 (minimum version 1.24)
  Go version:       go1.21.6
  Git commit:       615dfdf
  Built:            Thu Jan 18 17:09:59 2024
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.27
  GitCommit:        a1496014c916f9e62104b33d1bb5bd03b0858e59
 runc:
  Version:          1.1.11
  GitCommit:        v1.1.11-0-g4bccb38
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
```
docker info
```text
Client: Docker Engine - Community
 Version:    25.0.0
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.12.1
    Path:     /usr/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.24.1
    Path:     /usr/libexec/docker/cli-plugins/docker-compose

Server:
 Containers: 5
  Running: 5
  Paused: 0
  Stopped: 0
 Images: 5
 Server Version: 25.0.0
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 Swarm: inactive
 Runtimes: runc io.containerd.runc.v2
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: a1496014c916f9e62104b33d1bb5bd03b0858e59
 runc version: v1.1.11-0-g4bccb38
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: builtin
  cgroupns
 Kernel Version: 6.6.11-amd64
 Operating System: Debian GNU/Linux trixie/sid
 OSType: linux
 Architecture: x86_64
 CPUs: 16
 Total Memory: 30.88GiB
 Name: h0tmann
 ID: 670157fc-30a9-4aa4-807c-4fa28aec7ec7
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
```
Additional Info
Are these NFS devices available before the docker service is started? I wonder if the systemd unit needs a custom "After" condition added 🤔
Yes, the SYNO runs 24/7 and is never rebooted.
That's what I read somewhere else as well. It waits for the network, but somehow there is a race condition when it comes to:

- the network itself
- the NFS mount actually being connected

But in this regard, I am not an expert, so do not quote me :)
Hm, right, so looking at the default systemd unit for the docker service; https://github.com/moby/moby/blob/5a3a101af2ff6fae24605107b1fbcf53fbb5c38e/contrib/init/systemd/docker.service#L4-L5
It currently waits for;

- `network-online.target` (networking)
- `docker.socket` (the socket-activation service)
- `firewalld.service` (firewalld)
- `containerd.service` (containerd)
- `time-set.target` (clock set)
The containerd service already has `local-fs.target` to make sure local filesystems are mounted; https://github.com/containerd/containerd/blob/b66830ff6f8375ce8c7a583eaa03549eaa6707c4/containerd.service#L18
Which means that (because the docker service has `After=containerd.service`), local filesystems at least should be mounted.
I think what's needed in your setup is the `remote-fs.target`;

> `remote-fs.target`
> Similar to `local-fs.target`, but for remote mount points. systemd automatically adds dependencies of type `After=` for this target unit to all SysV init script service units with an LSB header referring to the "$remote_fs" facility.
Given that remote filesystems are not something that's used by default by the Docker Engine, I don't think we should add this to the default systemd unit; doing so would likely delay startup of the service, which would be a regression for setups that don't use remote filesystems but are running on a system that does have them (but perhaps this can be discussed).
To add that target, you can use `systemctl edit docker.service`. This will create an override file that allows you to extend or override properties of the default systemd unit (Flatcar has a great page describing this in more depth);

```console
sudo systemctl edit docker.service
```

That command will create a systemd "override" (or "drop-in") file and open it in your default editor. You can add your overrides in the file and save it. By default, the `After` you specify in your override file is appended to the existing list of options in the `After` of the default systemd unit (which should not be edited).
```ini
### Editing /etc/systemd/system/docker.service.d/override.conf
### Anything between here and the comment below will become the new contents of the file
[Unit]
After=remote-fs.target
### Lines below this comment will be discarded
### /lib/systemd/system/docker.service
# [Unit]
# ...
```
After you have edited and saved the file, you need to reload systemd to make it re-read the configuration;

```console
sudo systemctl daemon-reload
```
You can check the new settings using `systemctl show`, which should now show the `remote-fs.target` included;

```console
$ sudo systemctl show docker.service | grep ^After
After=containerd.service systemd-journald.socket docker.socket sysinit.target time-set.target network-online.target remote-fs.target system.slice firewalld.service basic.target
```
Thanks for the detailed explanation.
I have followed your commands. Here is the output of the check command:
```console
$ systemctl show docker.service | grep ^After
After=network-online.target basic.target sysinit.target firewalld.service docker.socket system.slice time-set.target containerd.service remote-fs.target systemd-journald.socket
```
I also reloaded the systemd daemon (a reboot should do this automatically, but I did it manually anyway) and rebooted the server.
Again, all containers with NFS volumes are down. They do not start up again.
> Given that remote filesystems are not something that's used by default by the Docker Engine, I don't think we should add this to the default systemd unit; doing so would likely delay startup of the service, which would be a regression for setups that don't use remote filesystems but are running on a system that does have them (but perhaps this can be discussed).
Is there a possibility to add this only when NFS (or any remote FS) is used by any Docker container?
But as mentioned above, this apparently did not fix the issue. Thanks for your help :)
> But as mentioned above, this apparently did not fix the issue.
😢 that's a shame; thanks for trying! I was hoping this would make sure that those remote filesystem mounts were up-and-running.
Possibly it requires a stronger dependency than `After` 🤔
Reading the documentation for `After`:
https://www.freedesktop.org/software/systemd/man/latest/systemd.unit.html#Before=
> Note that those settings are independent of and orthogonal to the requirement dependencies as configured by `Requires=`, `Wants=`, `Requisite=`, or `BindsTo=`. It is a common pattern to include a unit name in both the `After=` and `Wants=` options, in which case the unit listed will be started before the unit that is configured with these options.
Perhaps that second sentence applies here; it might be worth trying whether adding `remote-fs.target` to `Wants` helps 🤔
I did the following:

- `systemctl edit docker.service`
- added `remote-fs.target` also to `Wants`:

```ini
### Editing /etc/systemd/system/docker.service.d/override.conf
### Anything between here and the comment below will become the contents of the drop-in file
[Unit]
After=remote-fs.target
Wants=remote-fs.target
### Edits below this comment will be discarded
```

- `systemctl daemon-reload`
- `systemctl show docker.service | grep ^After`:

```console
After=docker.socket network-online.target system.slice sysinit.target containerd.service systemd-journald.socket remote-fs.target basic.target firewalld.service time-set.target
```

- `systemctl show docker.service | grep ^Wants`:

```console
Wants=network-online.target remote-fs.target containerd.service
```

- `reboot`
Still, the containers with NFS volumes do not start automatically.
@thaJeztah is there any news, or is there a specific user to tag on this one?
Thanks in advance! :)
Is there any related error message in the dockerd logs?
```console
sudo journalctl -e --no-pager -u docker -g ' error while mounting volume '
# or if the above doesn't yield anything useful, try searching for the NFS server address you use
sudo journalctl -e --no-pager -u docker -g '192.168.178.2'
```
@vvoland thanks - I will reply once I am at home and have executed the commands.
@vvoland thanks, the search for the IP itself returned something:
```text
Jan 25 18:41:19 hostname dockerd[705]: time="2024-01-25T18:41:19.637798586+01:00" level=error msg="failed to start container" container=822210342a705a345accd6bfa16b69507b832bf01aec77ea4439f4b6d375c390 error="error while mounting volume '/var/lib/docker/volumes/share/_data': failed to mount local volume: mount :/volume1/NFS_SHARE/:/var/lib/docker/volumes/share/_data, data: addr=192.168.178.2,nfsvers=4,hard,timeo=600,retrans=3: network is unreachable"
```
Basically `network is unreachable`, yet this is a standard Debian installation following the official docs: https://docs.docker.com/engine/install/debian/
and this is my systemd unit file:
```ini
# /usr/lib/systemd/system/docker.service
[Unit]
Description=Docker Application Container Engine
Documentation=https://docs.docker.com
After=network-online.target docker.socket firewalld.service containerd.service time-set.target
Wants=network-online.target containerd.service
Requires=docker.socket

[Service]
Type=notify
# the default is not to use systemd for cgroups because the delegate issues still
# exists and systemd currently does not support the cgroup feature set required
# for containers run by docker
ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
ExecReload=/bin/kill -s HUP $MAINPID
TimeoutStartSec=0
RestartSec=2
Restart=always

# Note that StartLimit* options were moved from "Service" to "Unit" in systemd 229.
# Both the old, and new location are accepted by systemd 229 and up, so using the old location
# to make them work for either version of systemd.
StartLimitBurst=3

# Note that StartLimitInterval was renamed to StartLimitIntervalSec in systemd 230.
# Both the old, and new name are accepted by systemd 230 and up, so using the old name to make
# this option work for either version of systemd.
StartLimitInterval=60s

# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNPROC=infinity
LimitCORE=infinity

# Comment TasksMax if your systemd version does not support it.
# Only systemd 226 and above support this option.
TasksMax=infinity

# set delegate yes so that systemd does not reset the cgroups of docker containers
Delegate=yes

# kill only the docker process, not all processes in the cgroup
KillMode=process
OOMScoreAdjust=-500

[Install]
WantedBy=multi-user.target

# /etc/systemd/system/docker.service.d/override.conf
[Unit]
After=remote-fs.target
Wants=remote-fs.target
```
Hope this helps debugging it :)
Just wanted to add: the very same also happens when using SMB/CIFS mounts/volumes. I assume this applies to all network-based mounts/volumes.
I also confirmed the same behaviour on another server, just to make sure it is not caused by any special config on my end.
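For reference, a CIFS volume via the `local` driver follows the same pattern as the NFS one; a sketch with an illustrative address and credentials (not my actual config):

```yaml
volumes:
  share:
    driver_opts:
      type: "cifs"                                                   # mount(8) filesystem type
      o: "addr=192.168.178.2,username=user,password=secret,vers=3.0" # illustrative credentials
      device: "//192.168.178.2/SHARE"
```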
Hm... so I just realised that my suggestion of using systemd for this would work if the host had NFS mounts for these filesystems, but if the host does not have those, systemd would not be aware of them, so it won't take them into account.
Does this work if your host has a mount from these server(s)? (also see https://geraldonit.com/2023/02/25/auto-mount-nfs-share-using-systemd/)
In that case, it's also worth considering setting up the NFS mount on the host, and instead of using an NFS volume for the container;
```yaml
driver_opts:
  type: "nfs"
  o: "addr=192.168.178.2,nfsvers=4"
  device: ":/volume1/NFS_SHARE/"
```
to use a "bind" mount of the NFS-mounted path from the host. This could be a regular bind mount, or a volume with the relevant mount options set (https://github.com/moby/moby/issues/19990#issuecomment-248955005), something like;
```yaml
driver_opts:
  type: "none"
  o: "bind"
  device: "/path/to/nfs-mount/on-host/"
```
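For the bind-mount approach to survive reboots, the host-side NFS mount itself must come up before Docker; a sketch of an `/etc/fstab` entry, assuming the share and host path from the examples above, with `_netdev` so systemd orders the mount after the network is up:

```
# /etc/fstab (sketch): _netdev marks the mount as network-dependent
192.168.178.2:/volume1/NFS_SHARE  /path/to/nfs-mount/on-host  nfs4  _netdev,hard,timeo=600  0  0
```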
> Does this work if your host has a mount from these server(s)?
Sorry I don't understand this question.
But here is something I have tried before, and it worked:

- setting up an NFS mount at `/mnt/NFS_MOUNT/` (via `/etc/fstab`)
- mapping it into the container just like any other folder

This works, but it is not what I desire, since I want the mount and the whole connection to also be transferable via docker compose etc.
I feel like Docker volumes have a general/structural problem of not waiting for the mount to be active. By the way, can anyone of you confirm AND replicate this bug on your side?
Same problem here. In my case, Proxmox with a Debian VM (docker/portainer) connecting to a Synology NAS over NFSv4. After reboot, 8 of 30 containers fail to start, and unsurprisingly they're all the ones with NFS mounts. The containers spin right up when I click Start in Portainer, though.
So far I've tried different `restart:` settings and `depends_on:`, neither of which works. I really don't want to touch the host; I'd much rather get this working in Docker alone.
Still scouring the internet for a solution :)
@thaJeztah @vvoland is there any news on this, or is there something I can do to help?
This problem still exists to this day.
We're already depending on the `network-online.target` systemd target for the daemon to start.
If that doesn't work out of the box on your system, you might need to adjust your `network-online` conditions to suit your configuration:
https://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/
Note that:

> network-online.target will ensure that all configured network devices are up and have an IP address assigned before the service is started. ... The right "wait" service must be enabled too (NetworkManager-wait-online.service if NetworkManager is used to configure the network, systemd-networkd-wait-online.service if systemd-networkd is used, etc.)
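A quick way to check which "wait" service applies on a given machine and to enable it; the unit name depends on which network manager the system uses, so typically only one of these exists:

```console
# check which wait-online unit is present/enabled
systemctl is-enabled NetworkManager-wait-online.service
systemctl is-enabled systemd-networkd-wait-online.service

# enable the one matching your setup, e.g. for systemd-networkd:
sudo systemctl enable systemd-networkd-wait-online.service
```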