[BUG] Race condition when populating volumes with nested volumes
Description
We have a customer setup where a volume is bind mounted into a container and the container populates it with some files on first start. Later another volume was added which is a subdirectory inside the first volume. This subdirectory only gets created when the container populates the first volume on first start. I'm aware this is not a really self-consistent concept and therefore might be bad practice; however, the problem is that it sometimes fails and sometimes succeeds, which is worse than just getting an error.
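For context, the setup boils down to two named volumes where the second one binds to a subdirectory of the first (condensed from the reproduction below):

volumes:
  parent:
    driver: local
    driver_opts:
      type: none
      device: ${TEST_PATH}/parent
      o: bind
  subdir:
    driver: local
    driver_opts:
      type: none
      # this directory only exists after the container has populated "parent" on first start
      device: ${TEST_PATH}/parent/bar
      o: bind,ro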
Steps To Reproduce
I tried to stay as close to the original setup as possible while removing everything that is not necessary. This is what I came up with to reproduce:
# parameters
NO_VARS=255
NO_RUNS=99
NO_INC_SIZE=3
TEST_PATH=/tmp
cd $TEST_PATH
cat << EOF > Dockerfile
FROM debian:latest
# create dirs to populate
RUN mkdir -p /foo/bar /foo/baz
RUN touch /foo/bar/iamhere
# create a larger container size
RUN for i in \$(seq 1 $NO_INC_SIZE); do \
      cp -rv /usr /foo/baz/\$i/ >/dev/null 2>&1; \
    done
CMD sleep .1
EOF
cat << EOF > docker-compose.yml
services:
  test:
    container_name: test
    image: race:condition
    network_mode: "host"
    volumes:
      - type: volume
        source: parent
        target: /foo
      - type: volume
        source: subdir
        target: /foo/bar
        read_only: true
volumes:
  parent:
    driver: local
    driver_opts:
      type: none
      device: \${TEST_PATH}/parent
      o: bind
  subdir:
    driver: local
    driver_opts:
      type: none
      device: \${TEST_PATH}/parent/bar
      o: bind,ro
EOF
# build container
docker build -t race:condition .
# create a larger env file
rm -f .env
for i in $(seq 0 $NO_VARS); do
  echo "VAR$i=$i" >> .env
done
echo "TEST_PATH=$TEST_PATH" >> .env
# loop to reproduce
rm -f debug.log
for no in $(seq 0 $NO_RUNS); do
  # prepare clean state
  docker compose down >/dev/null 2>&1
  mkdir -p parent >/dev/null 2>&1
  rm -rf parent/* >/dev/null 2>&1
  docker volume rm -f tmp_parent tmp_subdir >/dev/null 2>&1
  # test & report
  docker compose up test >> debug.log 2>&1 && echo "$no: success" || echo "$no: failed"
done
Since this is a race condition, it might behave differently on other systems depending on CPU, I/O, etc. You can play around with the parameters in search of the "sweet spot". I did not find any configuration that always fails, but some are more likely to succeed than others %)
Compose Version
Docker Compose version v2.25.0
Used on RHEL 9.3
Installed Packages
Name : docker-compose-plugin
Version : 2.25.0
Release : 1.el9
Architecture : x86_64
Size : 59 M
Source : docker-compose-plugin-2.25.0-1.el9.src.rpm
Repository : @System
From repo : Default_Organization_docker-ce-stable_docker-ce-stable_el9_x86_64
Summary : Docker Compose (V2) plugin for the Docker CLI
URL : https://github.com/docker/compose/
License : ASL 2.0
Description : Docker Compose (V2) plugin for the Docker CLI.
:
: This plugin provides the 'docker compose' subcommand.
:
: The binary can also be run standalone as a direct replacement for
: Docker Compose V1 ('docker-compose').
Docker Environment
Client: Docker Engine - Community
Version: 26.0.0
Context: default
Debug Mode: false
Plugins:
buildx: Docker Buildx (Docker Inc.)
Version: v0.13.1
Path: /usr/libexec/docker/cli-plugins/docker-buildx
compose: Docker Compose (Docker Inc.)
Version: v2.25.0
Path: /usr/libexec/docker/cli-plugins/docker-compose
Server:
Containers: 0
Running: 0
Paused: 0
Stopped: 0
Images: 2
Server Version: 26.0.0
Storage Driver: overlay2
Backing Filesystem: xfs
Supports d_type: true
Using metacopy: false
Native Overlay Diff: true
userxattr: false
Logging Driver: json-file
Cgroup Driver: systemd
Cgroup Version: 2
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
Swarm: inactive
Runtimes: io.containerd.runc.v2 runc
Default Runtime: runc
Init Binary: docker-init
containerd version: ae07eda36dd25f8a1b98dfbf587313b99c0190bb
runc version: v1.1.12-0-g51d5e94
init version: de40ad0
Security Options:
seccomp
Profile: builtin
cgroupns
Kernel Version: 5.14.0-284.11.1.el9_2.x86_64
Operating System: Red Hat Enterprise Linux 9.3 (Plow)
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 1.736GiB
Name: dstorweb01tl.unicph.domain
ID: dfe1b231-0a99-45e5-8c16-93edec96d0b2
Docker Root Dir: /var/lib/docker
Debug Mode: false
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
Anything else?
I'm posting this here because I'm not able to reproduce this with Docker alone, e.g.:
mkdir -p state; rm -rf state/* state/.*; docker run --rm -ti -v ./state/log:/var/log -v state:/var debian bash
One more thing: the reason for the $TEST_PATH variable is that it is used like this in the production setup. There, however, the path somehow gets expanded to a full path, so in case of failure the error message looks like:
Error response from daemon: failed to populate volume: error while mounting volume '/var/lib/docker/volumes/docker-migrid_vgrid_files_readonly/_data': failed to mount local volume: mount /opt/migrid/docker-migrid/state/vgrid_files_writable:/var/lib/docker/volumes/docker-migrid_vgrid_files_readonly/_data, flags: 0x1001: no such file or directory
I'm not able to reproduce the error that way; if I set TEST_PATH=. in my test script, it always fails with an unexpanded path in the error message, like this:
Error response from daemon: failed to populate volume: error while mounting volume '/var/lib/docker/volumes/tmp_subdir/_data': failed to mount local volume: mount ./parent/bar:/var/lib/docker/volumes/tmp_subdir/_data, flags: 0x1001: no such file or directory
Not sure how that fits into the whole picture though.
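In case it helps with narrowing this down, the device path that actually ends up in the volume definition could be checked with docker volume inspect (just a suggested diagnostic, using the tmp_subdir volume name from the test script):

# show the device option stored for the subdir volume (local driver)
docker volume inspect tmp_subdir --format '{{ index .Options "device" }}'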
Can you please clarify the need for the second volume subdir, as this one is already nested inside a volume and is mounted inside the container as the same subdirectory?
Also, as you compare this with a plain docker run ... command, please note that your compose file declares a volume, not just a bind mount, which doesn't make it a strict equivalent. Do you really need a volume here? Can't you just define bind mounts in your compose file?
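For illustration, a bind-mount-only variant of the service might look roughly like this (untested sketch, reusing the paths from your reproduction):

services:
  test:
    image: race:condition
    network_mode: "host"
    volumes:
      - ${TEST_PATH}/parent:/foo
      - ${TEST_PATH}/parent/bar:/foo/bar:ro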
Can you please clarify the need for the second volume subdir, as this one is already nested inside a volume and is mounted inside the container as the same subdirectory?
As I said, I agree that this is not optimal. There isn't any necessity to have those nested volumes, but the config occurs like this due to a default setting. The production compose file of the application defines a volume with the application state; inside that there is a cache folder. Recently the devs wanted to have that cache directory configurable so that one can set it to e.g. a tmpfs directory. So they defined a separate volume, which by default just points inside the already existing state volume but might be overridden by users with another path.
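Roughly, the production pattern looks like this (simplified sketch with made-up variable names, not the actual production file):

volumes:
  state:
    driver: local
    driver_opts:
      type: none
      device: ${STATE_PATH}
      o: bind
  cache:
    driver: local
    driver_opts:
      type: none
      # CACHE_PATH defaults to a folder inside the state volume in the .env,
      # but users may point it elsewhere, e.g. at a tmpfs mount
      device: ${CACHE_PATH}
      o: bind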
Also, as you compare this with a plain docker run ... command, please note that your compose file declares a volume, not just a bind mount, which doesn't make it a strict equivalent. Do you really need a volume here? Can't you just define bind mounts in your compose file?
Yes, in the current setup the volume is necessary, as it must be populated with a directory tree at first start.
I'm aware that there are ways to work around this, but since the behavior of this particular case is inconsistent and might change with every run, I thought I'd report it as a bug to save others the time of digging through it again.
Yes, in the current setup the volume is necessary, as it must be populated with a directory tree at first start.
right, but in your docker run ... reproduction example you don't use such a volume but a simple bind mount. So my question: can you get the same behavior using docker volume create ... and then use the created volumes to run a container?
I believe #11706 is related as overlapping configs/volumes will also trigger a race condition.
Yes, in the current setup the volume is necessary, as it must be populated with a directory tree at first start.
right, but in your docker run ... reproduction example you don't use such a volume but a simple bind mount. So my question: can you get the same behavior using docker volume create ... and then use the created volumes to run a container?
I tried it, but it's not exactly the same. With plain Docker I couldn't get it to succeed. It always fails, but sometimes it complains about not being able to chmod the subdir and sometimes it complains about not being able to mount the subdir because it doesn't exist.
docker: Error response from daemon: failed to chmod on /var/lib/docker/volumes/rc-subdir/_data: chmod /var/lib/docker/volumes/rc-subdir/_data: read-only file system.
docker: Error response from daemon: failed to populate volume: error while mounting volume '/var/lib/docker/volumes/rc-subdir/_data': failed to mount local volume: mount /tmp/parent/subdir:/var/lib/docker/volumes/rc-subdir/_data, flags: 0x1001: no such file or directory.
Script for plain docker
# parameters
NO_VARS=2550
NO_RUNS=99
NO_INC_SIZE=3
TEST_PATH=/tmp
cd $TEST_PATH
cat << EOF > Dockerfile
FROM debian:latest
# create dirs to populate
RUN mkdir -p /foo/bar /foo/baz /foo/subdir
RUN touch /foo/bar/iamhere
# create a larger container size
RUN for i in \$(seq 1 $NO_INC_SIZE); do \
      cp -rv /usr /foo/baz/\$i/ >/dev/null 2>&1; \
    done
CMD sleep .1
EOF
# build container
docker build -t race:condition .
# create a larger env file
rm -f .env
for i in $(seq 0 $NO_VARS); do
  echo "VAR$i=$i" >> .env
done
echo "TEST_PATH=$TEST_PATH" >> .env
# loop to reproduce
rm -f debug-docker.log
for no in $(seq 0 $NO_RUNS); do
  docker rm -f docker-rc-test >/dev/null 2>&1
  mkdir -p parent >/dev/null 2>&1
  rm -rf parent/* >/dev/null 2>&1
  docker volume rm -f rc-parent >/dev/null 2>&1
  docker volume rm -f rc-subdir >/dev/null 2>&1
  docker volume create --driver local --opt type=none --opt device=$TEST_PATH/parent --opt o=bind rc-parent >/dev/null 2>&1
  docker volume create --driver local --opt type=none --opt device=$TEST_PATH/parent/subdir --opt o=bind,ro rc-subdir >/dev/null 2>&1
  # test & report
  docker run -d --name docker-rc-test --env-file .env -v rc-parent:/foo -v rc-subdir:/foo/bar --rm race:condition #>> debug-docker.log 2>&1 && echo "$no: success" || echo "$no: failed"
done
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.