sysbox
docker run --storage-opt=size=<x> fails within sysbox container with XFS /var/lib/docker
On a host with an XFS filesystem mounted at /var/lib/docker, docker run --storage-opt=size=<x> succeeds for containers (runc and sysbox-runc) launched from the host, but fails for containers launched from within a sysbox-runc container that has a volume mounted at /var/lib/docker and is running dockerd.
Steps to reproduce:
- Mount an XFS filesystem at /var/lib/docker on the host:
# mount -v | grep /var/lib/docker
/dev/sdb on /var/lib/docker type xfs (rw,noatime,attr2,inode64,logbufs=8,logbsize=32k,prjquota)
- Launch a sysbox container, e.g. docker run --runtime=sysbox-runc --mount=type=volume,dst=/var/lib/docker, and launch dockerd within the container. Confirm that, within the sysbox container, /var/lib/docker is indeed XFS (as it should be, since it will be located within /var/lib/docker/volumes on the host):
# mount -v | grep docker
/dev/sdb on /var/lib/docker type xfs (rw,noatime,attr2,inode64,logbufs=8,logbsize=32k,prjquota)
- Now, within the same sysbox container, running docker run --rm -it --storage-opt=size=<x> debian bash will fail:
# docker run --rm -it --storage-opt=size=2G debian bash
docker: Error response from daemon: --storage-opt is supported only for overlay over xfs with 'pquota' mount option.
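The mount check in the steps above can be scripted. This is a minimal sketch, not how Docker itself performs the check: the function names has_project_quota and check_docker_root are made up for illustration, and it assumes findmnt from util-linux is available. It only confirms that the mount is XFS and carries pquota or prjquota, which the xfs man page documents as the same option.

```shell
#!/bin/sh
# Return 0 if a comma-separated mount option string enables XFS project
# quota enforcement (pquota and prjquota are synonyms in xfs(5)).
has_project_quota() {
    case ",$1," in
        *,pquota,*|*,prjquota,*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: check the mount that backs a directory
# (defaults to /var/lib/docker). Returns 2 if findmnt fails.
check_docker_root() {
    dir="${1:-/var/lib/docker}"
    fstype=$(findmnt -n -o FSTYPE --target "$dir") || return 2
    opts=$(findmnt -n -o OPTIONS --target "$dir") || return 2
    [ "$fstype" = "xfs" ] && has_project_quota "$opts"
}
```

Running check_docker_root on the host and again inside the sysbox container should succeed in both places for the setup described here, which is what makes the --storage-opt error surprising.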
Hi @struanb, thanks for filing the issue.
# docker run --rm -it --storage-opt=size=2G debian bash
docker: Error response from daemon: --storage-opt is supported only for overlay over xfs with 'pquota' mount option.
Sounds like the inner Docker does not think its /var/lib/docker is on XFS, even though you confirmed that it is.
What does the inner Docker report with docker info?
Thanks!
Here is docker info from the inner Docker, on the sysbox container:
$ docker info
Client:
Context: default
Debug Mode: false
Plugins:
app: Docker App (Docker Inc., v0.9.1-beta3)
buildx: Docker Buildx (Docker Inc., v0.7.1-docker)
scan: Docker Scan (Docker Inc., v0.12.0)
Server:
Containers: 0
Running: 0
Paused: 0
Stopped: 0
Images: 0
Server Version: 20.10.12
Storage Driver: overlay2
Backing Filesystem: xfs
Supports d_type: true
Native Overlay Diff: false
userxattr: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Cgroup Version: 2
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 7b11cfaabd73bb80907dd23182b9347b4245eb5d
runc version: v1.0.2-0-g52b36a2
init version: de40ad0
Security Options:
seccomp
Profile: default
cgroupns
Kernel Version: 5.10.0-9-cloud-amd64
Operating System: Debian GNU/Linux 10 (buster)
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 4GiB
Name: b7c93c4c034b
ID: GVSC:KL2A:V3FX:AXJV:KV3N:DNJM:PNCT:HJXE:G4TN:F7IC:RMBK:BH2N
Docker Root Dir: /var/lib/docker
Debug Mode: false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
Also, for completeness, here is findmnt /var/lib/docker from the sysbox container:
$ findmnt /var/lib/docker/
TARGET SOURCE FSTYPE OPTIONS
/var/lib/docker /dev/sdb[/volumes/a8ff0dfc54fe3e83c50871aa0e282142698657958b4de5ea09df57b5b6128652/_data] xfs rw,noatime,attr2,inode64,logbufs=8,logbsize=32k,prjquota
In case it helps, here is a diff of docker info on the host, and on the inner dockerd:
10,11c10,11
< Containers: 2
< Running: 2
---
> Containers: 0
> Running: 0
14c14
< Images: 2
---
> Images: 0
19c19
< Native Overlay Diff: true
---
> Native Overlay Diff: false
22c22
< Cgroup Driver: systemd
---
> Cgroup Driver: cgroupfs
29c29
< Runtimes: sysbox-runc io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
---
> Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
36d35
< apparmor
41c40
< Operating System: Debian GNU/Linux 11 (bullseye)
---
> Operating System: Debian GNU/Linux 10 (buster)
45,47c44,46
< Total Memory: 31.36GiB
< Name: demo4-worker-7mg1
< ID: X4S7:G6GN:WEYJ:XHRP:2QQT:FDGW:BXNM:HMPK:KUNH:COMA:73DK:IFG7
---
> Total Memory: 4GiB
> Name: b7c93c4c034b
> ID: GVSC:KL2A:V3FX:AXJV:KV3N:DNJM:PNCT:HJXE:G4TN:F7IC:RMBK:BH2N
55,58c54
< Live Restore Enabled: true
< Default Address Pools:
< Base: 172.25.0.0/16, Size: 24
<
---
> Live Restore Enabled: false
Hi @struanb, thanks for the info; however I don't see anything in it that would explain the error you are seeing.
The only thing that catches my attention is that Docker complains about xfs lacking mount option pquota, and the xfs mount shows mount option prjquota. I believe they are equivalent per the xfs man page, so that should not be the problem, but I don't know.
Just as a sanity check, what version of Docker runs on the host versus inside the Sysbox container?
Hi @ctalledo, thanks for considering this. I believe prjquota cannot be the problem, as the host also reports prjquota yet doesn't complain.
Both host and sysbox container are running Debian Bullseye, and docker info reports version 20.10.12 on both.
Is it possible that shiftfs is not propagating some other property from the underlying xfs filesystem that Docker expects to find?
I've also just retested with shiftfs removed and userns enabled (by the sysbox package installer), and I experience the same error. That seems to rule out shiftfs per se, although I don't know whether uid/gid mapping could still be triggering the issue.
The only other difference reported by docker info between host and sysbox container is Native Overlay Diff, which is true on the host, and false on the sysbox container.
P.S. Even if unrelated to this issue, this might be worth looking into as apparently this may impact performance:
time="2022-02-12T22:55:43.937324506Z" level=warning msg="Not using native diff for overlay2, this may cause degraded performance for building images: running in a user namespace" storage-driver=overlay2
A bit more digging suggests that access to the underlying XFS device may be needed for quota management.
Running strace xfs_quota -c quota /var/lib/docker/ on the host (where /dev/sdb is the XFS block device) shows access is made to /dev/sdb, and ultimately quotactl(QCMD(Q_XQUOTASYNC, USRQUOTA), "/dev/sdb") = 0 is called.
Bind-mounting /dev/sdb from the host into the sysbox container would rather defeat the purpose if, as seems likely, it undermines security; but in any event, inside the sysbox container the same call fails with quotactl(QCMD(Q_XQUOTASYNC, USRQUOTA), "/dev/sdb") = -1 EPERM (Operation not permitted).
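To see which device the quota tooling would need, the backing device can be derived from the findmnt SOURCE column shown earlier. The following is a rough sketch under assumptions: backing_device, device_visible and report_backing_device are hypothetical helper names, and a visible device node is only one precondition, since quotactl can still return EPERM as in the trace above.

```shell
#!/bin/sh
# Strip the bind-mount subpath that findmnt appends in square brackets,
# e.g. "/dev/sdb[/volumes/<id>/_data]" becomes "/dev/sdb".
backing_device() {
    printf '%s\n' "${1%%\[*}"
}

# Return 0 if the given path exists as a block device in this namespace.
device_visible() {
    [ -b "$1" ]
}

# Hypothetical helper: report on the device behind a directory
# (defaults to /var/lib/docker).
report_backing_device() {
    dir="${1:-/var/lib/docker}"
    src=$(findmnt -n -o SOURCE --target "$dir") || return 2
    dev=$(backing_device "$src")
    if device_visible "$dev"; then
        echo "$dev is visible as a block device"
    else
        echo "$dev is not visible as a block device here"
    fi
}
```

Calling report_backing_device on the host and inside the sysbox container gives a quick way to compare the two environments before reaching for strace.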
I have found a workaround for this issue in my particular use case, which is to apply an overall XFS quota to the volume mounted at /var/lib/docker on the sysbox container (using --volume-opt=Size=<size>), rather than to the inner containers via --storage-opt=Size=<size> (support for which was added to Docker recently; see https://github.com/moby/moby/pull/41330).
Hi @struanb, thanks for the update regarding the workaround. Using --volume-opt on the volume sounds great; I was not aware of this option. How did you specify it? I imagine at volume creation time, correct?
Having said this, we still want to investigate why using --storage-opt on the inner Docker containers does not work. I think this will require digging a bit more into XFS as well as into the Docker engine code to see exactly why it's confused into thinking that it's not working on XFS:
# docker run --rm -it --storage-opt=size=2G debian bash
docker: Error response from daemon: --storage-opt is supported only for overlay over xfs with 'pquota' mount option.
Examples (on the host):
docker run --rm -it --mount=type=volume,dst=/opt,volume-opt=size=2G debian bash -c 'df -h / /opt'
Filesystem Size Used Avail Use% Mounted on
overlay 50G 5.2G 45G 11% /
/dev/sdb 2.0G 0 2.0G 0% /opt
docker volume create --name test124 -o size=5G && docker run --rm -it --mount=type=volume,src=test124,dst=/opt debian bash -c 'df -h / /opt'
test124
Filesystem Size Used Avail Use% Mounted on
overlay 50G 4.1G 46G 9% /
/dev/sdb 5.0G 0 5.0G 0% /opt
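Applied to the sysbox case, the same volume option can cap the inner Docker's whole /var/lib/docker. This is a dry-run sketch that only prints the commands: sysbox_quota_cmds is a hypothetical helper, and the volume name, size and image passed to it are placeholders rather than values from this thread.

```shell
#!/bin/sh
# Print (do not run) the commands for the volume-level quota workaround.
# Arguments: volume name, quota size, image that runs dockerd.
sysbox_quota_cmds() {
    vol="$1"
    size="$2"
    image="$3"
    # Create a named volume with an XFS project quota of the given size.
    printf 'docker volume create --name %s -o size=%s\n' "$vol" "$size"
    # Mount that volume at /var/lib/docker in a sysbox container.
    printf 'docker run --runtime=sysbox-runc --mount=type=volume,src=%s,dst=/var/lib/docker %s\n' "$vol" "$image"
}
```

For example, sysbox_quota_cmds inner-docker 5G my-dind-image prints the two commands for review. The quota then applies to everything the inner dockerd stores, not to individual inner containers.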
Thanks for the info @struanb ...