
docker run --storage-opt=size=<x> fails within sysbox container with XFS /var/lib/docker

Open • struanb opened this issue 3 years ago • 11 comments

On a host with an XFS filesystem mounted at /var/lib/docker, docker run --storage-opt=size=<x> succeeds for containers (runc and sysbox-runc) launched from the host. It fails for containers launched by a dockerd running inside a sysbox-runc container that has a volume mounted at /var/lib/docker.

Steps to reproduce:

  1. Mount an XFS filesystem at /var/lib/docker on the host:
# mount -v | grep /var/lib/docker
/dev/sdb on /var/lib/docker type xfs (rw,noatime,attr2,inode64,logbufs=8,logbsize=32k,prjquota)
  2. Launch a sysbox container, e.g. with docker run --runtime=sysbox-runc --mount=type=volume,dst=/var/lib/docker, and launch dockerd within it. Confirm that, within the sysbox container, /var/lib/docker is indeed XFS (as expected, since the volume is located within /var/lib/docker/volumes on the host):
# mount -v | grep docker
/dev/sdb on /var/lib/docker type xfs (rw,noatime,attr2,inode64,logbufs=8,logbsize=32k,prjquota)
  3. Now, within the same sysbox container, running docker run --rm -it --storage-opt=size=<x> debian bash fails:
# docker run --rm -it --storage-opt=size=2G debian bash
docker: Error response from daemon: --storage-opt is supported only for overlay over xfs with 'pquota' mount option.
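Both conditions named in the error message (XFS, project quotas) can be checked in one go before step 3. The following is a minimal sketch: check_docker_root is a hypothetical helper, not part of Docker or Sysbox, and it assumes its input is one "FSTYPE OPTIONS" line as printed by findmnt -n -o FSTYPE,OPTIONS --target /var/lib/docker. It accepts either the prjquota or the pquota spelling. In this issue the check passes both on the host and inside the sysbox container, so it only rules out obvious misconfigurations.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that a findmnt "FSTYPE OPTIONS" line describes
# an XFS filesystem mounted with project quotas (prjquota or pquota).
check_docker_root() {
  local line="$1" fstype opts
  # Split the line into the filesystem type and the comma-separated options.
  read -r fstype opts <<<"$line"
  if [ "$fstype" != "xfs" ]; then
    echo "not xfs: $fstype"
    return 1
  fi
  # Wrap in commas so each option is matched as a whole token.
  case ",$opts," in
    *,prjquota,* | *,pquota,*) echo "xfs with project quota"; return 0 ;;
    *) echo "xfs without project quota"; return 1 ;;
  esac
}

# Usage (on the host or inside the sysbox container):
#   check_docker_root "$(findmnt -n -o FSTYPE,OPTIONS --target /var/lib/docker)"
```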

struanb, Feb 09 '22 23:02

Hi @struanb , thanks for filing the issue.

# docker run --rm -it --storage-opt=size=2G debian bash
docker: Error response from daemon: --storage-opt is supported only for overlay over xfs with 'pquota' mount option.

Sounds like the inner Docker does not think its /var/lib/docker is on XFS, even though you confirmed that it is.

What does the inner Docker report with docker info?

Thanks!
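The part of the docker info output that matters here is the storage section. A small filter like the one below (storage_summary is a hypothetical name) pulls out just the storage driver and its backing filesystem, which makes host and inner daemons easy to compare; it assumes the plain-text "Key: value" layout that docker info prints.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `docker info` text on stdin and print the
# storage driver and its backing filesystem on one line.
storage_summary() {
  awk -F': ' '
    /^ *Storage Driver:/     { driver = $2 }
    /^ *Backing Filesystem:/ { backing = $2 }
    END { print driver, backing }
  '
}

# Usage against the inner daemon:
#   docker info 2>/dev/null | storage_summary
```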

ctalledo, Feb 10 '22 00:02

Here is docker info from the inner Docker, on the sysbox container:

$ docker info
Client:
 Context:    default
 Debug Mode: false
 Plugins:
  app: Docker App (Docker Inc., v0.9.1-beta3)
  buildx: Docker Buildx (Docker Inc., v0.7.1-docker)
  scan: Docker Scan (Docker Inc., v0.12.0)

Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 0
 Server Version: 20.10.12
 Storage Driver: overlay2
  Backing Filesystem: xfs
  Supports d_type: true
  Native Overlay Diff: false
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 7b11cfaabd73bb80907dd23182b9347b4245eb5d
 runc version: v1.0.2-0-g52b36a2
 init version: de40ad0
 Security Options:
  seccomp
   Profile: default
  cgroupns
 Kernel Version: 5.10.0-9-cloud-amd64
 Operating System: Debian GNU/Linux 10 (buster)
 OSType: linux
 Architecture: x86_64
 CPUs: 8
 Total Memory: 4GiB
 Name: b7c93c4c034b
 ID: GVSC:KL2A:V3FX:AXJV:KV3N:DNJM:PNCT:HJXE:G4TN:F7IC:RMBK:BH2N
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

Also, for completeness, here is findmnt /var/lib/docker from the sysbox container:

$ findmnt /var/lib/docker/
TARGET          SOURCE                                                                                    FSTYPE OPTIONS
/var/lib/docker /dev/sdb[/volumes/a8ff0dfc54fe3e83c50871aa0e282142698657958b4de5ea09df57b5b6128652/_data] xfs    rw,noatime,attr2,inode64,logbufs=8,logbsize=32k,prjquota
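The bracketed part of the SOURCE column is findmnt's way of showing a bind mount of a subdirectory: the container's /var/lib/docker is the directory /volumes/<id>/_data on the host's /dev/sdb filesystem, so both sides use the same XFS filesystem. The sketch below (split_source is a hypothetical helper) separates the device from that subdirectory, assuming findmnt's default "device[/path]" notation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split a findmnt SOURCE value such as
# "/dev/sdb[/volumes/<id>/_data]" into the block device and the
# subdirectory of that filesystem which is bind-mounted at the target.
split_source() {
  local src="$1"
  local dev="${src%%\[*}"   # everything before the first '['
  local sub=""
  if [ "$dev" != "$src" ]; then
    sub="${src#*\[}"        # drop everything up to and including '['
    sub="${sub%\]}"         # drop the trailing ']'
  fi
  # A source without brackets is the filesystem root.
  printf '%s %s\n' "$dev" "${sub:-/}"
}

# Usage:
#   split_source "$(findmnt -n -o SOURCE /var/lib/docker)"
```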

struanb, Feb 10 '22 21:02

In case it helps, here is a diff of docker info on the host, and on the inner dockerd:

10,11c10,11
<  Containers: 2
<   Running: 2
---
>  Containers: 0
>   Running: 0
14c14
<  Images: 2
---
>  Images: 0
19c19
<   Native Overlay Diff: true
---
>   Native Overlay Diff: false
22c22
<  Cgroup Driver: systemd
---
>  Cgroup Driver: cgroupfs
29c29
<  Runtimes: sysbox-runc io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
---
>  Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
36d35
<   apparmor
41c40
<  Operating System: Debian GNU/Linux 11 (bullseye)
---
>  Operating System: Debian GNU/Linux 10 (buster)
45,47c44,46
<  Total Memory: 31.36GiB
<  Name: demo4-worker-7mg1
<  ID: X4S7:G6GN:WEYJ:XHRP:2QQT:FDGW:BXNM:HMPK:KUNH:COMA:73DK:IFG7
---
>  Total Memory: 4GiB
>  Name: b7c93c4c034b
>  ID: GVSC:KL2A:V3FX:AXJV:KV3N:DNJM:PNCT:HJXE:G4TN:F7IC:RMBK:BH2N
55,58c54
<  Live Restore Enabled: true
<  Default Address Pools:
<    Base: 172.25.0.0/16, Size: 24
< 
---
>  Live Restore Enabled: false
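A diff in this layout ("<" for the host daemon, ">" for the inner one) can be produced with process substitution. This is a sketch; compare_docker_info is a hypothetical helper and assumes the inner container has the docker CLI installed and is reachable with docker exec.

```shell
#!/usr/bin/env bash
# Hypothetical helper: diff `docker info` from the host daemon ("<" lines)
# against `docker info` from the dockerd inside a given container (">" lines).
compare_docker_info() {
  local container="$1"
  diff <(docker info 2>/dev/null) \
       <(docker exec "$container" docker info 2>/dev/null)
}

# Usage:
#   compare_docker_info <container>
```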

struanb, Feb 10 '22 21:02

Hi @struanb, thanks for the info; however I don't see anything in it that would explain the error you are seeing.

The only thing that catches my attention is that Docker complains about xfs lacking mount option pquota, and the xfs mount shows mount option prjquota. I believe they are equivalent per the xfs man page, so that should not be the problem, but I don't know.
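Since xfs(5) lists pquota and prjquota as alternative names for the same mount option, one way to compare option lists from two mounts without being tripped up by spelling or ordering is to normalize them first. A minimal sketch, with normalize_xfs_opts as a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite a comma-separated XFS option list so that the
# project-quota alias "pquota" is spelled "prjquota", then sort the options,
# making option lists from two mounts directly comparable.
normalize_xfs_opts() {
  tr ',' '\n' <<<"$1" |
    sed 's/^pquota$/prjquota/' |
    sort |
    paste -s -d, -
}

# Usage: compare the host's and the container's options for /var/lib/docker.
#   [ "$(normalize_xfs_opts "$host_opts")" = "$(normalize_xfs_opts "$inner_opts")" ]
```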

Just as a sanity check, what version of Docker runs on the host versus inside the Sysbox container?

ctalledo, Feb 12 '22 02:02

Hi @ctalledo, thanks for considering this. I don't believe prjquota can be the problem, as the host also reports prjquota, yet Docker there doesn't complain.

Both host and sysbox container are running Debian Bullseye, and docker info reports version 20.10.12 on both.

Is it possible that shiftfs is not propagating some other property from the underlying xfs filesystem that Docker expects to find?

struanb, Feb 12 '22 22:02

I've also just retested with shiftfs removed and userns enabled (by the sysbox package installer), and I get the same error, which seems to rule out shiftfs per se, although I don't know whether uid/gid mapping could still be triggering the issue.

The only other difference reported by docker info between host and sysbox container is Native Overlay Diff, which is true on the host, and false on the sysbox container.

P.S. Even if unrelated to this issue, this might be worth looking into, as dockerd warns that it may degrade performance:

time="2022-02-12T22:55:43.937324506Z" level=warning msg="Not using native diff for overlay2, this may cause degraded performance for building images: running in a user namespace" storage-driver=overlay2
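The warning says dockerd disabled native overlay diff because it detected that it is running in a user namespace. To check that condition by hand, here is a minimal sketch modelled on the heuristic used by runc's libcontainer/userns package: per user_namespaces(7), the initial user namespace's /proc/self/uid_map contains the single line "0 0 4294967295". in_user_ns is a hypothetical helper that takes the map file as a parameter so it can be exercised without entering a namespace.

```shell
#!/usr/bin/env bash
# Sketch of a user-namespace check: returns 0 (true) when the given uid_map
# indicates a child user namespace, 1 (false) for the initial namespace.
in_user_ns() {
  local map_file="${1:-/proc/self/uid_map}" inside outside count
  # No readable uid_map (e.g. no user-namespace support): assume initial namespace.
  [ -r "$map_file" ] || return 1
  # An empty map (freshly created namespace) or several mapping lines
  # both mean we are in a child user namespace.
  [ "$(wc -l <"$map_file")" -eq 1 ] || return 0
  read -r inside outside count _ <"$map_file"
  # The initial namespace maps the full 32-bit uid range onto itself.
  if [ "$inside" = 0 ] && [ "$outside" = 0 ] && [ "$count" = 4294967295 ]; then
    return 1
  fi
  return 0
}

# Usage:
#   in_user_ns && echo "running in a user namespace"
```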

struanb, Feb 12 '22 23:02

A bit more digging suggests that access to the underlying XFS device may be needed for quota management.

Running strace xfs_quota -c quota /var/lib/docker/ on the host (where /dev/sdb is the XFS block device) shows that it accesses /dev/sdb and ultimately calls quotactl(QCMD(Q_XQUOTASYNC, USRQUOTA), "/dev/sdb"), which returns 0.

Bind-mounting /dev/sdb from the host into the sysbox container would rather defeat the purpose if, as seems likely, it undermines security. In any event, inside the sysbox container the same call fails with quotactl(QCMD(Q_XQUOTASYNC, USRQUOTA), "/dev/sdb") = -1 EPERM (Operation not permitted).
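When comparing host and container, it helps to trace only quotactl and keep just the failing calls. The filter below is a sketch (quotactl_failures is a hypothetical name) that assumes strace's usual "call(args) = -1 ERRNO (message)" line format; it prints the quota command, quota type and errno of each failed call.

```shell
#!/usr/bin/env bash
# Hypothetical filter: read strace output on stdin and print the quota
# command, quota type and errno of every quotactl() call that returned -1.
quotactl_failures() {
  sed -n 's/^.*quotactl(QCMD(\([A-Z_]*\), \([A-Z]*\)).* = -1 \([A-Z]*\).*$/\1 \2 \3/p'
}

# Usage (strace writes its trace to stderr, so route only stderr into the pipe):
#   strace -f -e trace=quotactl xfs_quota -c quota /var/lib/docker/ 2>&1 >/dev/null |
#     quotactl_failures
```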

struanb, Feb 12 '22 23:02

I have found a workaround for this issue in my particular use case: apply an overall XFS quota to the volume mounted at /var/lib/docker in the sysbox container (using volume-opt=size=<size> with --mount, or -o size=<size> with docker volume create), rather than to the inner containers via --storage-opt=size=<size> (support for which was added to Docker recently, see https://github.com/moby/moby/pull/41330).
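The workaround can be scripted roughly as follows. This is a sketch under assumptions: start_sized_sysbox and run are hypothetical helpers, the image is whatever image you use to run dockerd inside sysbox, and the size limit is presumed to be enforced as an XFS project quota by Docker's local volume driver (df inside the container reports the capped size). Setting DRY_RUN=1 prints the commands instead of executing them.

```shell
#!/usr/bin/env bash
# Sketch of the workaround: cap the whole inner /var/lib/docker with a quota
# on the outer volume instead of per inner container.
# With DRY_RUN=1 the commands are printed instead of executed.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

start_sized_sysbox() {
  local volume="$1" size="$2" image="$3"
  # Create the volume with a size limit, then mount it as the inner Docker root.
  run docker volume create --name "$volume" -o "size=$size"
  run docker run -d --runtime=sysbox-runc \
    --mount="type=volume,src=$volume,dst=/var/lib/docker" "$image"
}

# Usage:
#   DRY_RUN=1 start_sized_sysbox inner-docker 10G <image-with-dockerd>
```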

struanb, Feb 14 '22 13:02

Hi @struanb, thanks for the update regarding the workaround. Using --volume-opt on the volume sounds great; I was not aware of this option. How did you specify it? I imagine at volume creation time, correct?

Having said this, we still want to investigate why using --storage-opt on the inner Docker containers does not work. I think this will require digging a bit more into XFS as well as into the Docker engine code to see exactly why it's confused into thinking that it's not working on XFS:

# docker run --rm -it --storage-opt=size=2G debian bash
docker: Error response from daemon: --storage-opt is supported only for overlay over xfs with 'pquota' mount option.

ctalledo, Feb 14 '22 17:02

Examples (on the host):

$ docker run --rm -it --mount=type=volume,dst=/opt,volume-opt=size=2G debian bash -c 'df -h / /opt'
Filesystem      Size  Used Avail Use% Mounted on
overlay          50G  5.2G   45G  11% /
/dev/sdb        2.0G     0  2.0G   0% /opt

$ docker volume create --name test124 -o size=5G && docker run --rm -it --mount=type=volume,src=test124,dst=/opt debian bash -c 'df -h / /opt'
test124
Filesystem      Size  Used Avail Use% Mounted on
overlay          50G  4.1G   46G   9% /
/dev/sdb        5.0G     0  5.0G   0% /opt
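To confirm in a script that the volume quota took effect, the Size column for the mount point can be extracted from the df output. A sketch, with size_of_mount as a hypothetical helper that assumes df's standard column layout (mount point in the last column):

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `df -h` output on stdin and print the Size column
# for the given mount point, e.g. to confirm a volume quota took effect.
size_of_mount() {
  awk -v mnt="$1" 'NR > 1 && $NF == mnt { print $2 }'
}

# Usage (inside the container):
#   df -h / /opt | size_of_mount /opt
```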

struanb, Feb 14 '22 20:02

Thanks for the info @struanb ...

ctalledo, Feb 15 '22 01:02