/sys is empty when using docker + docker-container build driver inside sysbox pod

Open EddieX64 opened this issue 1 year ago • 6 comments

Hi everyone,

Before starting: inside a Sysbox pod I created a docker buildx builder with the command docker buildx create --bootstrap --driver docker-container --name builder-test_builder and wrote a simple Dockerfile:

FROM ubuntu:latest
RUN find /sys/devices/system/cpu -type f -exec sh -c 'echo "File: $1"; cat $1' sh {} \;

Then I tried to build the Dockerfile above with this builder, using the command docker buildx build --load --builder builder-test_builder --no-cache --progress=plain --network=host --pull . The build failed with the error find: '/sys/devices/system/cpu': No such file or directory

I added some ls and mount commands to see the contents of a container during the build:

FROM ubuntu:latest
RUN echo $(ls -la /sys/)
RUN echo $(mount)
RUN find /sys/devices/system/cpu -type f -exec sh -c 'echo "File: $1"; cat $1' sh {} \;

And I can see that nothing is mounted on /sys and the directory is empty:

#5 [2/4] RUN echo $(ls -la /sys/)
#5 0.173 
total 8 
drwxr-xr-x 2 root root 4096 Apr 22 13:08 . 
drwxr-xr-x 1 root root 4096 Jul 26 07:36 ..
#5 DONE 0.2s

#6 [3/4] RUN echo $(mount)
#6 0.179 overlay on / type overlay (rw,relatime,lowerdir=/var/lib/buildkit/runc-overlayfs/snapshots/snapshots/25/fs:/var/lib/buildkit/runc-overlayfs/snapshots/snapshots/19/fs,upperdir=/var/lib/buildkit/runc-overlayfs/snapshots/snapshots/26/fs,workdir=/var/lib/buildkit/runc-overlayfs/snapshots/snapshots/26/work,userxattr) 
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime) 
sysboxfs on /proc/uptime type fuse (rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other) 
sysboxfs on /proc/swaps type fuse (rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other) 
sysboxfs on /proc/sys type fuse (ro,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other) 
proc on /proc/bus type proc (ro,nosuid,nodev,noexec,relatime) 
proc on /proc/fs type proc (ro,nosuid,nodev,noexec,relatime) 
proc on /proc/irq type proc (ro,nosuid,nodev,noexec,relatime) 
proc on /proc/sysrq-trigger type proc (ro,nosuid,nodev,noexec,relatime) 
tmpfs on /proc/acpi type tmpfs (ro,relatime,uid=165536,gid=165536,inode64) 
devtmpfs on /proc/keys type devtmpfs (rw,relatime,size=32914880k,nr_inodes=8228720,mode=755,inode64) 
devtmpfs on /proc/timer_list type devtmpfs (rw,relatime,size=32914880k,nr_inodes=8228720,mode=755,inode64) 
tmpfs on /proc/scsi type tmpfs (ro,relatime,uid=165536,gid=165536,inode64) 
tmpfs on /dev type tmpfs (rw,nosuid,size=65536k,mode=755,uid=165536,gid=165536,inode64) 
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=165541,mode=620,ptmxmode=666) 
shm on /dev/shm type tmpfs (rw,nosuid,nodev,noexec,relatime,size=65536k,uid=165536,gid=165536,inode64) 
mqueue on /dev/mqueue type mqueue (rw,nosuid,nodev,noexec,relatime) 
/dev/sdg on /etc/resolv.conf type ext4 (ro,nosuid,nodev,noexec,relatime) 
/dev/sdg on /etc/hosts type ext4 (ro,nosuid,nodev,noexec,relatime) 
overlay on /dev/otel-grpc.sock type overlay (ro,relatime,lowerdir=/var/lib/docker/overlay2/l/RJVRU4QVXYBYWWXLBZBBZBOZRC:/var/lib/docker/overlay2/l/RCVNBRGBSM3ZCVJCYX4SOCZHMX:/var/lib/docker/overlay2/l/6KBC5IMIT236SQVREGTFJFZZ7O:/var/lib/docker/overlay2/l/E5I4B3NZV6M46Z5AN4OQVMSFWM:/var/lib/docker/overlay2/l/NSK4TUHBHIK7AVSLHFGREWSCRO,upperdir=/var/lib/docker/overlay2/e26e5c637ef3860328d237fba463bb6444afa809fee3dc95a4b0432a5b8ffdb0/diff,workdir=/var/lib/docker/overlay2/e26e5c637ef3860328d237fba463bb6444afa809fee3dc95a4b0432a5b8ffdb0/work,userxattr) 
devtmpfs on /dev/null type devtmpfs (rw,relatime,size=32914880k,nr_inodes=8228720,mode=755,inode64) 
devtmpfs on /dev/random type devtmpfs (rw,relatime,size=32914880k,nr_inodes=8228720,mode=755,inode64) 
devtmpfs on /dev/full type devtmpfs (rw,relatime,size=32914880k,nr_inodes=8228720,mode=755,inode64) 
devtmpfs on /dev/tty type devtmpfs (rw,relatime,size=32914880k,nr_inodes=8228720,mode=755,inode64) 
devtmpfs on /dev/zero type devtmpfs (rw,relatime,size=32914880k,nr_inodes=8228720,mode=755,inode64) 
devtmpfs on /dev/urandom type devtmpfs (rw,relatime,size=32914880k,nr_inodes=8228720,mode=755,inode64) 
devtmpfs on /proc/kcore type devtmpfs (rw,relatime,size=32914880k,nr_inodes=8228720,mode=755,inode64)
#6 DONE 0.2s

Then I tried building the same Dockerfile with the default docker driver, using the command docker buildx build --no-cache --progress=plain --network=host --pull . and it succeeded:

#5 [2/3] RUN echo $(ls -la /sys/)
#5 0.238 
total 4 
dr-xr-xr-x 13 nobody nogroup 0 Aug 22 09:28 . 
drwxr-xr-x 1 root root 4096 Aug 22 09:54 .. 
drwxr-xr-x 2 nobody nogroup 0 Aug 22 09:28 block 
drwxr-xr-x 40 nobody nogroup 0 Aug 22 09:28 bus 
drwxr-xr-x 70 nobody nogroup 0 Aug 22 09:28 class 
drwxr-xr-x 4 nobody nogroup 0 Aug 22 09:28 dev 
drwxr-xr-x 15 nobody nogroup 0 Aug 22 09:28 devices 
drwxrwxrwt 2 root root 40 Aug 22 09:54 firmware 
drwxr-xr-x 9 nobody nogroup 0 Aug 22 09:28 fs 
drwxr-xr-x 2 nobody nogroup 0 Aug 22 09:28 hypervisor 
drwxr-xr-x 18 nobody nogroup 0 Aug 22 06:48 kernel 
drwxr-xr-x 203 nobody nogroup 0 Aug 22 09:28 module 
drwxr-xr-x 3 nobody nogroup 0 Aug 22 09:28 power
#5 DONE 0.3s

#6 [3/3] RUN echo $(mount)
#6 0.268 
overlay on / type overlay (rw,relatime,lowerdir=/var/lib/docker/overlay2/l/4C5XDWUFN7A4AEF6MHVTO5IUP4:/var/lib/docker/overlay2/l/MXMDVK24HFT4MXPECX2XEAZIHH,upperdir=/var/lib/docker/overlay2/ano5gekli4z1mky3fggul5j3q/diff,workdir=/var/lib/docker/overlay2/ano5gekli4z1mky3fggul5j3q/work,userxattr) 
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime) 
sysboxfs on /proc/uptime type fuse (rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other) 
sysboxfs on /proc/swaps type fuse (rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other) 
sysboxfs on /proc/sys type fuse (ro,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other) 
proc on /proc/bus type proc (ro,nosuid,nodev,noexec,relatime) 
proc on /proc/fs type proc (ro,nosuid,nodev,noexec,relatime) 
proc on /proc/irq type proc (ro,nosuid,nodev,noexec,relatime) 
proc on /proc/sysrq-trigger type proc (ro,nosuid,nodev,noexec,relatime) 
tmpfs on /proc/acpi type tmpfs (ro,relatime,uid=100000,gid=100000,inode64) 
devtmpfs on /proc/keys type devtmpfs (rw,relatime,size=32913820k,nr_inodes=8228455,mode=755,inode64) 
devtmpfs on /proc/timer_list type devtmpfs (rw,relatime,size=32913820k,nr_inodes=8228455,mode=755,inode64) 
tmpfs on /proc/scsi type tmpfs (ro,relatime,uid=100000,gid=100000,inode64) 
tmpfs on /dev type tmpfs (rw,nosuid,size=65536k,mode=755,uid=100000,gid=100000,inode64) 
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=100005,mode=620,ptmxmode=666) 
shm on /dev/shm type tmpfs (rw,nosuid,nodev,noexec,relatime,size=65536k,uid=100000,gid=100000,inode64) 
mqueue on /dev/mqueue type mqueue (rw,nosuid,nodev,noexec,relatime) 
sysfs on /sys type sysfs (ro,nosuid,nodev,noexec,relatime) 
sysboxfs on /sys/kernel type fuse (ro,nosuid,nodev,noexec,relatime,user_id=0,group_id=0,default_permissions,allow_other) 
sysboxfs on /sys/devices/virtual type fuse (ro,nosuid,nodev,noexec,relatime,user_id=0,group_id=0,default_permissions,allow_other) 
sysboxfs on /sys/module/nf_conntrack/parameters type fuse (ro,nosuid,nodev,noexec,relatime,user_id=0,group_id=0,default_permissions,allow_other) 
/dev/root on /etc/resolv.conf type ext4 (ro,nosuid,nodev,noexec,relatime,discard,errors=remount-ro) 
/dev/root on /etc/hosts type ext4 (ro,nosuid,nodev,noexec,relatime,discard,errors=remount-ro) 
cgroup on /sys/fs/cgroup type cgroup2 (ro,nosuid,nodev,noexec,relatime) 
devtmpfs on /dev/null type devtmpfs (rw,relatime,size=32913820k,nr_inodes=8228455,mode=755,inode64) 
devtmpfs on /dev/random type devtmpfs (rw,relatime,size=32913820k,nr_inodes=8228455,mode=755,inode64) 
devtmpfs on /dev/full type devtmpfs (rw,relatime,size=32913820k,nr_inodes=8228455,mode=755,inode64) 
devtmpfs on /dev/tty type devtmpfs (rw,relatime,size=32913820k,nr_inodes=8228455,mode=755,inode64) 
devtmpfs on /dev/zero type devtmpfs (rw,relatime,size=32913820k,nr_inodes=8228455,mode=755,inode64) 
devtmpfs on /dev/urandom type devtmpfs (rw,relatime,size=32913820k,nr_inodes=8228455,mode=755,inode64) 
devtmpfs on /proc/kcore type devtmpfs (rw,relatime,size=32913820k,nr_inodes=8228455,mode=755,inode64) 
tmpfs on /sys/firmware type tmpfs (ro,relatime,uid=100000,gid=100000,inode64) 
tmpfs on /sys/devices/virtual/powercap type tmpfs (ro,relatime,uid=100000,gid=100000,inode64)
#6 DONE 0.3s

I also tested the same scenario on a regular VM, and it succeeds with both docker build drivers, so I guess this is not a bug in Docker. Does anyone have any idea what the problem might be?

EddieX64 • Aug 22 '24 10:08
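The missing mount described above can also be checked directly, without reading the whole mount table by eye. The following is a minimal sketch, not part of the original report: has_sysfs_mount is a hypothetical helper name, and it assumes the standard /proc/mounts layout (device, mount point, filesystem type, options).

```shell
#!/bin/sh
# has_sysfs_mount [FILE]
# Hypothetical helper: succeeds if FILE (in /proc/mounts format, defaulting
# to /proc/mounts itself) lists a sysfs filesystem mounted exactly on /sys.
has_sysfs_mount() {
    awk '$2 == "/sys" && $3 == "sysfs" { found = 1 }
         END { exit !found }' "${1:-/proc/mounts}"
}
```

Inside a build step it could be called as has_sysfs_mount || echo "sysfs is not mounted on /sys", which would make the difference between the two build drivers show up as a single line of output.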

Hi @EddieX64, thanks for reporting the issue and all the detailed info you provided.

I was able to easily repro, but I don't have an explanation yet, other than I don't think it's a problem in Sysbox :)

So in this scenario there's double nesting going on: first we have the buildx builder-test_builder container running inside the Sysbox container, and then inside that builder-test_builder container, docker does the build by running (doubly-nested) containers.

Inside the Sysbox container, as well as inside the buildx builder-test_builder container, I can see that /sys is properly mounted:

$ docker exec buildx_buildkit_builder-test_builder0 mount | grep " /sys"
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
sysboxfs on /sys/kernel type fuse (rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other)
sysboxfs on /sys/devices/virtual type fuse (rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other)
sysboxfs on /sys/module/nf_conntrack/parameters type fuse (rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other)
cgroup on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate,memory_recursiveprot)

So I don't understand why /sys is not mounted in the containers that run inside the builder-test_builder container (i.e., the doubly-nested containers). I suspect it's a buildx behavior, but I'm not sure. Let me ask inside Docker.

FYI, I did try a double-nesting scenario by running the docker:dind image inside a Sysbox container, and then using that to create a doubly-nested container. I could see that /sys was properly mounted inside the doubly-nested container, as expected.

ctalledo • Aug 27 '24 04:08
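The per-level check used in the comment above (mount | grep " /sys") can be made slightly stricter, so that it only matches mount points at or below /sys. This is a rough sketch under the assumption that the input is in the "DEVICE on MOUNTPOINT type FSTYPE (OPTIONS)" format printed by mount; sys_mounts is a hypothetical helper name.

```shell
#!/bin/sh
# sys_mounts
# Hypothetical helper: reads `mount` output on stdin and prints
# "MOUNTPOINT FSTYPE" for every entry mounted on /sys or below it.
# Paths such as /proc/sys are deliberately not matched.
sys_mounts() {
    awk '$2 == "on" && ($3 == "/sys" || index($3, "/sys/") == 1) { print $3, $5 }'
}
```

With the builder container named in this thread, it could be run as docker exec buildx_buildkit_builder-test_builder0 mount | sys_mounts, and the same filter could be applied to the mount output captured from a build step. An empty result at one nesting level and a non-empty result at the level above would point at where the mount is being dropped.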

Hi @ctalledo, I understand that there are plenty of other issues, but are there any updates on this by chance? I tried newer versions of Docker and buildkit, but I'm still getting the same behavior :(

EddieX64 • Sep 25 '24 14:09

Hi @EddieX64, thanks for the reminder.

I don't yet have a meaningful update, other than I am working with our main docker buildx expert to figure out why /sys is not getting mounted during the build inside the Sysbox container.

Hope to have an answer soon.

ctalledo • Oct 10 '24 00:10

I suspect the problem comes from this buildkit code, which does not mount sysfs in "rootless" environments, such as inside a Sysbox container:

https://github.com/moby/buildkit/blob/master/util/rootless/specconv/specconv_linux.go#L15-L32

ctalledo • Oct 10 '24 22:10
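One way to see whether a container looks "rootless" in the sense suspected above is to inspect its UID mapping. The sketch below rests on an assumption the thread does not confirm: that the rootless code path is selected when the process runs inside a user namespace. The helper name in_userns is hypothetical. It treats a uid_map consisting of the single identity line "0 0 4294967295" as the initial namespace and anything else as a user namespace.

```shell
#!/bin/sh
# in_userns [FILE]
# Hypothetical helper: succeeds if FILE (default /proc/self/uid_map)
# describes something other than the identity mapping of the initial
# user namespace. An unreadable map file is reported as "not in a userns".
in_userns() {
    map_file="${1:-/proc/self/uid_map}"
    [ -r "$map_file" ] || return 1
    awk 'NR == 1 { identity = ($1 == 0 && $2 == 0 && $3 == 4294967295) }
         END { exit (NR == 1 && identity) ? 1 : 0 }' "$map_file"
}
```

The uid=165536 and uid=100000 options in the mount output earlier in this thread are consistent with such a remapping, so this check would be expected to succeed inside the Sysbox pod and inside the builder container running in it.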

Hi @ctalledo, thank you very much for the investigation; the picture is clearer now. I assume there is no way to work around this from the Sysbox side?

EddieX64 • Oct 11 '24 08:10

Hi @EddieX64, yes, it's not a Sysbox issue per se, but I am working with the buildx developers to see if we can make a fix there. It may take a while though.

ctalledo • Oct 11 '24 16:10