CIFS-backed volumes don't work inside a Sysbox container with Shiftfs

Open AlexTalker opened this issue 4 years ago • 14 comments

I have run into suspicious behavior with CIFS shares. I used the Portainer UI to create a volume with the following options:

type: cifs
device: //127.0.0.1/dev/legacy
o: username=dev,password=dev,vers=3.1.1,uid=1000,gid=1000,cifsacl,mfsymlinks,cache=none

The problem also reproduces with the default configuration.
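
For anyone reproducing this without Portainer, the options above should correspond to the options of Docker's local volume driver. A minimal sketch, assuming the volume name UserLegacyTest (taken from the findmnt output further down); create_cifs_volume and the DOCKER override variable are hypothetical helpers, not part of Docker:

```shell
# Hypothetical helper: create a CIFS-backed volume with Docker's "local" driver.
# Usage: create_cifs_volume NAME DEVICE MOUNT_OPTIONS
# Set DOCKER=echo to print the arguments instead of calling docker (dry run).
#
# Example (values from this issue):
#   create_cifs_volume UserLegacyTest //127.0.0.1/dev/legacy \
#       'username=dev,password=dev,vers=3.1.1,uid=1000,gid=1000,cifsacl,mfsymlinks,cache=none'
create_cifs_volume() {
    ${DOCKER:-docker} volume create --driver local \
        --opt type=cifs \
        --opt "device=$2" \
        --opt "o=$3" \
        "$1"
}
```

The volume can then be attached with -v UserLegacyTest:/home/dev, with or without --runtime=sysbox-runc, to compare the two runtimes.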

If I mount the share read-only, it works fine when used by a user with UID 1000 inside a container run via sysbox-runc. However, when I switch to read-write mode, I see the following behavior:

  1. Creating a file succeeds (for example, echo 33 > 42.txt).
  2. Copying a file from outside the volume fails with "File exists".
  3. Copying a file from within the volume fails with "File exists".
  4. Despite the error, the file is created anyway, but with size 0. When I repeat the copy, it succeeds (size and content both match).
  5. The issue does not reproduce if I stop all containers that used the volume via sysbox-runc (to get shiftfs off it) and then attach the volume to a container run by the default runc.
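
The symptom in steps 2 to 4 can be checked mechanically rather than by eye. A minimal sketch; verify_copy is a hypothetical helper name, and the paths passed to it are up to whoever runs it:

```shell
# Hypothetical helper: copy a file, then confirm the copy is byte-identical.
# Usage: verify_copy SOURCE DESTINATION
# Returns 1 if cp itself fails, 2 if the destination differs from the source
# (for example, a zero-length file left behind by a failed copy).
verify_copy() {
    if ! cp -- "$1" "$2"; then
        echo "cp failed: $1 -> $2" >&2
        return 1
    fi
    if ! cmp -s -- "$1" "$2"; then
        echo "mismatch: $2 differs from $1" >&2
        return 2
    fi
    echo "ok: $2"
}
```

Run inside the container, something like verify_copy /etc/os-release /home/dev/copytest would exercise the copy onto the volume; per step 4, a second run against the same destination would be expected to succeed.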

This is likely related to shiftfs, but I have no idea how to debug the issue: the kernel module seems to have no options whatsoever.

I am on:

 cat /etc/os-release
NAME="Ubuntu"
VERSION="20.04.1 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.1 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal

and I am using the latest releases of Sysbox, Portainer, and Docker.

Any ideas how to deal with it?

AlexTalker avatar Jan 13 '21 16:01 AlexTalker

Additionally:

# testparm -s
Load smb config files from /etc/samba/smb.conf
WARNING: The "allocation roundup size" option is deprecated
WARNING: The "syslog" option is deprecated
Loaded services file OK.
WARNING: socket options = TCP_NODELAY IPTOS_LOWDELAY SO_RCVBUF=131072 SO_SNDBUF=131072
This warning is printed because you set one of the
following options: SO_SNDBUF, SO_RCVBUF, SO_SNDLOWAT,
SO_RCVLOWAT
Modern server operating systems are tuned for
high network performance in the majority of situations;
when you set 'socket options' you are overriding those
settings.
Linux in particular has an auto-tuning mechanism for
buffer sizes (SO_SNDBUF, SO_RCVBUF) that will be
disabled if you specify a socket buffer size. This can
potentially cripple your TCP/IP stack.

Getting the 'socket options' correct can make a big
difference to your performance, but getting them wrong
can degrade it by just as much. As with any other low
level setting, if you must make changes to it, make
 small changes and test the effect before making any
large changes.

Server role: ROLE_STANDALONE
# Global parameters
[global]
	allow insecure wide links = Yes
	dns proxy = No
	log file = /var/log/samba/log.%m
	map to guest = Bad User
	max log size = 1000
	min receivefile size = 16384
	obey pam restrictions = Yes
	pam password change = Yes
	panic action = /usr/share/samba/panic-action %d
	passwd chat = *Enter\snew\s*\spassword:* %n\n *Retype\snew\s*\spassword:* %n\n *password\supdated\ssuccessfully* .
	passwd program = /usr/bin/passwd %u
	server role = standalone server
	server signing = No
	server string = %h server (Samba, Ubuntu)
	socket options = TCP_NODELAY IPTOS_LOWDELAY SO_RCVBUF=131072 SO_SNDBUF=131072
	syslog = 0
	unix password sync = Yes
	usershare allow guests = Yes
	idmap config * : backend = tdb
	aio read size = 16384
	aio write size = 16384
	allocation roundup size = 4096
	strict locking = No
	use sendfile = Yes


[homes]
	browseable = No
	comment = Home Directories
	force user = %S
	inherit acls = Yes
	inherit permissions = Yes
	map acl inherit = Yes
	map archive = No
	read only = No
	valid users = %S
	vfs objects = acl_xattr
	wide links = Yes


[printers]
	browseable = No
	comment = All Printers
	create mask = 0700
	path = /var/spool/samba
	printable = Yes


[print$]
	comment = Printer Drivers
	path = /var/lib/samba/printers

AlexTalker avatar Jan 13 '21 16:01 AlexTalker

Hi @AlexTalker , thanks for giving Sysbox a shot and for filing this issue.

I've not played around with CIFS volume mounts into Sysbox containers, but certainly shiftfs could be playing a role. In theory it should not since shiftfs is a thin overlay, but we have to investigate.

A couple of questions to help me debug:

  1. Can you provide the output of findmnt inside the container? This will help me see the CIFS mount and whether shiftfs is mounted on top of it or not.

  2. In order to repro on my side, is it as simple as creating a CIFS volume with Docker and mounting it into the container? I know you used the portainer UI to create the volume, I am wondering if you have the command line instructions to do so.

Thanks!

ctalledo avatar Jan 13 '21 16:01 ctalledo

I believe all Portainer does in its UI is provide self-explanatory fields for a simple CIFS/NFS mount (host, username, password, protocol). When I needed something more advanced (I wanted to map UNIX permissions, since I share from UNIX to UNIX), I went straight to specifying driver options, the same way I suppose you would with Compose or the Docker CLI. That's why I listed the details at the beginning.

# cat /tmp/findmnt.txt
TARGET                                                       SOURCE                                                                                                    FSTYPE   OPTIONS
/                                                            .                                                                                                         shiftfs  rw,relatime
├─/sys                                                       sysfs                                                                                                     sysfs    rw,nosuid,nodev,noexec,relatime
│ ├─/sys/firmware                                            tmpfs                                                                                                     tmpfs    ro,relatime,uid=493216,gid=493216
│ ├─/sys/fs/cgroup                                           tmpfs                                                                                                     tmpfs    rw,nosuid,nodev,noexec,relatime,mode=755,uid=493216,gid=493216
│ │ ├─/sys/fs/cgroup/systemd                                 systemd                                                                                                   cgroup   rw,nosuid,nodev,noexec,relatime,xattr,name=systemd
│ │ ├─/sys/fs/cgroup/pids                                    cgroup                                                                                                    cgroup   rw,nosuid,nodev,noexec,relatime,pids
│ │ ├─/sys/fs/cgroup/rdma                                    cgroup                                                                                                    cgroup   rw,nosuid,nodev,noexec,relatime,rdma
│ │ ├─/sys/fs/cgroup/cpu,cpuacct                             cgroup                                                                                                    cgroup   rw,nosuid,nodev,noexec,relatime,cpu,cpuacct
│ │ ├─/sys/fs/cgroup/cpuset                                  cgroup                                                                                                    cgroup   rw,nosuid,nodev,noexec,relatime,cpuset
│ │ ├─/sys/fs/cgroup/net_cls,net_prio                        cgroup                                                                                                    cgroup   rw,nosuid,nodev,noexec,relatime,net_cls,net_prio
│ │ ├─/sys/fs/cgroup/hugetlb                                 cgroup                                                                                                    cgroup   rw,nosuid,nodev,noexec,relatime,hugetlb
│ │ ├─/sys/fs/cgroup/perf_event                              cgroup                                                                                                    cgroup   rw,nosuid,nodev,noexec,relatime,perf_event
│ │ ├─/sys/fs/cgroup/memory                                  cgroup                                                                                                    cgroup   rw,nosuid,nodev,noexec,relatime,memory
│ │ ├─/sys/fs/cgroup/freezer                                 cgroup                                                                                                    cgroup   rw,nosuid,nodev,noexec,relatime,freezer
│ │ ├─/sys/fs/cgroup/devices                                 cgroup                                                                                                    cgroup   rw,nosuid,nodev,noexec,relatime,devices
│ │ └─/sys/fs/cgroup/blkio                                   cgroup                                                                                                    cgroup   rw,nosuid,nodev,noexec,relatime,blkio
│ ├─/sys/kernel/config                                       tmpfs                                                                                                     tmpfs    rw,nosuid,nodev,noexec,relatime,size=1024k,uid=493216,gid=493216
│ ├─/sys/kernel/debug                                        tmpfs                                                                                                     tmpfs    rw,nosuid,nodev,noexec,relatime,size=1024k,uid=493216,gid=493216
│ ├─/sys/kernel/tracing                                      tmpfs                                                                                                     tmpfs    rw,nosuid,nodev,noexec,relatime,size=1024k,uid=493216,gid=493216
│ └─/sys/module/nf_conntrack/parameters/hashsize             sysboxfs[/sys/module/nf_conntrack/parameters/hashsize]                                                    fuse     rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other
├─/proc                                                      proc                                                                                                      proc     rw,nosuid,nodev,noexec,relatime
│ ├─/proc/bus                                                proc[/bus]                                                                                                proc     ro,relatime
│ ├─/proc/fs                                                 proc[/fs]                                                                                                 proc     ro,relatime
│ ├─/proc/irq                                                proc[/irq]                                                                                                proc     ro,relatime
│ ├─/proc/sysrq-trigger                                      proc[/sysrq-trigger]                                                                                      proc     ro,relatime
│ ├─/proc/asound                                             tmpfs                                                                                                     tmpfs    ro,relatime,uid=493216,gid=493216
│ ├─/proc/acpi                                               tmpfs                                                                                                     tmpfs    ro,relatime,uid=493216,gid=493216
│ ├─/proc/keys                                               udev[/null]                                                                                               devtmpfs rw,nosuid,noexec,relatime,size=8118552k,nr_inodes=2029638,mode=755
│ ├─/proc/timer_list                                         udev[/null]                                                                                               devtmpfs rw,nosuid,noexec,relatime,size=8118552k,nr_inodes=2029638,mode=755
│ ├─/proc/sched_debug                                        udev[/null]                                                                                               devtmpfs rw,nosuid,noexec,relatime,size=8118552k,nr_inodes=2029638,mode=755
│ ├─/proc/scsi                                               tmpfs                                                                                                     tmpfs    ro,relatime,uid=493216,gid=493216
│ ├─/proc/swaps                                              sysboxfs[/proc/swaps]                                                                                     fuse     rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other
│ ├─/proc/sys                                                sysboxfs[/proc/sys]                                                                                       fuse     rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other
│ └─/proc/uptime                                             sysboxfs[/proc/uptime]                                                                                    fuse     rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other
├─/dev                                                       tmpfs                                                                                                     tmpfs    rw,nosuid,size=65536k,mode=755,uid=493216,gid=493216
│ ├─/dev/console                                             devpts[/0]                                                                                                devpts   rw,nosuid,noexec,relatime,gid=493221,mode=620,ptmxmode=666
│ ├─/dev/mqueue                                              mqueue                                                                                                    mqueue   rw,nosuid,nodev,noexec,relatime
│ ├─/dev/pts                                                 devpts                                                                                                    devpts   rw,nosuid,noexec,relatime,gid=493221,mode=620,ptmxmode=666
│ ├─/dev/shm                                                 shm                                                                                                       tmpfs    rw,nosuid,nodev,noexec,relatime,size=65536k,uid=493216,gid=493216
│ ├─/dev/kmsg                                                udev[/null]                                                                                               devtmpfs rw,nosuid,noexec,relatime,size=8118552k,nr_inodes=2029638,mode=755
│ ├─/dev/null                                                udev[/null]                                                                                               devtmpfs rw,nosuid,noexec,relatime,size=8118552k,nr_inodes=2029638,mode=755
│ ├─/dev/random                                              udev[/random]                                                                                             devtmpfs rw,nosuid,noexec,relatime,size=8118552k,nr_inodes=2029638,mode=755
│ ├─/dev/full                                                udev[/full]                                                                                               devtmpfs rw,nosuid,noexec,relatime,size=8118552k,nr_inodes=2029638,mode=755
│ ├─/dev/tty                                                 udev[/tty]                                                                                                devtmpfs rw,nosuid,noexec,relatime,size=8118552k,nr_inodes=2029638,mode=755
│ ├─/dev/zero                                                udev[/zero]                                                                                               devtmpfs rw,nosuid,noexec,relatime,size=8118552k,nr_inodes=2029638,mode=755
│ └─/dev/urandom                                             udev[/urandom]                                                                                            devtmpfs rw,nosuid,noexec,relatime,size=8118552k,nr_inodes=2029638,mode=755
├─/home/dev                                                  /var/lib/docker/volumes/UserLegacyTest/_data                                                              shiftfs  rw,relatime
├─/etc/resolv.conf                                           /var/lib/docker/containers/435e77513d711c315854bd100c1eb9e0cf6dde6bc89f0a8c617408918e53a4f5[/resolv.conf] shiftfs  rw,relatime
├─/etc/hostname                                              /var/lib/docker/containers/435e77513d711c315854bd100c1eb9e0cf6dde6bc89f0a8c617408918e53a4f5[/hostname]    shiftfs  rw,relatime
├─/etc/hosts                                                 /var/lib/docker/containers/435e77513d711c315854bd100c1eb9e0cf6dde6bc89f0a8c617408918e53a4f5[/hosts]       shiftfs  rw,relatime
├─/var/lib/docker                                            /dev/sda5[/var/lib/sysbox/docker/435e77513d711c315854bd100c1eb9e0cf6dde6bc89f0a8c617408918e53a4f5]        ext4     rw,relatime,errors=remount-ro,stripe=32730
├─/var/lib/kubelet                                           /dev/sda5[/var/lib/sysbox/kubelet/435e77513d711c315854bd100c1eb9e0cf6dde6bc89f0a8c617408918e53a4f5]       ext4     rw,relatime,errors=remount-ro,stripe=32730
└─/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs /dev/sda5[/var/lib/sysbox/containerd/435e77513d711c315854bd100c1eb9e0cf6dde6bc89f0a8c617408918e53a4f5]    ext4     rw,relatime,errors=remount-ro,stripe=32730

This time I used the official Arch image for the test. Also, as you might have noticed, I share a folder from the home directory of a user (dev) that has UID=1006 on the host system. Since the container user has UID=1000, I specify that instead of relying on the default root/root behavior.

As I stated before, read-only mode works quite okay for my purposes but read-write seems to mess things up somewhere.

Just now, I initially tried to write the findmnt output to the volume as well. The shell redirect (>) didn't give any errors, but the file size was 0 anyway, even when viewed on the original FS (ext4). Strange.

AlexTalker avatar Jan 13 '21 16:01 AlexTalker

-rwxrwxr-x+ 1 dev dev 1.0M Jan 13 18:36 test
-rwxrwxr-x+ 1 dev dev    0 Jan 13 18:57 test2
-rwxrwxr-x+ 1 dev dev    0 Jan 13 19:01 test3
-rwxrwxr-x+ 1 dev dev    0 Jan 13 19:43 test4

This is the view from ext4: all the "copied" files have size 0. Docker tricked me into thinking the copy succeeded once the file was created, but the reality is even more disappointing :(

AlexTalker avatar Jan 13 '21 16:01 AlexTalker

After literally just switching the runtime for the container (which I think means that Portainer re-creates the container), it works as expected:

-rwxrwxr-x+ 1 dev dev 1.0M Jan 13 18:36 test
-rwxrwxr-x+ 1 dev dev    0 Jan 13 18:57 test2
-rwxrwxr-x+ 1 dev dev    0 Jan 13 19:01 test3
-rwxrwxr-x+ 1 dev dev    0 Jan 13 19:43 test4
-rwxrwxr-x+ 1 dev dev 1.0M Jan 13 19:47 test5
# findmnt
TARGET                           SOURCE                          FSTYPE  OPTIONS
/                                overlay                         overlay rw,relatime,lowerdir=/var/lib/docker/overlay2/l/OGJIWCE7ASZ7ISKXGCQD7EZ4HD:/var/lib/docker/overlay2/l/QJCG
├─/proc                          proc                            proc    rw,nosuid,nodev,noexec,relatime
│ ├─/proc/bus                    proc[/bus]                      proc    ro,relatime
│ ├─/proc/fs                     proc[/fs]                       proc    ro,relatime
│ ├─/proc/irq                    proc[/irq]                      proc    ro,relatime
│ ├─/proc/sys                    proc[/sys]                      proc    ro,relatime
│ ├─/proc/sysrq-trigger          proc[/sysrq-trigger]            proc    ro,relatime
│ ├─/proc/asound                 tmpfs                           tmpfs   ro,relatime
│ ├─/proc/acpi                   tmpfs                           tmpfs   ro,relatime
│ ├─/proc/kcore                  tmpfs[/null]                    tmpfs   rw,nosuid,size=65536k,mode=755
│ ├─/proc/keys                   tmpfs[/null]                    tmpfs   rw,nosuid,size=65536k,mode=755
│ ├─/proc/timer_list             tmpfs[/null]                    tmpfs   rw,nosuid,size=65536k,mode=755
│ ├─/proc/sched_debug            tmpfs[/null]                    tmpfs   rw,nosuid,size=65536k,mode=755
│ └─/proc/scsi                   tmpfs                           tmpfs   ro,relatime
├─/dev                           tmpfs                           tmpfs   rw,nosuid,size=65536k,mode=755
│ ├─/dev/console                 devpts[/0]                      devpts  rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=666
│ ├─/dev/pts                     devpts                          devpts  rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=666
│ ├─/dev/mqueue                  mqueue                          mqueue  rw,nosuid,nodev,noexec,relatime
│ └─/dev/shm                     shm                             tmpfs   rw,nosuid,nodev,noexec,relatime,size=65536k
├─/sys                           sysfs                           sysfs   ro,nosuid,nodev,noexec,relatime
│ ├─/sys/firmware                tmpfs                           tmpfs   ro,relatime
│ └─/sys/fs/cgroup               tmpfs                           tmpfs   rw,nosuid,nodev,noexec,relatime,mode=755
│   ├─/sys/fs/cgroup/systemd     cgroup[/docker/85b78a1ac648435a8b50425f7fd189c3f00423b4c55e921d752ccc4b23128928]
│   │                                                            cgroup  ro,nosuid,nodev,noexec,relatime,xattr,name=systemd
│   ├─/sys/fs/cgroup/pids        cgroup[/docker/85b78a1ac648435a8b50425f7fd189c3f00423b4c55e921d752ccc4b23128928]
│   │                                                            cgroup  ro,nosuid,nodev,noexec,relatime,pids
│   ├─/sys/fs/cgroup/rdma        cgroup                          cgroup  ro,nosuid,nodev,noexec,relatime,rdma
│   ├─/sys/fs/cgroup/cpu,cpuacct cgroup[/docker/85b78a1ac648435a8b50425f7fd189c3f00423b4c55e921d752ccc4b23128928]
│   │                                                            cgroup  ro,nosuid,nodev,noexec,relatime,cpu,cpuacct
│   ├─/sys/fs/cgroup/cpuset      cgroup[/docker/85b78a1ac648435a8b50425f7fd189c3f00423b4c55e921d752ccc4b23128928]
│   │                                                            cgroup  ro,nosuid,nodev,noexec,relatime,cpuset
│   ├─/sys/fs/cgroup/net_cls,net_prio
│   │                            cgroup[/docker/85b78a1ac648435a8b50425f7fd189c3f00423b4c55e921d752ccc4b23128928]
│   │                                                            cgroup  ro,nosuid,nodev,noexec,relatime,net_cls,net_prio
│   ├─/sys/fs/cgroup/hugetlb     cgroup[/docker/85b78a1ac648435a8b50425f7fd189c3f00423b4c55e921d752ccc4b23128928]
│   │                                                            cgroup  ro,nosuid,nodev,noexec,relatime,hugetlb
│   ├─/sys/fs/cgroup/perf_event  cgroup[/docker/85b78a1ac648435a8b50425f7fd189c3f00423b4c55e921d752ccc4b23128928]
│   │                                                            cgroup  ro,nosuid,nodev,noexec,relatime,perf_event
│   ├─/sys/fs/cgroup/memory      cgroup[/docker/85b78a1ac648435a8b50425f7fd189c3f00423b4c55e921d752ccc4b23128928]
│   │                                                            cgroup  ro,nosuid,nodev,noexec,relatime,memory
│   ├─/sys/fs/cgroup/freezer     cgroup[/docker/85b78a1ac648435a8b50425f7fd189c3f00423b4c55e921d752ccc4b23128928]
│   │                                                            cgroup  ro,nosuid,nodev,noexec,relatime,freezer
│   ├─/sys/fs/cgroup/devices     cgroup[/docker/85b78a1ac648435a8b50425f7fd189c3f00423b4c55e921d752ccc4b23128928]
│   │                                                            cgroup  ro,nosuid,nodev,noexec,relatime,devices
│   └─/sys/fs/cgroup/blkio       cgroup[/docker/85b78a1ac648435a8b50425f7fd189c3f00423b4c55e921d752ccc4b23128928]
│                                                                cgroup  ro,nosuid,nodev,noexec,relatime,blkio
├─/home/dev                      //127.0.0.1/dev/legacy[/legacy] cifs    rw,relatime,vers=3.0,cache=strict,username=dev,uid=0,noforceuid,gid=0,noforcegid,addr=127.0.0.1,file_mode=
├─/etc/resolv.conf               /dev/sda5[/var/lib/docker/containers/85b78a1ac648435a8b50425f7fd189c3f00423b4c55e921d752ccc4b23128928/resolv.conf]
│                                                                ext4    rw,relatime,errors=remount-ro,stripe=32730
├─/etc/hostname                  /dev/sda5[/var/lib/docker/containers/85b78a1ac648435a8b50425f7fd189c3f00423b4c55e921d752ccc4b23128928/hostname]
│                                                                ext4    rw,relatime,errors=remount-ro,stripe=32730
└─/etc/hosts                     /dev/sda5[/var/lib/docker/containers/85b78a1ac648435a8b50425f7fd189c3f00423b4c55e921d752ccc4b23128928/hosts]
                                                                 ext4    rw,relatime,errors=remount-ro,stripe=32730

AlexTalker avatar Jan 13 '21 16:01 AlexTalker

Hi @AlexTalker, thanks again for all the info provided.

I was able to reproduce the problem, and it certainly appears to be caused by the interaction between shiftfs and cifs (shiftfs is acting as a thin overlay on top of cifs).

Unfortunately the kernel log (dmesg) did not provide much info, except for the following:

[1994479.821465] CIFS VFS: cifs_invalidate_mapping: could not invalidate inode 000000004fa38d62

A similar problem was spotted last year on LXD (which also uses shiftfs): https://github.com/lxc/lxd/issues/6590.

Solving this will require going down into kernel space to figure out what's causing the bad interaction between these filesystems. Unfortunately I don't have the cycles to do this right now (due to other priorities).

In the meantime, a work-around in order to mount a cifs-backed volume into a Sysbox container would be to configure Docker in userns-remap mode. This way Sysbox won't need to use shiftfs anymore.

If you want to do this, add the "userns-remap" line to the /etc/docker/daemon.json file:

cat /etc/docker/daemon.json
{
    "userns-remap": "sysbox",
    "runtimes": {
        "sysbox-runc": {
            "path": "/usr/local/sbin/sysbox-runc"
        }
    },
    "default-address-pools": [
        {
            "base": "172.80.0.0/16",
            "size": 24
        }
    ],
    "bip": "172.20.0.1/16"
}

Then restart Docker with systemctl restart docker. And then create the container with Docker + Sysbox as usual.
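
Before restarting Docker, it may be worth confirming that the setting actually landed in the file. A minimal sketch; userns_remap_value is a hypothetical helper, and it does a plain text match rather than real JSON parsing, so it assumes the key and its value sit on one line as in the example above:

```shell
# Hypothetical helper: print the "userns-remap" value from a Docker daemon.json.
# Usage: userns_remap_value [FILE]   (defaults to /etc/docker/daemon.json)
# Returns non-zero if no such setting is found.
# Note: this is a line-based text match, not a JSON parser.
userns_remap_value() {
    sed -n 's/.*"userns-remap"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' \
        "${1:-/etc/docker/daemon.json}" | grep .
}
```

With the configuration shown above, userns_remap_value should print sysbox.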

One caveat: if you decide to use userns-remap, then the CIFS mount must be configured with a uid:gid that matches the subuid:subgid associated with Sysbox. Otherwise the files will show up as nobody:nogroup inside the container.

For example, on my machine, Sysbox is associated with a subuid range starting at 165536:

$ cat /etc/subuid | grep sysbox
sysbox:165536:65536

Thus, I had to mount the cifs share as follows:

sudo mount -t cifs -o username="cesar",uid=165536,gid=165536 //10.0.0.48/sambashare /mnt/winshare

Then I launched the container with:

docker run --runtime=sysbox-runc --rm -it -v /mnt/winshare:/mnt/winshare nestybox/ubuntu-focal-systemd-docker

And inside the container the cifs share is mounted properly:

# findmnt | grep cifs
|-/mnt/winshare                                              //10.0.0.48/sambashare                                                                                                           cifs     rw,relatime,vers=3.1.1,cache=strict,username=cesar,uid=165536,forceuid,gid=165536,forcegid,addr=10.0.0.48,file_mode=0755,dir_mode=0755,soft,nounix,serverino,mapposix,rsize=4194304,wsize=4194304,bsize=1048576,echo_interval=60,actimeo=1

and the files have the appropriate ownership:

root@200e385f451d:~# ls -l /mnt/winshare/
total 2048
-rwxr-xr-x 1 root root 9 Jan 13 22:56 test
-rwxr-xr-x 1 root root 6 Jan 13 22:56 test2

After this everything works normally.
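
The ownership caveat above comes down to simple range arithmetic: a host ID inside the mapped range shows up in the container as (host ID - range start), and anything outside the range shows up as the kernel's overflow ID, 65534 by default, which is what gets displayed as nobody:nogroup. A minimal sketch, with container_id as a hypothetical helper name:

```shell
# Hypothetical helper: translate a host UID/GID to the ID seen inside the container.
# Usage: container_id HOST_ID RANGE_START [RANGE_SIZE]   (RANGE_SIZE defaults to 65536)
# Prints 65534 (the default overflow ID, displayed as nobody/nogroup)
# when HOST_ID falls outside the mapped range.
container_id() {
    if [ "$1" -ge "$2" ] && [ "$1" -lt $(( $2 + ${3:-65536} )) ]; then
        echo $(( $1 - $2 ))
    else
        echo 65534
    fi
}
```

With the range from my machine, container_id 165536 165536 gives 0, which is why the files above are listed as root:root, while a share mounted with uid=1000 on the host would give 65534.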

ctalledo avatar Jan 13 '21 23:01 ctalledo

By the way, there is work happening at the kernel level that will remove the need for shiftfs in the near future. This will likely fix this issue and make the work-around I described unnecessary.

ctalledo avatar Jan 13 '21 23:01 ctalledo

@ctalledo Thanks for looking into it for me. You see, that's exactly what bothers me: I do not want to have the volume mapped under root rights in Docker, since I don't want to scare off some utils I may end up using there.

Is it possible to somehow add an additional mapping so that files can be mounted under the UID of the container user?

AlexTalker avatar Jan 14 '21 08:01 AlexTalker

@ctalledo I spotted this record in /etc/subuid:

dev:558752:65536

Will it suffice if I supply 558... in the mount parameters? Or will the files still show up as nobody?

AlexTalker avatar Jan 14 '21 08:01 AlexTalker

Hi @AlexTalker,

Since Sysbox uses the Linux user-namespace for its containers, there is mapping of user-IDs going on.

Assuming that at host level:

  • You've configured Docker with userns-remap: "sysbox"
  • And files "/etc/subuid" and "/etc/subgid" have an entry such as sysbox:165536:65536

Then inside the container:

  • User 0 (Root) = host user 165536
  • User 1000 = host user 165536 + 1000

Thus, say you want the cifs volume to appear inside the container as owned by user 1000. Then at host level you would create the cifs mount with uid:gid 165536+1000 = 166536. E.g.,

sudo mount -t cifs -o username="cesar",uid=166536,gid=166536 //10.0.0.48/sambashare /mnt/winshare

and then create the container as usual:

docker run --runtime=sysbox-runc --rm -it -v /mnt/winshare:/mnt/winshare nestybox/ubuntu-focal-systemd-docker

Does that answer your question?

Note that the /etc/subuid file inside the container has no bearing on this. It's the /etc/subuid file at host level you care about.
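
The arithmetic above can be scripted so the base is read from the host's /etc/subuid rather than typed by hand. A minimal sketch; host_id is a hypothetical helper name, and it assumes the usual name:start:count line format:

```shell
# Hypothetical helper: compute the host UID/GID that corresponds to a container ID.
# Usage: host_id SUBID_FILE USER CONTAINER_ID
# Looks up USER's range start in SUBID_FILE (name:start:count) and adds CONTAINER_ID.
host_id() {
    base=$(awk -F: -v user="$2" '$1 == user { print $2; exit }' "$1") || return 1
    if [ -z "$base" ]; then
        echo "no entry for $2 in $1" >&2
        return 1
    fi
    echo $(( $base + $3 ))
}
```

For example, uid=$(host_id /etc/subuid sysbox 1000) and gid=$(host_id /etc/subgid sysbox 1000) give the values to pass as uid= and gid= in the mount command; on my machine both come out as 166536.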

ctalledo avatar Jan 14 '21 16:01 ctalledo

@ctalledo If I understand correctly, does /etc/subuid act as "slice off" of IDs then, if you state that such math works? If so, how does one limit how many IDs are available in the container?

Gonna try your trick tomorrow, see what I get. Also, in this case, do casual containers still can function "normally"? I mean, is "userns-remap" only matters for sysbox-runc?!

AlexTalker avatar Jan 14 '21 16:01 AlexTalker

> @ctalledo If I understand correctly, does /etc/subuid act as "slice off" of IDs then, if you state that such math works?

Correct.

> If so, how does one limit how many IDs are available in the container?

Sysbox assigns a range of 65536 UIDs to the container. It takes these from the slice associated with user "sysbox" in /etc/subuid.

> Also, in this case, do casual containers still can function "normally"? I mean, is "userns-remap" only matters for sysbox-runc?!

Docker userns-remap applies to all Docker containers, even those deployed with the default OCI runc runtime. This improves container isolation (root in the container is not root on the host), but does have some limitations (see https://docs.docker.com/engine/security/userns-remap/).

In general we prefer that Docker remain in regular mode, but this requires Sysbox to use shiftfs, which is mostly fine, though you found that shiftfs-on-cifs is not working properly (unfortunately).

ctalledo avatar Jan 14 '21 17:01 ctalledo

@ctalledo Thanks for the explanation. Could you please also clarify whether or not I need to change ownership in /var/lib/docker to make it work and/or restart the respective Sysbox services? Or is a Docker restart and image re-setup enough?

AlexTalker avatar Jan 15 '21 10:01 AlexTalker

Hi @AlexTalker,

> could you please also highlight whether or not I need to change ownership in /var/lib/docker to make it work

No, this should not be needed; Docker takes care of it (Docker is the sole manager of that directory, so if we find ourselves changing ownership there, something is off).

> and/or restart respective sysbox services?! Or docker restart & image re-setup is enough?

No need to restart Sysbox when configuring Docker in userns-remap. The Docker restart is enough. Just make sure all containers are stopped/removed before switching Docker to userns-remap mode.

ctalledo avatar Jan 15 '21 17:01 ctalledo