x11docker support for sysbox (userns-remap issues)
I am currently working on support for sysbox-runc in x11docker, which allows running GUI applications in containers.
After installing the sysbox debian package, I found new entries for sysbox in /etc/docker/daemon.json:
{
"graph": "/sda7docker",
"default-runtime": "runc",
"runtimes": {
"kata-runtime": {
"path": "/usr/bin/kata-runtime"
},
"crun": {
"path": "/usr/local/bin/crun"
},
"sysbox-runc": {
"path": "/usr/bin/sysbox-runc"
}
},
"userns-remap": "sysbox",
"bip": "172.20.0.1/16",
"default-address-pools": [
{
"base": "172.25.0.0/16",
"size": 24
}
]
}
Docker with sysbox-runc works well so far, but I no longer have access to my existing images.
I have to remove "userns-remap": "sysbox" to get back access to my images. But then sysbox-runc doesn't work anymore.
I have tried to move the entry to sysbox-runc only, but that does not work:
{
"graph": "/sda7docker",
"default-runtime": "runc",
"runtimes": {
"kata-runtime": {
"path": "/usr/bin/kata-runtime"
},
"crun": {
"path": "/usr/local/bin/crun"
},
"sysbox-runc": {
"path": "/usr/bin/sysbox-runc",
"userns-remap": "sysbox"
}
},
"bip": "172.20.0.1/16",
"default-address-pools": [
{
"base": "172.25.0.0/16",
"size": 24
}
]
}
How should I configure this to have userns-remap only for sysbox-runc? Or can userns-remap be disabled at all? (It seems there is no shiftfs in debian although it is mentioned in package fuse-overlayfs.)
Hi @mviereck,
Thanks for giving Sysbox a shot and for adding support in x11docker for it, that would be AMAZING (you've done some great work with x11docker!).
Regarding your question:
How should I configure this to have userns-remap only for sysbox-runc?
Unfortunately Docker's userns-remap is a global setting for Docker, so there is no way to enable it just for Sysbox.
Or can userns-remap be disabled at all? (It seems there is no shiftfs in debian)
Yes, ideally you would not need to configure Docker in userns-remap to use Sysbox. But currently, for this to work the shiftfs kernel module must be installed.
In Debian, shiftfs is not included in the kernel, but can be easily installed from here. For example, to install it on a 5.8 or 5.10 kernel, you would do this:
$ git clone -b k5.10 https://github.com/toby63/shiftfs-dkms.git shiftfs-k510
$ cd shiftfs-k510
$ ./update1
$ sudo make -f Makefile.dkms
$ modinfo shiftfs
By the way, in Ubuntu shiftfs is usually included in the kernel (at least in Ubuntu desktop & server editions).
Once shiftfs is installed, you can remove the userns-remap config in /etc/docker/daemon.json and start launching containers with Docker + Sysbox with docker run --runtime=sysbox-runc ....
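As a rough pre-flight check before removing the setting, something like the following could be used. It is only a sketch: the function names are made up for illustration, the grep is a crude text match rather than a real JSON parser, and it assumes `modinfo` finds the module under the name `shiftfs`.

```shell
#!/bin/sh
# Illustrative helpers (hypothetical names, not part of Docker or Sysbox).

# Succeeds if the given daemon.json mentions a "userns-remap" key.
# Crude text match, not a JSON parser.
has_userns_remap() {
    grep -q '"userns-remap"' "$1"
}

# Succeeds if a shiftfs kernel module is installed for the running kernel.
shiftfs_installed() {
    modinfo shiftfs >/dev/null 2>&1
}

# Prints a short status report for the given daemon.json.
report() {
    if has_userns_remap "$1"; then
        echo "userns-remap: set"
    else
        echo "userns-remap: not set"
    fi
    if shiftfs_installed; then
        echo "shiftfs: installed"
    else
        echo "shiftfs: missing"
    fi
}
```

Calling `report /etc/docker/daemon.json` should ideally show userns-remap as not set and shiftfs as installed before containers are started with --runtime=sysbox-runc.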
Hope that helps!
I forgot to mention: we are currently working on removing the need for shiftfs in Sysbox, but this will require Linux kernel >= 5.12, as that introduces a feature called ID-mapped mounts that in essence replaces shiftfs. We hope to have this by Feb'22.
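Whether a host already meets the 5.12 requirement can be checked from the kernel release string. The following is a minimal sketch; the function name `kernel_at_least` is hypothetical, and it compares only the major and minor numbers, ignoring distribution suffixes.

```shell
#!/bin/sh
# Succeeds if a kernel release string is at least MAJOR.MINOR.
# $1: release string as printed by `uname -r` (e.g. "5.14.0-0.bpo.2-amd64")
# $2: required major version, $3: required minor version
kernel_at_least() {
    release=$1
    major=${release%%.*}          # text before the first dot
    rest=${release#*.}            # text after the first dot
    minor=${rest%%[!0-9]*}        # leading digits of the remainder
    [ "$major" -gt "$2" ] || { [ "$major" -eq "$2" ] && [ "$minor" -ge "$3" ]; }
}

if kernel_at_least "$(uname -r)" 5 12; then
    echo "kernel is new enough for ID-mapped mounts"
else
    echo "kernel is older than 5.12"
fi
```

A new enough kernel is only one precondition; whether a particular filesystem supports ID-mapped mounts depends on the kernel version as well.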
Thanks for giving Sysbox a shot and for adding support in x11docker for it, that would be AMAZING (you've done some great work with x11docker!).
Thank you! :-)
Thanks to your help I could install shiftfs and can use my regular docker images now.
The current beta/master version of x11docker supports sysbox with --runtime=sysbox-runc (or short --runtime=sysbox).
Due to user namespacing several host integration features currently do not work, others are limited.
All current limitations are caused by mismatching file ownerships if sharing files between host and container. (including devices, unix sockets and fifos). My hope was that shiftfs would fix that.
I hope that including the feature of ID-mapped mounts will fix that. Debian provides Linux kernel 5.14.9 in its backports repository, so I could test that.
Missing features are:
- option --home to store persistent files on host (x11docker always runs fresh containers from image) (Edit: fixed thanks to shiftfs)
- ALSA sound (--alsa) (needs shared device files); however, pulseaudio sound over tcp works (--pulseaudio=tcp)
- cups printer support (--printer) (Edit: fixed with --printer=tcp)
- GPU acceleration within current X server (--gpu). However, GPU works with --xorg (new Xorg server) and indirect rendering (iGLX).
- Webcam access (--webcam) (needs shared device files)
- Wayland (needs shared socket)
Is there a way to disable userns in sysbox to avoid above limitations?
As x11docker restricts the container user, user namespacing is not essential for security. Basically x11docker runs with low privileges setting --cap-drop=ALL --security-opt=no-new-privileges --user=$(id -u):$(id -g).
Examples that already work:
x11docker --runtime=sysbox --network --desktop --home x11docker/xfce
x11docker --runtime=sysbox --network --gpu --xorg x11docker/check glxspheres64
x11docker --runtime=sysbox --network --pulseaudio x11docker/check
x11docker --runtime=sysbox --network --init=systemd x11docker/check
Hi @mviereck,
That's great, thanks for the update. Responding to some of your questions:
Is there a way to disable userns in sysbox to avoid above limitations?
No; we decided that user-ns was a "must-have" for Sysbox as a way of hardening container isolation: we wanted a powerful root inside the container that would be mapped to a fully unprivileged user at host level (with a user-ID above the normal range of 0 to 65535).
All current limitations are caused by mismatching file ownerships if sharing files between host and container. (including devices, unix sockets and fifos). My hope was that shiftfs would fix that.
Shiftfs works well on regular files, but has some limitations on special files unfortunately. ID-mapped mounts should resolve that, but we need to test it. Another alternative (in case ID-mapped mounts don't work) is for Sysbox to give the root user in the container access to the special files via Access Control Lists (ACLs) while the container is running.
I hope that including the feature of ID-mapped mounts will fix that. Debian provides Linux kernel 5.14.9 in its backports repository, so I could test that.
Yes, that should be the case; as soon as I have an early version of ID-mapped mount support (within a couple of weeks I hope), I'll contact you so you can try it.
GPU acceleration within current X server (--gpu); However, GPU works with --xorg (new Xorg server) and indirect rendering (iGLX).
Very interesting; can you expand a bit on why "--gpu" does not work but "--xorg" and iGLX do?
Thanks again!
Hi @ctalledo ,
No; we decided that user-ns was a "must-have" for Sysbox as a way of hardening container isolation: we wanted a powerful root inside the container that would be mapped to a fully unprivileged user at host level
Ok, I see. That makes sense if mainly root in container is targeted.
Shiftfs works well on regular files,
Indeed! I should have checked that closer. So option --home to share files with the host is no problem. I've fixed that in x11docker.
Another alternative (in case ID-mapped mounts don't work) is for Sysbox to give the root user in the container access to the special files via Access Control Lists (ACLs) while the container is running.
An interesting approach! Sysbox would do the ACL setup itself? A likely pitfall is to cleanly remove the ACLs once the container stops, e.g. if the system shuts down suddenly.
Very interesting; can you expand a bit on why "--gpu" does not work but "--xorg" and iGLX do?
A regular setup for --gpu involves sharing the X unix socket in /tmp/.X11-unix and sharing the GPU device files in /dev/dri.
With sysbox I encounter two issues for this setup:
- Surprisingly (thanks to shiftfs?) the X unix socket shows correct ownership. For unknown reasons, accessing it fails nonetheless. Disabling X authentication does not help, so the issue is elsewhere.
  - To solve this, x11docker runs a new X server (either Xorg or a nested one like Xephyr) with -listen tcp that allows X access over TCP network. Regular Xorg setups disable this for security concerns.
- The GPU device files show owner nobody:nogroup so they cannot be accessed by members of groups video and render. This is required for direct rendering.
  - Xorg provides option +iglx to allow indirect rendering. This has been broken a long time and is fixed since Xorg v1.20.8. It is disabled by default in regular Xorg setups for some security concerns. However, x11docker can enable it along with option --xorg if sharing GPU device files is not possible.
The drawbacks of X over TCP and iGLX are a loss in performance, in addition to the security concerns raised by the Xorg developers.
Aside from that, with X over TCP only iGLX can be used for hardware acceleration. So both issues need to be fixed to allow direct rendering.
Only Xorg allows hardware acceleration with combined options -listen tcp +iglx. With direct rendering x11docker has a few more possibilities.
To reproduce the issue accessing the X unix socket (assuming DISPLAY = :0):
xhost + # disable X authentication
# share X unix socket, set DISPLAY and run a GUI
docker run --rm --runtime=sysbox-runc --volume /tmp/.X11-unix/X0:/tmp/.X11-unix/X0 --env DISPLAY=$DISPLAY \
x11docker/check xfce4-terminal
# check file ownership of /tmp/.X11-unix
docker run --rm --runtime=sysbox-runc --volume /tmp/.X11-unix/X0:/tmp/.X11-unix/X0 --env DISPLAY=$DISPLAY \
x11docker/check ls -la /tmp/.X11-unix
xhost - # enable X authentication
This works with --runtime=runc but fails with --runtime=sysbox-runc. I suspect a shiftfs issue.
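To see inside a container whether a shared file is affected by the ownership mismatch, its numeric owner can be compared with the kernel's overflow IDs, which is what unmapped owners appear as in a user namespace (usually 65534, shown as nobody:nogroup on Debian). The sketch below assumes GNU stat; the function names are made up for illustration.

```shell
#!/bin/sh
# Prints "uid:gid path" for each given file (GNU stat syntax).
owner_ids() {
    stat -c '%u:%g %n' "$@"
}

# Succeeds if the file's owner or group is the kernel's overflow ID,
# i.e. the real owner has no mapping in the current user namespace.
is_unmapped() {
    overflow_uid=$(cat /proc/sys/kernel/overflowuid 2>/dev/null || echo 65534)
    overflow_gid=$(cat /proc/sys/kernel/overflowgid 2>/dev/null || echo 65534)
    [ "$(stat -c %u "$1")" = "$overflow_uid" ] ||
        [ "$(stat -c %g "$1")" = "$overflow_gid" ]
}
```

Run inside the container, for example `owner_ids /tmp/.X11-unix/X0 /dev/dri/*`, this shows which of the shared files appear with unmapped ownership.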
Edit: just found that there already is an in-depth discussion in #272.
At this point it might make sense to wait for integration of ID-mapped mounts in sysbox whether that fixes the issues with sockets, device files and fifo files. I am happy to check the possibilities again once you have integrated it.
Hi @mviereck,
Thanks for all that info, super useful.
Sysbox would do the ACL setup itself? A likely pitfall is to cleanly remove the ACLs once the container stops, e.g. if the system shuts down suddenly.
Yes, if we were to use the ACL solution, Sysbox would need to take care of the cleanup during container stop or when Sysbox gets a SIGTERM.
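The grant-and-cleanup pattern could look roughly like the sketch below. This is not how Sysbox implements anything; `run_with_acl` is a hypothetical helper, the UID in the usage note is an assumed mapping, and `setfacl` from the acl package is assumed to be installed. The `SETFACL` variable only exists so the logic can be dry-run with a stub.

```shell
#!/bin/sh
# Hypothetical helper: give one UID read/write access to a host file
# through an ACL entry while a command runs, then remove the entry again.
SETFACL=${SETFACL:-setfacl}

# $1: host UID that container root is mapped to
# $2: file to share (e.g. a device node)
# remaining arguments: command to run while the ACL entry exists
run_with_acl() {
    uid=$1
    path=$2
    shift 2
    (
        # Remove the ACL entry whenever this subshell exits.
        trap '"$SETFACL" -x "u:$uid" "$path"' EXIT
        # Turn termination signals into a regular exit so the EXIT trap runs.
        trap 'exit 143' TERM
        trap 'exit 130' INT
        "$SETFACL" -m "u:$uid:rw" "$path" || exit 1
        "$@"
    )
}
```

A call might look like `run_with_acl 100000 /dev/video0 docker run ...`. The trap covers a normal exit, SIGTERM and SIGINT, but not SIGKILL or a sudden power loss, so stale ACL entries after an abrupt shutdown remain possible.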
Surprisingly (thanks to shiftfs?) the X unix socket shows correct ownership. For unknown reasons, accessing it fails nonetheless.
Yes, very likely due to issue #272.
The GPU device files show owner nobody:nogroup so they cannot be accessed by members of groups video and render
Yes, this is something we hope ID-mapped mounts will also fix.
At this point it might make sense to wait for integration of ID-mapped mounts in sysbox whether that fixes the issues with sockets, device files and fifo files. I am happy to check the possibilities again once you have integrated it.
Yes, makes sense. Let me work on this and I'll ping you once I have something ready for testing.
Thanks again for all the help!
Hi @mviereck, regarding:
At this point it might make sense to wait for integration of ID-mapped mounts in sysbox whether that fixes the issues with sockets, device files and fifo files. I am happy to check the possibilities again once you have integrated it.
I have an early preview binary for Sysbox with ID-mapped mount support. It will fix the issue with the mounting of sockets and FIFO files, but not with device files unfortunately (it's a limitation of ID-mapped mounts at this time).
Would you be interested in giving this a try? If so, please join the Sysbox slack channel and we can coordinate there. Thanks!
That is good news! Of course, I am interested in testing this. If mounting devices becomes possible in the future, too, then features like direct rendering on GPU or webcam access will be possible with Sysbox.
The ability to share unix sockets enables an unusual approach to hardware acceleration: virgl, which provides GPU access through a unix socket. However, this setup is experimental; the code of virgl_test_server used for it was not meant for production use and is not yet as performant as one could wish.
Compare:
https://gitlab.freedesktop.org/virgl/virglrenderer/-/issues/256
https://github.com/Xpra-org/xpra/issues/3452
I have tested the current master version / upcoming 0.5.0 release of Sysbox. All issues except sharing device files are solved now with the implementation of id-mapped mounts (needs kernel version >=5.12).
Related x11docker features that work now:
- Sharing X unix sockets allowing several different X servers.
- pulseaudio sound over tcp (not possible through shared socket due to user namespacing, but the tcp solution is fine)
- printer support by sharing a CUPS socket
What does not work yet due to missing device support by id-mapped mounts:
- webcam access
- sound through ALSA
GPU access works for a subset of applications that circumvent the user/group setting of /dev/dri/* but not in general.
As far as I am concerned, the ticket can be closed if you like. Device mapping is a task on its own.
GPU access does not work in general by sharing devices for direct rendering, but there are two other possibilities:
- indirect rendering with iGLX / x11docker option --gpu=iglx. Works only with an additional Xorg that needs its own tty (x11docker option --xorg).
- GPU access with virgl_test_server / x11docker option --gpu=virgl. This makes it possible to accelerate even X servers that would not support GPU access otherwise.
If you would like to encourage the developers of virgl to improve their standalone virgl server to support containers, that would be great. I've made a start in https://gitlab.freedesktop.org/virgl/virglrenderer/-/issues/256
This would be of interest even without x11docker. For example, one could run Xvfb in a Sysbox container and use GPU acceleration.
Currently virgl_test_server already works, but needs development to improve stability and performance.