Document running bwrap nested inside docker/podman
# podman run --rm -ti --security-opt seccomp=unconfined quay.io/cgwalters/coreos-assembler bwrap --unshare-pid --unshare-user --bind / / true
bwrap: Failed to mount tmpfs: Permission denied
This is actually an SELinux denial. See this issue.
Now, this will work:
podman run --rm -ti --security-opt label=type:spc_t --security-opt seccomp=unconfined quay.io/cgwalters/coreos-assembler bwrap --unshare-pid --unshare-user --bind / / true
Note that if you want to pass devices through (e.g. --device /dev/kvm on the docker/podman side), you'll also want --dev-bind /dev /dev on the bwrap side.
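A sketch that combines the working command above with /dev/kvm passthrough. It assumes the host actually has /dev/kvm and that the shell is bash; the arrays exist only so each flag can carry a comment. --dev-bind /dev /dev is placed after --bind / / because bwrap applies mount options in the order given, and a plain --bind does not allow device access.

```bash
# Flags for the outer container, as used earlier in this thread, plus KVM.
podman_opts=(
  --rm
  --security-opt label=type:spc_t    # avoid the SELinux denial on bwrap's tmpfs mount
  --security-opt seccomp=unconfined  # as in the thread; the default profile may block bwrap
  --device /dev/kvm                  # assumption: the host has /dev/kvm
)
# Flags for bwrap inside the container.
bwrap_opts=(
  --unshare-pid --unshare-user
  --bind / /
  --dev-bind /dev /dev               # after --bind / /, so device nodes stay usable
)
podman run "${podman_opts[@]}" quay.io/cgwalters/coreos-assembler \
  bwrap "${bwrap_opts[@]}" true
```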
Now the problem I'm hitting is around /proc. If you use --unshare-pid, you really need a fresh /proc; otherwise all of the PIDs in /proc are wrong and things get confused.
Adding --proc /proc gets me:
bwrap: Can't mount proc on /newroot/proc: Operation not permitted
I'm confused by this right now: why doesn't that work? Our test suite seems to use --bind /proc proc instead, but that gets me the same problem with incorrect PIDs.
/cc @giuseppe
The issue is that /proc in the container has masked/read-only paths, which prevent a user namespace from mounting a procfs that would be too "revealing".
The solution is either to not have these masked/read-only paths, or to avoid creating a new PID namespace in the container so that a bind mount of /proc works fine. Docker has recently added a way to modify the list of masked/read-only paths (at least in the API; I'm not sure about the CLI), but for Podman I think --privileged is the only way to skip adding these paths.
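To check which paths are masked or read-only in a given container, one can list the mounts stacked below /proc. This is a small sketch that relies only on the /proc/self/mountinfo format from proc(5) (field 5 is the mount point, field 6 the per-mount options); the function name list_proc_overmounts is hypothetical. If it prints anything inside the container, --proc /proc in a nested user namespace is likely to fail for the reason described above.

```bash
# Hypothetical helper: print mount point and options of every mount
# layered on top of something under /proc (e.g. podman's masked paths).
# Reads /proc/self/mountinfo unless another file is given as $1.
list_proc_overmounts() {
  # Field 5 = mount point, field 6 = per-mount options (see proc(5)).
  # The regex skips /proc itself and matches only paths below it.
  awk '$5 ~ /^\/proc\/./ { print $5, $6 }' "${1:-/proc/self/mountinfo}"
}

list_proc_overmounts
```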
the issue is that /proc in the container has masked/readonly paths,
Ahh, right. Ugh. It feels like what we need is a "procfs-nolegacy" filesystem type or something that strips out all of the /proc/asound, /proc/bus, /proc/sysrq-trigger etc.
Alternatively, audit everything in /proc and verify that writing to it requires CAP_SYS_ADMIN. Then, if the container doesn't have CAP_SYS_ADMIN, we don't need the read-only overmounts.
Or yet another approach: a proc mount option like ro-rw-pidfs that defaults every file to mode 0600 except what I'd call "pidfs", i.e. the per-process entries in /proc/$pid. Its semantics should also ensure that CAP_DAC_OVERRIDE cannot override those permissions.
Ah right: https://lkml.org/lkml/2018/5/11/155
I have added container_userns_t, which allows some of the access denied by container_t, such as mounting a tmpfs. I would love to know whether this works for your use case while leaving SELinux enforcing container separation.
I would love to know if this would work for your use case, and leave SELinux in enforcing container separation.
Probably, I'll try it at some point, but the real blocker here is the /proc issue.
Hi all, I came across this old issue: what is the currently recommended way to run bwrap inside an unprivileged podman container? For me, --dev /dev fails to mount /newroot/dev/pts when the podman container is started without --security-opt label=disable, and --proc /proc fails no matter what I try.
I am trying to do the following:
user@fedora42$ podman run --rm -it --security-opt label=disable fedora
root@container# yum -y install bwrap util-linux
root@container# adduser testuser
root@container# su - testuser
testuser@container$ bwrap --unshare-all --ro-bind / / --dev /dev --proc /proc ls /proc
bwrap: Can't mount proc on /newroot/proc: Operation not permitted
A command-line option similar to --dev /dev/ but without /dev/pts would be nice to have for wrapping things that do not need /dev/pts, but need for example /dev/null.
Thanks!
-Yenya
--proc /proc fails no matter what I try.
Try with --security-opt unmask=ALL
A command-line option similar to --dev /dev/ but without /dev/pts would be nice to have for wrapping things that do not need /dev/pts, but need for example /dev/null.
--tmpfs /dev
--dev-bind /dev/null /dev/null
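Put together, the suggestion above gives a sandbox whose /dev contains only /dev/null. A sketch based on the command from the report, assuming bwrap is on PATH and the container was started with --security-opt label=disable as in the report (the tmpfs mount is what SELinux denied at the top of this issue). --proc /proc is left out here since that depends on the unmask question; the bash array is only there so each option can be commented.

```bash
# Sandbox whose /dev holds only /dev/null (no /dev/pts, no other nodes).
bwrap_opts=(
  --unshare-all
  --ro-bind / /
  --tmpfs /dev                    # empty /dev instead of --dev /dev
  --dev-bind /dev/null /dev/null  # bind just the one device node needed
)
bwrap "${bwrap_opts[@]}" ls -l /dev
```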
Thanks for the fast reply!
Binding only /dev/null works for me, nice. --security-opt unmask=ALL works as well, but is it possible to put this in .config/containers/containers.conf or in the .container quadlet unit file? I tried the following in containers.conf, but it did not work:
[Security]
unmask=["ALL"]
-Yenya
Quadlet: the documentation lists Unmask=ALL, and there is also PodmanArgs=.
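A minimal sketch of such a quadlet unit, written from a shell. The file name bwrap-test.container, ContainerName=bwrap-test, and Exec=sleep infinity are hypothetical placeholders; Unmask=ALL and PodmanArgs= are the keys mentioned above, and SecurityLabelDisable=true is assumed here to be the quadlet equivalent of --security-opt label=disable (check podman-systemd.unit(5) for your Podman version). The image is the same one used in the report.

```bash
# Rootless quadlet units are read from ~/.config/containers/systemd/
# (or $XDG_CONFIG_HOME/containers/systemd/).
unit_dir="${XDG_CONFIG_HOME:-$HOME/.config}/containers/systemd"
mkdir -p "$unit_dir"

# Hypothetical unit name; the generated service will be bwrap-test.service.
cat > "$unit_dir/bwrap-test.container" <<'EOF'
[Container]
Image=fedora
ContainerName=bwrap-test
# Quadlet equivalent of --security-opt unmask=ALL
Unmask=ALL
# Assumed equivalent of --security-opt label=disable
SecurityLabelDisable=true
# Alternative for options without a dedicated key:
# PodmanArgs=--security-opt unmask=ALL
# Placeholder command so the container stays up for podman exec
Exec=sleep infinity
EOF
```

After writing the file, systemctl --user daemon-reload followed by systemctl --user start bwrap-test.service should start it, and podman exec -it bwrap-test bash gives a shell in which to try bwrap.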
Thanks!