Pasta networking is not supported for rootless containers created by root with --userns=auto
Issue Description
Pasta networking is not supported for rootless containers created by root with --userns=auto
Steps to reproduce the issue
sudo su - (root)
podman run --rm -it --userns=auto --network pasta alpine
Describe the results you received
Error: invalid config provided: pasta networking is only supported for rootless mode
Describe the results you expected
Rootless container is created with pasta networking
podman info output
host:
  arch: amd64
  buildahVersion: 1.29.0
  cgroupControllers:
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.6-3.fc37.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.6, commit: '
  cpuUtilization:
    idlePercent: 63.62
    systemPercent: 9.68
    userPercent: 26.69
  cpus: 16
  distribution:
    distribution: fedora
    variant: coreos
    version: "37"
  eventLogger: journald
  hostname: ip-10-1-44-86
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 6.1.15-200.fc37.x86_64
  linkmode: dynamic
  logDriver: journald
  memFree: 30585528320
  memTotal: 133530497024
  networkBackend: netavark
  ociRuntime:
    name: crun
    package: crun-1.8.1-1.fc37.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.8.1
      commit: f8a096be060b22ccd3d5f3ebe44108517fbf6c30
      rundir: /run/user/1000/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +LIBKRUN +WASM:wasmedge +YAJL
  os: linux
  remoteSocket:
    path: /run/user/1000/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.0-8.fc37.x86_64
    version: |-
      slirp4netns version 1.2.0
      commit: 656041d45cfca7a4176f6b7eed9e4fe6c11e8383
      libslirp: 4.7.0
      SLIRP_CONFIG_VERSION_MAX: 4
      libseccomp: 2.5.3
  swapFree: 0
  swapTotal: 0
  uptime: 0h 57m 40.00s
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  volume:
  - local
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - docker.io
  - quay.io
store:
  configFile: /var/home/core/.config/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /var/home/core/.local/share/containers/storage
  graphRootAllocated: 549150765056
  graphRootUsed: 5631844352
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 0
  runRoot: /run/user/1000/containers
  transientStore: false
  volumePath: /var/home/core/.local/share/containers/storage/volumes
version:
  APIVersion: 4.4.2
  Built: 1677669779
  BuiltTime: Wed Mar 1 11:22:59 2023
  GitCommit: ""
  GoVersion: go1.19.6
  Os: linux
  OsArch: linux/amd64
  Version: 4.4.2
Podman in a container
No
Privileged Or Rootless
None
Upstream Latest Release
Yes
Additional environment details
Additional information
Hi @lukasmrtvy, thanks for reporting this!
Error: invalid config provided: pasta networking is only supported for rootless mode
In some sense this is intended: pasta won't run as root because that would unnecessarily broaden privileges that can be used after exploiting an attack vector, and I didn't expect it to be particularly useful if Podman is anyway started as root.
It was a bit tricky (albeit doable) to safely handle the switch to a fallback unprivileged user when started by Podman, so we skipped that in the initial integration.
However, I guess it might be useful regardless of security considerations if you want a particular network configuration or if you just have some other reasons to run Podman as root for the moment.
It would be interesting if you could share your use case. Regardless of that, yes, I would still consider it as a missing feature.
Cc: @Luap99
What is the use case here? As root it seems much more preferable to just use the kernel networking tools (bridge + veth pair or macvlan) as those should be much more performant.
As root it seems much more preferable to just use the kernel networking tools (bridge + veth pair or macvlan)
Well, one might still want network isolation (against spoofing, packet forging, etc.), and throughput is usually higher for local port forwarding compared to building frames for veth or macvlan. But I'm also really curious to hear the use case here. :)
I need to control nftables for rootless containers, and that's not possible. It works OK in rootful mode with the --userns=auto argument and an OCI hook containing iptables rules, although that's not as secure as a standalone rootless container.
@sbrivio-rh mentioned https://superuser.com/questions/1277697/making-routing-decisions-based-on-uid-using-nftables, but not sure if this would work.
A rootless container with slirp4netns / pasta, --userns=auto, and custom iptables rules would be my use case (denying outbound traffic to RFC1918 ranges).
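For illustration, the filtering described in this use case could be expressed as an nftables ruleset along the following lines. This is only a sketch: the function name, table name, and chain name are invented for the example, and it assumes the rules are loaded inside the container's network namespace (for instance from an OCI hook). Whether equivalent host-side rules would match traffic forwarded by pasta is not something this sketch answers.

```shell
#!/bin/sh
# Sketch only: print an nftables ruleset that drops outbound IPv4 traffic
# to the RFC 1918 private ranges. The function, table, and chain names are
# made up for this example.
rfc1918_drop_ruleset() {
    cat <<'EOF'
table inet rfc1918_filter {
    chain output {
        type filter hook output priority 0; policy accept;
        ip daddr { 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16 } drop
    }
}
EOF
}

# Print the ruleset; nothing is applied to the system here.
rfc1918_drop_ruleset
```

To load it, the output could be piped to `nft -f -` by a process with CAP_NET_ADMIN in the target network namespace.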
@Luap99, you can assign this one to me, unless you plan to work on it as part of anything else you have pending.
A friendly reminder that this issue had no activity for 30 days.
@sbrivio-rh Any progress?
No, sorry, not yet. It's quite a low-priority item on my list (but we're talking about weeks, not months).
A friendly reminder that this issue had no activity for 30 days.
cc @dgibson
I need to control nftables for rootless containers, and that's not possible. It works OK in rootful mode with the --userns=auto argument and an OCI hook containing iptables rules, although that's not as secure as a standalone rootless container.
@lukasmrtvy I'm trying to understand this requirement a bit better. I'm assuming what you're doing here is modifying nftables rules in the host which will affect packets flowing to or from your container. Is that correct?
What exactly do those rules look like? Just being able to invoke pasta when root may not be enough here. Because pasta is forwarding traffic at L4, rather than L2, the rules you'd need to match them in the host may well be different from those you'd need for bridge based networking, and I doubt that's something we could practically address in podman or pasta.
We are hitting the pasta-blocked-as-root issue in a nested container scenario: a rootless podman instance needs[1] to run sub-containers with rootful podman, and in some situations these sub-containers need network isolation. Previously we have used slirp4netns (which somewhat confusingly is not blocked) but that has some reliability issues so we're looking to switch to pasta in this setup.
As a workaround, commenting out the rootless check for pasta seems to work—the attack surface issue is not super important here as the requirement is mainly a glorified chroot.
[1] Technically the top-level container could run its sub-containers as a non-root user, but it's running a somewhat exotic CI agent that makes this even more painful than maintaining a patched podman binary.
@vuori I'm a little surprised by this case. pasta can run as "root" (mapped UID 0) within a namespace / container - it at least attempts to only prevent running as "real", unmapped root. What does your exact stack of nested containers look like?
pasta itself runs fine, it's just that pkg/specgen/namespaces.go:validateNetNS actively prevents --network pasta when podman is not running in rootless mode. In this case podman will think it runs as root, since the top-level container is rootless but the second-level container's podman is started as the top-level container's uid 0.
Ah, that makes sense. @Luap99 any thoughts?
pasta is launched from the podman context, not from the container context, so the userns is entirely ignored and doesn't change anything compared to a container with --userns. Sure, we could try to make that work, but then it will still not work for root containers that don't use a userns.
So the better question is: why does pasta refuse to run as root and drop privileges before opening the netns path/configuring the interfaces? If pasta did not switch to nobody, it should just work.
pasta is launched from the podman context, not from the container context, so the userns is entirely ignored and doesn't change anything compared to a container with --userns. Sure, we could try to make that work, but then it will still not work for root containers that don't use a userns.
Ok, but for the nested containers, described here, the inner podman's context should already be in the userns established by the outer podman. So I'd still expect pasta to be invoked inside a userns, and therefore run, even as UID 0.
So the better question is: why does pasta refuse to run as root and drop privileges before opening the netns path/configuring the interfaces? If pasta did not switch to nobody, it should just work.
This is intended to stop the user from shooting themselves in the foot by running pasta privileged - and therefore compromising the security and isolation that pasta is intended to give.
Yeah, to re-iterate, the block is completely on podman side and if the root check in podman code is removed pasta itself works fine. The comments indicate that this is some kind of security footgun prevention.
slirp4netns is allowed by podman in privileged mode, does pasta have somehow different security properties when being run as root?
Yeah, to re-iterate, the block is completely on podman side and if the root check in podman code is removed pasta itself works fine. The comments indicate that this is some kind of security footgun prevention.
I haven't looked into this recently, but the main reason why I added that check is that, back then, pasta wouldn't work when Podman started it as root. Fixes such as this one and possibly more were needed.
Now it should work... and it works, as you reported. By the way, from https://github.com/containers/podman/issues/17840#issuecomment-1474777648:
It was a bit tricky (albeit doable) to safely handle the switch to a fallback unprivileged user when started by Podman, so we skipped that in the initial integration.
But now I guess we can drop the check in validateNetNS().
slirp4netns is allowed by podman in privileged mode, does pasta have somehow different security properties when being run as root?
No, not really. It will run as nobody, meaning that captures and log files are readable by any user who can switch to nobody, but those are intended for debugging. The same user can't attach with ptrace() anyway (that needs root), so everything should be taken care of, in that sense.
Yeah, to re-iterate, the block is completely on podman side and if the root check in podman code is removed pasta itself works fine. The comments indicate that this is some kind of security footgun prevention.
I haven't looked into this recently, but the main reason why I added that check is that, back then, pasta wouldn't work when Podman started it as root. Fixes such as this one and possibly more were needed.
Now it should work... and it works, as you reported. By the way, from #17840 (comment):
It was a bit tricky (albeit doable) to safely handle the switch to a fallback unprivileged user when started by Podman, so we skipped that in the initial integration.
But now I guess we can drop the check in validateNetNS().
Except it doesn't work when running as real root, it only works when already inside a nested userns.
$ sudo podman run --network pasta quay.io/libpod/testimage:20240123 ip a
Error: pasta failed with exit code 1:
Started as root, will change to nobody.
Couldn't switch to pasta namespaces: Operation not permitted
This is what I mean by pasta refusing to run as root, and it isn't something podman can fix. And yes, I would rather check this in podman and return a useful error than surface the pasta error, which most users would not understand. But sure, we can fix the validation check in podman to test whether we are inside a user namespace rather than whether the UID is 0.
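To make the "inside a userns" versus "UID 0" distinction concrete, here is a minimal shell sketch of one common heuristic: reading the process's uid_map and treating the full identity mapping as the initial user namespace. This is an illustration only and not how podman implements its check; `in_userns` is a hypothetical helper name, and the heuristic would misreport a user namespace that was deliberately created with a full identity mapping.

```shell
#!/bin/sh
# Sketch only: succeed (exit status 0) when the given uid_map describes a
# user namespace other than the initial one. in_userns is a hypothetical
# helper name, not a podman or pasta function.
in_userns() {
    map_file="${1:-/proc/self/uid_map}"
    # The first line has three fields: first ID inside the namespace,
    # first ID outside, and the length of the range. An unreadable or
    # empty file is treated as "not in a user namespace".
    read -r inside outside count < "$map_file" || return 1
    # The initial namespace maps the full 32-bit range onto itself.
    if [ "$inside" = 0 ] && [ "$outside" = 0 ] && [ "$count" = 4294967295 ]; then
        return 1
    fi
    return 0
}

if in_userns; then
    echo "inside a user namespace"
else
    echo "initial user namespace (or uid_map unreadable)"
fi
```

With the idMappings shown in the podman info output of this report (container ID 0 mapped to host ID 1000), the function would report a user namespace even though the process sees itself as UID 0.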
slirp4netns is allowed by podman in privileged mode, does pasta have somehow different security properties when being run as root?
No, not really. It will run as nobody, meaning that captures and log files are readable by any user who can switch to nobody, but those are intended for debugging. The same user can't attach with ptrace() anyway (that needs root), so everything should be taken care of, in that sense.
So the better question is: why does pasta refuse to run as root and drop privileges before opening the netns path/configuring the interfaces? If pasta did not switch to nobody, it should just work.
This is intended to stop the user from shooting themselves in the foot by running pasta privileged - and therefore compromising the security and isolation that pasta is intended to give.
Sure, but switching to nobody also means that running pasta as root is completely non-functional, as you can never pass a netns owned by root. I ran into the same issue elsewhere, https://github.com/containers/aardvark-dns/issues/499, just in tests, so I do not care that much about the security concerns in that case. I see no reason why pasta should actually refuse to work in these cases. The alternative is to never use pasta as root and stick to bridge/veth pair networking, but as pointed out in the original report, there are actual users wanting to use pasta even as root. Given that pasta uses a lot of different sandboxing mechanisms such as seccomp, selinux/apparmor, mounting an empty rootfs, etc., I don't think dropping the user is that big of a deal. Of course, the one downside is that pasta has to keep the real CAP_NET_ADMIN (and some other caps) from the init (host) userns, which means it can modify the init (host) netns if there actually is a bug.
Except it doesn't work when running as real root, it only works when already inside a nested userns.
Ouch, sorry, you're right. And this is actually what this ticket was about, originally.
I see no reason why pasta should actually refuse to work in these cases, the alternative is to never use pasta as root and stick to bridge/veth pair networking
Right, and it's a bit silly to force users to use "root" networking because we want to avoid running pasta as root. It makes no sense in terms of overall security.
Given that pasta uses a lot of different sandboxing mechanisms such as seccomp, selinux/apparmor, mounting an empty rootfs, etc., I don't think dropping the user is that big of a deal. Of course, the one downside is that pasta has to keep the real CAP_NET_ADMIN (and some other caps) from the init (host) userns, which means it can modify the init (host) netns if there actually is a bug.
We can try something else though (as you perhaps were suggesting in https://github.com/containers/podman/issues/17840#issuecomment-2349447717): we could defer switching to nobody until we're done setting up the namespace. We already moved that part recently to ensure compatibility with libguestfs, and we can postpone it a bit further.
I think what's most relevant is that we avoid running the main loop as root, but the setup phase isn't really affected by untrusted input. I'll work on that and report back.
In the short term, I still think that your https://github.com/containers/podman/pull/23961/ makes sense nevertheless (it might take me a while to make that change in pasta).
We can try something else though (as you perhaps were suggesting in https://github.com/containers/podman/issues/17840#issuecomment-2349447717): we could defer switching to nobody until we're done setting up the namespace. We already moved that part recently to ensure compatibility with libguestfs, and we can postpone it a bit further.
I think what's most relevant is that we avoid running the main loop as root, but the setup phase isn't really affected by untrusted input. I'll work on that and report back.
I haven't looked at how pasta does this today, but I guess one reason for switching users is to drop all capabilities? In this case the thing to consider is that you would need to keep some caps, e.g. CAP_NET_BIND_SERVICE when using auto port forwarding, to allow binding to ports below 1024. And AFAIK you have been talking about a netlink monitor mode to update the addresses in the namespace at runtime, so then you would need to keep CAP_NET_ADMIN as well. Of course we can switch users and keep caps, but I think that is a bit more work.
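As a rough aid for the capability question above, the following sketch checks whether a capability bit is set in a hexadecimal mask such as the CapEff field of /proc/<pid>/status. `has_cap` is a hypothetical helper name; the capability numbers 10 and 12 are the values of CAP_NET_BIND_SERVICE and CAP_NET_ADMIN in <linux/capability.h>. It says nothing about how pasta itself retains or drops capabilities.

```shell
#!/bin/sh
# Sketch only: test whether capability number $2 is set in the hexadecimal
# mask $1 (the format used by the Cap* fields in /proc/<pid>/status).
# has_cap is a hypothetical helper name.
has_cap() {
    mask="$1"
    bit="$2"
    [ $(( (0x$mask >> $bit) & 1 )) -eq 1 ]
}

CAP_NET_BIND_SERVICE=10   # value from <linux/capability.h>
CAP_NET_ADMIN=12          # value from <linux/capability.h>

# Effective capability mask of this shell, if /proc is available.
cap_eff=$(awk '/^CapEff:/ { print $2 }' "/proc/$$/status" 2>/dev/null || true)
if [ -n "$cap_eff" ]; then
    has_cap "$cap_eff" "$CAP_NET_BIND_SERVICE" && echo "CAP_NET_BIND_SERVICE: yes" || echo "CAP_NET_BIND_SERVICE: no"
    has_cap "$cap_eff" "$CAP_NET_ADMIN" && echo "CAP_NET_ADMIN: yes" || echo "CAP_NET_ADMIN: no"
fi
```

The same function can be pointed at the CapEff value of a running pasta process to see which capabilities it holds at a given moment.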
But now I guess we can drop the check in validateNetNS().
Except it doesn't work when running as real root, it only works when already inside a nested userns.
Ah, yes. I was thinking specifically of @vuori's situation, not the one originally described by @lukasmrtvy.
The PR makes pasta work at least in my nested scenario. Thanks for the fix.
We can try something else though (as you perhaps were suggesting in #17840 (comment)): we could defer switching to nobody until we're done setting up the namespace. We already moved that part recently to ensure compatibility with libguestfs, and we can postpone it a bit further.
I think what's most relevant is that we avoid running the main loop as root, but the setup phase isn't really affected by untrusted input. I'll work on that and report back.
I haven't looked at how pasta does this today, but I guess one reason for switching users is to drop all capabilities?
We also drop most capabilities explicitly if present, but yes, that's the reason.
However, we can't rely on capabilities alone to restrict our privileges because 1. there are still a number of checks in the kernel relying on the user having UID 0 (at least in a given namespace) instead of looking at actual capabilities and 2. if we run as root, we can open root-accessible files. We remount the root filesystem from an empty one, but if something goes wrong with that, it's nice to avoid running as root.
In this case the thing to consider is that you would need to keep some caps, i.e. CAP_NET_BIND_SERVICE when using auto port forward to allow binding < 1024.
Right, we keep it if given, both outside and inside the target namespace.
And AFAIK you have been talking about a netlink monitor mode to update the addresses in the namespace during runtime so then you would need to keep CAP_NET_ADMIN as well.
Yes, I'm working on that, and yes, we already keep CAP_NET_ADMIN in the target namespace.
Of course we can switch users and keep caps but it is a bit more work to do so I think.
No no it's all there already. But for the original issue reported here, we need to drop root a bit later so that we can join the target namespace when we start.
But the outside/inside logic only applies when a userns is created, because caps are per user namespace. If we (podman) hand pasta a netns created in the init (host) netns, you cannot create a new userns, as this would automatically drop all caps in the parent userns and we would have no caps for the init namespaces at all anymore. The new CAP_NET_ADMIN in the userns does not allow you to modify the netns, so this wouldn't work anymore, which is what I meant. Thus I think there is no way to isolate things via a userns when you get a netns from the init namespace.