podman: Pasta networking is not supported for rootless containers created by root with --userns=auto

Issue Description

Pasta networking is not supported for rootless containers created by root with --userns=auto

Steps to reproduce the issue

sudo su
(root) podman run --rm -it --userns=auto --network pasta alpine

Describe the results you received

Error: invalid config provided: pasta networking is only supported for rootless mode

Describe the results you expected

Rootless container is created with pasta networking

podman info output

host:
  arch: amd64
  buildahVersion: 1.29.0
  cgroupControllers:
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.6-3.fc37.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.6, commit: '
  cpuUtilization:
    idlePercent: 63.62
    systemPercent: 9.68
    userPercent: 26.69
  cpus: 16
  distribution:
    distribution: fedora
    variant: coreos
    version: "37"
  eventLogger: journald
  hostname: ip-10-1-44-86
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 6.1.15-200.fc37.x86_64
  linkmode: dynamic
  logDriver: journald
  memFree: 30585528320
  memTotal: 133530497024
  networkBackend: netavark
  ociRuntime:
    name: crun
    package: crun-1.8.1-1.fc37.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.8.1
      commit: f8a096be060b22ccd3d5f3ebe44108517fbf6c30
      rundir: /run/user/1000/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +LIBKRUN +WASM:wasmedge +YAJL
  os: linux
  remoteSocket:
    path: /run/user/1000/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.0-8.fc37.x86_64
    version: |-
      slirp4netns version 1.2.0
      commit: 656041d45cfca7a4176f6b7eed9e4fe6c11e8383
      libslirp: 4.7.0
      SLIRP_CONFIG_VERSION_MAX: 4
      libseccomp: 2.5.3
  swapFree: 0
  swapTotal: 0
  uptime: 0h 57m 40.00s
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  volume:
  - local
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - docker.io
  - quay.io
store:
  configFile: /var/home/core/.config/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /var/home/core/.local/share/containers/storage
  graphRootAllocated: 549150765056
  graphRootUsed: 5631844352
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 0
  runRoot: /run/user/1000/containers
  transientStore: false
  volumePath: /var/home/core/.local/share/containers/storage/volumes
version:
  APIVersion: 4.4.2
  Built: 1677669779
  BuiltTime: Wed Mar  1 11:22:59 2023
  GitCommit: ""
  GoVersion: go1.19.6
  Os: linux
  OsArch: linux/amd64
  Version: 4.4.2

Podman in a container

No

Privileged Or Rootless

None

Upstream Latest Release

Yes

Additional environment details

Additional information


Mar 17 '23 21:03 lukasmrtvy

Hi @lukasmrtvy, thanks for reporting this!

Error: invalid config provided: pasta networking is only supported for rootless mode

In some sense this is intended: pasta won't run as root because that would unnecessarily broaden the privileges that could be used after exploiting an attack vector, and I didn't expect it to be particularly useful if Podman is started as root anyway.

It was a bit tricky (albeit doable) to safely handle the switch to a fallback unprivileged user when started by Podman, so we skipped that in the initial integration.

However, I guess it might be useful regardless of security considerations if you want a particular network configuration or if you just have some other reasons to run Podman as root for the moment.

It would be interesting if you could share your use case. Regardless of that, yes, I would still consider it as a missing feature.

Cc: @Luap99

Mar 18 '23 08:03 sbrivio-rh

What is the use case here? As root it seems much more preferable to just use the kernel networking tools (bridge + veth pair or macvlan) as those should be much more performant.

Mar 18 '23 12:03 Luap99

As root it seems much more preferable to just use the kernel networking tools (bridge + veth pair or macvlan)

Well, one might still want network isolation (against spoofing, packet forging, etc.), and throughput is usually higher for local port forwarding compared to building frames for veth or macvlan. But I'm also really curious to hear the use case here. :)

Mar 18 '23 19:03 sbrivio-rh

I need to control nftables for rootless containers, and that's not possible. It works OK in rootful mode with the --userns=auto argument and an OCI hook containing iptables rules, although it's not as secure as a standalone rootless container.

@sbrivio-rh mentioned https://superuser.com/questions/1277697/making-routing-decisions-based-on-uid-using-nftables, but not sure if this would work.

A rootless container with slirp4netns / pasta, --userns=auto and custom iptables rules would be my use case (DENYing outbound traffic to RFC1918 ranges).

Mar 20 '23 14:03 lukasmrtvy

@Luap99, you can assign this one to me, unless you plan to work on it as part of anything else you have pending.

Mar 21 '23 11:03 sbrivio-rh

A friendly reminder that this issue had no activity for 30 days.

Apr 21 '23 00:04 github-actions[bot]

@sbrivio-rh Any progress?

Apr 21 '23 00:04 rhatdan

@sbrivio-rh Any progress?

No, sorry, not yet. It's quite a low-priority item on my list (but we're talking about weeks, not months).

Apr 21 '23 04:04 sbrivio-rh

A friendly reminder that this issue had no activity for 30 days.

May 22 '23 00:05 github-actions[bot]

cc @dgibson

May 22 '23 18:05 Luap99

I need to control nftables for rootless containers and that's not possible. It works ok in rootful with --userns=auto arg and oci hook containing iptables, although it's not that secure as a standalone rootless container.

@lukasmrtvy I'm trying to understand this requirement a bit better. I'm assuming what you're doing here is modifying nftables rules in the host which will affect packets flowing to or from your container. Is that correct?

What exactly do those rules look like? Just being able to invoke pasta when root may not be enough here. Because pasta is forwarding traffic at L4, rather than L2, the rules you'd need to match them in the host may well be different from those you'd need for bridge based networking, and I doubt that's something we could practically address in podman or pasta.

Jun 28 '23 06:06 dgibson

We are hitting the pasta-blocked-as-root issue in a nested container scenario: a rootless podman instance needs[1] to run sub-containers with rootful podman, and in some situations these sub-containers need network isolation. Previously we have used slirp4netns (which somewhat confusingly is not blocked) but that has some reliability issues so we're looking to switch to pasta in this setup.

As a workaround, commenting out the rootless check for pasta seems to work—the attack surface issue is not super important here as the requirement is mainly a glorified chroot.

[1] Technically the top-level container could run its sub-containers as a non-root user, but it's running a somewhat exotic CI agent that makes this even more painful than maintaining a patched podman binary.

Sep 11 '24 10:09 vuori

@vuori I'm a little surprised by this case. pasta can run as "root" (mapped UID 0) within a namespace / container; it at least attempts to prevent only running as "real", unmapped root. What does your exact stack of nested containers look like?

Sep 12 '24 01:09 dgibson

pasta itself runs fine, it's just that pkg/specgen/namespaces.go:validateNetNS actively prevents --network pasta when podman is not running in rootless mode. In this case podman will think it runs as root, since the top-level container is rootless but the second-level container's podman is started as the top-level container's uid 0.

Sep 12 '24 06:09 vuori

pasta itself runs fine, it's just that pkg/specgen/namespaces.go:validateNetNS actively prevents --network pasta when podman is not running in rootless mode. In this case podman will think it runs as root, since the top-level container is rootless but the second-level container's podman is started as the top-level container's uid 0.

Ah, that makes sense. @Luap99 any thoughts?

Sep 12 '24 07:09 dgibson

pasta is launched from the podman context, not from the container context; as such, the userns is entirely ignored and doesn't change anything compared to a container with --userns. Sure, we could try to make that work, but then it will still not work for root containers without a user namespace.

So the better question is: why does pasta refuse to run as root and drop privileges before opening the netns path/configuring the interfaces? If pasta did not switch to nobody, it should just work.

Sep 13 '24 17:09 Luap99

pasta is launched from the podman context, not from the container context; as such, the userns is entirely ignored and doesn't change anything compared to a container with --userns. Sure, we could try to make that work, but then it will still not work for root containers without a user namespace.

Ok, but for the nested containers, described here, the inner podman's context should already be in the userns established by the outer podman. So I'd still expect pasta to be invoked inside a userns, and therefore run, even as UID 0.

So the better question is: why does pasta refuse to run as root and drop privileges before opening the netns path/configuring the interfaces? If pasta did not switch to nobody, it should just work.

This is intended to stop the user from shooting themselves in the foot by running pasta privileged - and therefore compromising the security and isolation that pasta is intended to give.

Sep 16 '24 03:09 dgibson

Yeah, to re-iterate, the block is completely on podman side and if the root check in podman code is removed pasta itself works fine. The comments indicate that this is some kind of security footgun prevention.

slirp4netns is allowed by podman in privileged mode, does pasta have somehow different security properties when being run as root?

Sep 16 '24 07:09 vuori

Yeah, to re-iterate, the block is completely on podman side and if the root check in podman code is removed pasta itself works fine. The comments indicate that this is some kind of security footgun prevention.

I haven't looked into this recently, but the main reason why I added that check is that, back then, pasta wouldn't work when Podman started it as root. Fixes such as this one and possibly more were needed.

Now it should work... and it works, as you reported. By the way, from https://github.com/containers/podman/issues/17840#issuecomment-1474777648:

It was a bit tricky (albeit doable) to safely handle the switch to a fallback unprivileged user when started by Podman, so we skipped that in the initial integration.

But now I guess we can drop the check in validateNetNS().

slirp4netns is allowed by podman in privileged mode, does pasta have somehow different security properties when being run as root?

No, not really. It will run as nobody, meaning that captures and log files are readable by any user who can switch to nobody, but those are intended for debugging. The same user can't attach with ptrace() anyway (that needs root), so everything should be taken care of, in that sense.

Sep 16 '24 07:09 sbrivio-rh

Yeah, to re-iterate, the block is completely on podman side and if the root check in podman code is removed pasta itself works fine. The comments indicate that this is some kind of security footgun prevention.

I haven't looked into this recently, but the main reason why I added that check is that, back then, pasta wouldn't work when Podman started it as root. Fixes such as this one and possibly more were needed.

Now it should work... and it works, as you reported. By the way, from #17840 (comment):

It was a bit tricky (albeit doable) to safely handle the switch to a fallback unprivileged user when started by Podman, so we skipped that in the initial integration.

But now I guess we can drop the check in validateNetNS().

Except it doesn't work when running as real root, it only works when already inside a nested userns.

$ sudo podman run --network pasta quay.io/libpod/testimage:20240123 ip a
Error: pasta failed with exit code 1:
Started as root, will change to nobody.
Couldn't switch to pasta namespaces: Operation not permitted

This is what I mean by pasta refusing to run as root, and this isn't something podman can fix. And yes, I would rather check this in podman and return a useful error to users than the pasta error, which leaves most users with no idea why this happens. But sure, we can fix the validation check in podman to correctly check whether we are in a user namespace rather than whether the UID is 0.
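The check suggested here could look roughly like the Go sketch below: reject pasta only for "real" root, meaning UID 0 in the initial user namespace, rather than for any UID 0. It is not podman's actual `validateNetNS` code, and `inUserNSFromUIDMap` and `validatePastaNetNS` are hypothetical names. The sketch relies on a commonly used heuristic: in the initial user namespace, /proc/self/uid_map contains the single identity mapping `0 0 4294967295`.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strings"
)

// inUserNSFromUIDMap (hypothetical) reports whether the contents of
// /proc/self/uid_map describe a non-initial user namespace. Only a single
// full identity mapping "0 0 4294967295" is treated as the initial
// namespace; anything else (several ranges, a partial or empty map)
// is treated as a user namespace.
func inUserNSFromUIDMap(uidMap string) bool {
	lines := strings.Split(strings.TrimSpace(uidMap), "\n")
	if len(lines) != 1 {
		return true // several mapping ranges: not the initial namespace
	}
	f := strings.Fields(lines[0])
	return len(f) != 3 || f[0] != "0" || f[1] != "0" || f[2] != "4294967295"
}

// validatePastaNetNS (hypothetical) rejects pasta only for UID 0 outside
// any user namespace, instead of for every process running as UID 0.
func validatePastaNetNS(euid int, inUserNS bool) error {
	if euid == 0 && !inUserNS {
		return errors.New("pasta networking is only supported for rootless mode")
	}
	return nil
}

func main() {
	data, err := os.ReadFile("/proc/self/uid_map")
	if err != nil {
		fmt.Println("cannot read uid_map:", err)
		return
	}
	err = validatePastaNetNS(os.Geteuid(), inUserNSFromUIDMap(string(data)))
	fmt.Println("pasta allowed:", err == nil)
}
```

Consider a mapping like the one in the `podman info` output above (0 → 1000, 1 → 100000). Here the sketch reports a user namespace, so UID 0 inside it would be allowed. One limitation: a child namespace that maps the full UID range identically would be misclassified by this heuristic.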

slirp4netns is allowed by podman in privileged mode, does pasta have somehow different security properties when being run as root?

No, not really. It will run as nobody, meaning that captures and log files are readable by any user who can switch to nobody, but those are intended for debugging. The same user can't attach with ptrace() anyway (that needs root), so everything should be taken care of, in that sense.

So the better question is: why does pasta refuse to run as root and drop privileges before opening the netns path/configuring the interfaces? If pasta did not switch to nobody, it should just work.

This is intended to stop the user from shooting themselves in the foot by running pasta privileged - and therefore compromising the security and isolation that pasta is intended to give.

Sure, but switching to nobody also means running pasta as root is completely non-functional, as you cannot ever pass it a netns owned by root. I ran into the same issue elsewhere, https://github.com/containers/aardvark-dns/issues/499, just in tests, so I do not care that much about the security concerns in that case. I see no reason why pasta should actually refuse to work in these cases. The alternative is to never use pasta as root and stick to bridge/veth pair networking, but as pointed out in the original report, there are actual users wanting to use pasta even as root. Given that pasta uses a lot of different sandboxing mechanisms such as seccomp, SELinux/AppArmor, mounting an empty rootfs, etc., I don't think dropping the user is that big of a deal. Of course, the one downside is that pasta has to keep the real CAP_NET_ADMIN (and some other caps) from the init (host) userns, which means it can modify the init (host) netns if there is actually a bug.

Sep 16 '24 09:09 Luap99

Except it doesn't work when running as real root, it only works when already inside a nested userns.

Ouch, sorry, you're right. And this is actually what this ticket was about, originally.

I see no reason why pasta should actually refuse to work in these cases, the alternative is to never use pasta as root and stick to bridge/veth pair networking

Right, and it's a bit silly to force users to use "root" networking because we want to avoid running pasta as root. It makes no sense in terms of overall security.

Given that pasta uses a lot of different sandboxing mechanisms such as seccomp, SELinux/AppArmor, mounting an empty rootfs, etc., I don't think dropping the user is that big of a deal. Of course, the one downside is that pasta has to keep the real CAP_NET_ADMIN (and some other caps) from the init (host) userns, which means it can modify the init (host) netns if there is actually a bug.

We can try something else though (as you perhaps were suggesting in https://github.com/containers/podman/issues/17840#issuecomment-2349447717): we could defer switching to nobody until we're done setting up the namespace. We already moved that part recently to ensure compatibility with libguestfs, and we can postpone it a bit further.

I think what's most relevant is that we avoid running the main loop as root, but the setup phase isn't really affected by untrusted input. I'll work on that and report back.

In the short term, I still think that your https://github.com/containers/podman/pull/23961/ makes sense nevertheless (it might take me a while to make that change in pasta).

Sep 16 '24 15:09 sbrivio-rh

We can try something else though (as you perhaps were suggesting in https://github.com/containers/podman/issues/17840#issuecomment-2349447717): we could defer switching to nobody until we're done setting up the namespace. We already moved that part recently to ensure compatibility with libguestfs, and we can postpone it a bit further.

I think what's most relevant is that we avoid running the main loop as root, but the setup phase isn't really affected by untrusted input. I'll work on that and report back.

I haven't looked at how you do it in pasta today, but I guess one reason for switching the user is to drop all capabilities? In that case, the thing to consider is that you would need to keep some caps, e.g. CAP_NET_BIND_SERVICE when using auto port forwarding, to allow binding ports < 1024. And AFAIK you have been talking about a netlink monitor mode to update the addresses in the namespace at runtime, so then you would need to keep CAP_NET_ADMIN as well. Of course we can switch users and keep caps, but it is a bit more work to do so, I think.
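One way to check which of these capabilities a process actually holds is to read the CapEff bitmask from /proc/self/status. The Go sketch below tests bits 10 and 12, which are CAP_NET_BIND_SERVICE and CAP_NET_ADMIN in linux/capability.h; its helper names are hypothetical. The mask describes capabilities relative to the user namespace the process runs in.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// Capability bit numbers as defined in <linux/capability.h>.
const (
	capNetBindService = 10 // CAP_NET_BIND_SERVICE: bind ports below 1024
	capNetAdmin       = 12 // CAP_NET_ADMIN: configure interfaces, addresses, routes
)

// effectiveCaps (hypothetical helper) returns the CapEff bitmask found in
// the text of /proc/<pid>/status, where the kernel prints it in hex.
func effectiveCaps(status string) (uint64, error) {
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		line := sc.Text()
		if strings.HasPrefix(line, "CapEff:") {
			hex := strings.TrimSpace(strings.TrimPrefix(line, "CapEff:"))
			return strconv.ParseUint(hex, 16, 64)
		}
	}
	if err := sc.Err(); err != nil {
		return 0, err
	}
	return 0, fmt.Errorf("no CapEff line in status")
}

// hasCap reports whether capability number bit is set in mask.
func hasCap(mask uint64, bit uint) bool {
	return mask&(uint64(1)<<bit) != 0
}

func main() {
	data, err := os.ReadFile("/proc/self/status")
	if err != nil {
		fmt.Println("cannot read status:", err)
		return
	}
	mask, err := effectiveCaps(string(data))
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("CAP_NET_BIND_SERVICE: %v\n", hasCap(mask, capNetBindService))
	fmt.Printf("CAP_NET_ADMIN:        %v\n", hasCap(mask, capNetAdmin))
}
```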

Sep 16 '24 16:09 Luap99

But now I guess we can drop the check in validateNetNS().

Except it doesn't work when running as real root, it only works when already inside a nested userns.

Ah, yes. I was thinking specifically of @vuori's situation, not the one originally described by @lukasmrtvy.

Sep 17 '24 01:09 dgibson

The PR makes pasta work at least in my nested scenario. Thanks for the fix.

Sep 17 '24 08:09 vuori

We can try something else though (as you perhaps were suggesting in #17840 (comment)): we could defer switching to nobody until we're done setting up the namespace. We already moved that part recently to ensure compatibility with libguestfs, and we can postpone it a bit further.

I think what's most relevant is that we avoid running the main loop as root, but the setup phase isn't really affected by untrusted input. I'll work on that and report back.

I haven't looked how you do in pasta today but I guess one reason for the user is to drop all capabilities?

We also drop most capabilities explicitly if present, but yes, that's the reason.

However, we can't rely on capabilities alone to restrict our privileges because 1. there are still a number of checks in the kernel relying on the user having UID 0 (at least in a given namespace) instead of looking at actual capabilities and 2. if we run as root, we can open root-accessible files. We remount the root filesystem from an empty one, but if something goes wrong with that, it's nice to avoid running as root.

In this case the thing to consider is that you would need to keep some caps, i.e. CAP_NET_BIND_SERVICE when using auto port forward to allow binding < 1024.

Right, we keep it if given, both outside and inside the target namespace.

And AFAIK you have been talking about a netlink monitor mode to update the addresses in the namespace during runtime so then you would need to keep CAP_NET_ADMIN as well.

Yes, I'm working on that, and yes, we already keep CAP_NET_ADMIN in the target namespace.

Of course we can switch users and keep caps but it is a bit more work to do so I think.

No, no, it's all there already. But for the original issue reported here, we need to drop root a bit later so that we can join the target namespace when we start.

Sep 17 '24 14:09 sbrivio-rh

But the outside/inside logic only applies when a userns is created, because caps are per user namespace. If we (podman) hand pasta a netns created in the init (host) namespace, you cannot create a new userns, as this would automatically drop all caps in the parent userns, and we would then have no caps for the init namespaces at all. The new CAP_NET_ADMIN in the userns does not allow you to modify the netns, so this wouldn't work anymore, which is what I meant. Thus I think there is no way to isolate things down via a userns when you get a netns from the init namespace.

Sep 17 '24 15:09 Luap99
