Error: OCI runtime error: runsc: creating container: systemd error: Interactive authentication required.

Open dseomn opened this issue 8 months ago • 12 comments

Description

When I try to use runsc with rootless podman, I get an error:

$ podman run --interactive --tty --rm --runtime=runsc debian:testing
Error: OCI runtime error: runsc: creating container: systemd error: Interactive authentication required.

With runc, I don't get the error:

$ podman run --interactive --tty --rm --runtime=runc debian:testing
root@bbdf83f8f377:/# 

I'm not sure if this is a bug in gvisor, podman, my setup, or something else. This seemed like a good place to start though since I couldn't find any reports of the same issue here or in podman.

Steps to reproduce

Run the command above as a non-root user.

runsc version

$ runsc -version
runsc version 0.0~20240729.0
spec: 1.2.0

docker version (if using docker)

It's not docker, but seems relevant:

$ podman --version
podman version 5.4.0

uname

Linux solaria 6.12.12-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.12.12-1 (2025-02-02) x86_64 GNU/Linux

kubectl (if using Kubernetes)


repo state (if built from source)

No response

runsc debug logs (if available)

Here's the bottom of the create log. Let me know if more would be useful.

D0311 19:53:25.557605  2642085 container.go:200] Create container, cid: b157a14c7618504c34a0a4439d2379a8cc079c3953a5dabb8a9512fd81ca5e8b, rootDir: "/run/user/1000/runsc"
D0311 19:53:25.557658  2642085 container.go:1797] Configuring container with a new userns with identity user mappings into current userns
D0311 19:53:25.557717  2642085 container.go:1853] UID Mappings:
D0311 19:53:25.557726  2642085 container.go:1855] 	Container ID: 0, Host ID: 0, Range Length: 1
D0311 19:53:25.557732  2642085 container.go:1855] 	Container ID: 1, Host ID: 1, Range Length: 65536
D0311 19:53:25.557769  2642085 container.go:1853] GID Mappings:
D0311 19:53:25.557777  2642085 container.go:1855] 	Container ID: 0, Host ID: 0, Range Length: 1
D0311 19:53:25.557781  2642085 container.go:1855] 	Container ID: 1, Host ID: 1, Range Length: 65536
D0311 19:53:25.557816  2642085 container.go:262] Creating new sandbox for container, cid: b157a14c7618504c34a0a4439d2379a8cc079c3953a5dabb8a9512fd81ca5e8b
D0311 19:53:25.560717  2642085 cgroup.go:428] New cgroup for pid: self, *cgroup.cgroupSystemd: &{cgroupV2:{Mountpoint:/sys/fs/cgroup Path:/user.slice/libpod-b157a14c7618504c34a0a4439d2379a8cc079c3953a5dabb8a9512fd81ca5e8b.scope Controllers:[cpuset cpu io memory hugetlb pids rdma misc] Own:[]} Name:b157a14c7618504c34a0a4439d2379a8cc079c3953a5dabb8a9512fd81ca5e8b Parent:user.slice ScopePrefix:libpod properties:[] dbusConn:0xc000424280}
D0311 19:53:25.560739  2642085 systemd.go:98] Installing systemd cgroup resource controller under user.slice
D0311 19:53:25.560936  2642085 container.go:1036] Created filestore file at "/home/dseomn/.local/share/containers/storage/overlay/9cf5032d7db9dc90e050a258377367773f27dc3b742414a5dd3afbba14876693/merged/.gvisor.filestore.b157a14c7618504c34a0a4439d2379a8cc079c3953a5dabb8a9512fd81ca5e8b" for mount source "/home/dseomn/.local/share/containers/storage/overlay/9cf5032d7db9dc90e050a258377367773f27dc3b742414a5dd3afbba14876693/merged"
D0311 19:53:25.560962  2642085 systemd.go:154] Joining systemd cgroup libpod-b157a14c7618504c34a0a4439d2379a8cc079c3953a5dabb8a9512fd81ca5e8b.scope
D0311 19:53:25.568684  2642085 cgroup_v2.go:177] Deleting cgroup "/sys/fs/cgroup/user.slice/libpod-b157a14c7618504c34a0a4439d2379a8cc079c3953a5dabb8a9512fd81ca5e8b.scope"
D0311 19:53:25.568720  2642085 container.go:790] Destroy container, cid: b157a14c7618504c34a0a4439d2379a8cc079c3953a5dabb8a9512fd81ca5e8b
W0311 19:53:25.568855  2642085 util.go:64] FATAL ERROR: creating container: systemd error: Interactive authentication required.
W0311 19:53:25.568892  2642085 main.go:231] Failure to execute command, err: 1

dseomn avatar Mar 11 '25 23:03 dseomn

I think this is a duplicate of https://github.com/google/gvisor/issues/311

Running as root should work here:

sudo podman run --interactive --tty --rm --runtime=runsc debian:testing

milantracy avatar Mar 12 '25 07:03 milantracy

I saw that, but I thought that was supposed to be fixed as of 8e4cb261486ad84bc5657b1cee0288018f693d01, making this a regression?

dseomn avatar Mar 12 '25 17:03 dseomn

Not really; the command you shared requests systemd as the cgroup manager.

The gVisor test script ignores cgroups at https://github.com/google/gvisor/blob/906fb319cc3afdd7ee8f6917a3a0636bcf7d1afd/test/podman/run.sh#L34

As an alternative to running as root, you can follow that test script and run rootlessly by doing:

$ mkdir /tmp/podman && cd "$_"
$ cat > runsc.podman <<EOF
#!/bin/bash

exec /tmp/runsc/runsc --ignore-cgroups "\$@"
EOF
$ chmod u+x runsc.podman
$ podman run --interactive --tty --rm --runtime /tmp/podman/runsc.podman debian:testing
root@c00655b0d8c6:/# dmesg
[    0.000000] Starting gVisor...
[    0.538988] Checking naughty and nice process list...
[    0.549468] Creating cloned children...
[    0.597381] Moving files to filing cabinet...
[    0.778743] Constructing home...
[    0.926395] Feeding the init monster...
[    1.188440] Conjuring /dev/null black hole...
[    1.597120] Digging up root...
[    1.907546] Generating random numbers by fair dice roll...
[    2.300496] Consulting tar man page...
[    2.795365] Reading process obituaries...
[    3.092411] Setting up VFS...
[    3.351798] Setting up FUSE...
[    3.490462] Ready!

milantracy avatar Mar 13 '25 03:03 milantracy

Thank you, that fixes that error! Should --ignore-cgroups be the default for rootless containers, so that it just works out of the box?

Btw, there's an (I think) easier way to pass flags: podman run --runtime=runsc --runtime-flag=ignore-cgroups

I did get another error after adding that flag:

$ podman run --interactive --tty --rm --runtime=runsc --runtime-flag=ignore-cgroups debian:testing 
starting container: setting up network: creating interfaces from net namespace "/proc/3194499/ns/net": cannot run with network enabled in root network namespace
Error: `/usr/bin/runsc --ignore-cgroups start d184db99d0d16877a9cb7bc63bb924c151dca61ce523b48c78ae04eaf8798414` failed: exit status 128

Adding --runtime-flag=network=none "fixed" it though. Should I file a separate feature request to get networking working in rootless containers? Or is that not feasible?

$ podman run --interactive --tty --rm --runtime=runsc --runtime-flag=ignore-cgroups --runtime-flag=network=none debian:testing 
root@ef4b4733bc02:/# 
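
For anyone scripting this, the two flags can be bundled so they are only added for non-root users. This is a minimal sketch, not something from gVisor or podman: the helper name `rootless_runsc_flags` is hypothetical, and the flags are just the ones used in the commands above. Note that `network=none` leaves the container without networking.

```shell
#!/bin/bash
# Hypothetical helper: print the extra podman flags that this thread needed
# for rootless runsc. Takes a UID as $1 (defaults to the current user).
rootless_runsc_flags() {
  local uid="${1:-$(id -u)}"
  if [ "$uid" -ne 0 ]; then
    # Skip cgroup setup and disable networking, as in the commands above.
    printf '%s\n' "--runtime-flag=ignore-cgroups --runtime-flag=network=none"
  fi
  # For root (UID 0) nothing is printed, so no extra flags are added.
}

# Usage (commented out so that sourcing this file has no side effects):
# podman run --interactive --tty --rm --runtime=runsc $(rootless_runsc_flags) debian:testing
```

The unquoted `$(rootless_runsc_flags)` in the usage line is deliberate, so that the two flags are split into separate arguments.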

dseomn avatar Mar 13 '25 04:03 dseomn

I just actually read that dmesg output and literally laughed out loud. Thank you for that, whoever thought to put that humor in the dmesg logs!

dseomn avatar Mar 13 '25 04:03 dseomn

Should --ignore-cgroups be the default for rootless containers, so that it just works out of the box?

Rootless mode works for me without this flag, so there must be some difference between our systems that makes your user unable to set up cgroups without superuser privileges. I'm not sure what that would be (try running under strace to see which syscall specifically gets rejected).

Adding --runtime-flag=network=none "fixed" it though. Should I file a separate feature request to get networking working in rootless containers? Or is that not feasible?

gVisor's userspace network stack is known not to work in rootless containers; see #10359 for why. You can still use host network stack passthrough (--network=host), but this reduces the level of isolation gVisor gives you for network-related system calls.

I just actually read that dmesg output and literally laughed out loud. Thank you for that, whoever thought to put that humor in the dmesg logs!

Feel free to send a PR to add more to the rotation :)

EtiennePerot avatar Mar 13 '25 17:03 EtiennePerot

Rootless mode works for me without this flag, so there must be some difference between our systems that makes your user unable to set up cgroups without superuser privileges. I'm not sure what that would be (try running under strace to see which syscall specifically gets rejected).

If I'm reading the strace correctly, I think the issue was on the other side of the /run/dbus/system_bus_socket socket, not something that would show up in an strace of podman or runsc:

3388800 recvmsg(12, {msg_name={sa_family=AF_UNIX, sun_path="/run/dbus/system_bus_socket"}, msg_namelen=112 => 30, msg_iov=[{iov_base="$\0\0\0Interactive authentication r"..., iov_len=41}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_CMSG_CLOEXEC) = 41

Some lines from sudo dbus-monitor --system that look interesting:

method call time=1741891164.116532 sender=:1.21680 -> destination=org.freedesktop.systemd1 serial=2 path=/org/freedesktop/systemd1; interface=org.freedesktop.systemd1.Manager; member=StartTransientUnit
   string "libpod-903d0b959497f8ca5d1009fdd47d0b05dedf1c7c0332ddecccce1a4fdcec73bf.scope"
   string "replace"
   array [
      struct {
         string "Slice"
         variant             string "user.slice"
      }
      struct {
         string "Description"
         variant             string "Secure container 903d0b959497f8ca5d1009fdd47d0b05dedf1c7c0332ddecccce1a4fdcec73bf"
      }
      struct {
         string "PIDs"
         variant             array [
               uint32 3392470
            ]
      }
      struct {
         string "MemoryAccounting"
         variant             boolean true
      }
      struct {
         string "CPUAccounting"
         variant             boolean true
      }
      struct {
         string "TasksAccounting"
         variant             boolean true
      }
      struct {
         string "IOAccounting"
         variant             boolean true
      }
      struct {
         string "Delegate"
         variant             boolean true
      }
      struct {
         string "DefaultDependencies"
         variant             boolean false
      }
      struct {
         string "TasksMax"
         variant             uint64 2048
      }
   ]
   array [
   ]
...
method call time=1741891164.116687 sender=:1.21136 -> destination=org.freedesktop.PolicyKit1 serial=17062 path=/org/freedesktop/PolicyKit1/Authority; interface=org.freedesktop.PolicyKit1.Authority; member=CheckAuthorization
   struct {
      string "system-bus-name"
      array [
         dict entry(
            string "name"
            variant                string ":1.21680"
         )
      ]
   }
   string "org.freedesktop.systemd1.manage-units"
   array [
      dict entry(
         string "unit"
         string "libpod-903d0b959497f8ca5d1009fdd47d0b05dedf1c7c0332ddecccce1a4fdcec73bf.scope"
      )
      dict entry(
         string "verb"
         string "start"
      )
      dict entry(
         string "polkit.message"
         string "Authentication is required to start transient unit '$(unit)'."
      )
      dict entry(
         string "polkit.gettext_domain"
         string "systemd"
      )
   ]
   uint32 0
   string ""
...
method return time=1741891164.122599 sender=:1.9 -> destination=:1.21136 serial=6287 reply_serial=17062
   struct {
      boolean false
      boolean true
      array [
         dict entry(
            string "polkit.gettext_domain"
            string "systemd"
         )
         dict entry(
            string "polkit.retains_authorization_after_challenge"
            string "1"
         )
         dict entry(
            string "polkit.message"
            string "Authentication is required to start transient unit '$(unit)'."
         )
         dict entry(
            string "unit"
            string "libpod-903d0b959497f8ca5d1009fdd47d0b05dedf1c7c0332ddecccce1a4fdcec73bf.scope"
         )
         dict entry(
            string "verb"
            string "start"
         )
      ]
   }
error time=1741891164.122671 sender=:1.21136 -> destination=:1.21680 error_name=org.freedesktop.DBus.Error.InteractiveAuthorizationRequired reply_serial=2
   string "Interactive authentication required."

I think /org/freedesktop/PolicyKit1/Authority is the thing that actually denied the permission? From https://www.freedesktop.org/software/polkit/docs/latest/eggdbus-interface-org.freedesktop.PolicyKit1.Authority.html#eggdbus-method-org.freedesktop.PolicyKit1.Authority.CheckAuthorization I think the first two boolean ... lines in the method return ... reply_serial=17062 section are is_authorized and is_challenge.
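
To avoid reading those two values by eye, the reply can be piped through a small filter. This is only a sketch: the function name `polkit_result` is made up, and it assumes that the first two lines starting with `boolean` in a `method return` dump are `is_authorized` and `is_challenge`, in that order, as guessed above.

```shell
#!/bin/bash
# Hypothetical helper: read a dbus-monitor "method return" dump on stdin and
# label the first two lines whose first field is "boolean".
polkit_result() {
  awk '
    $1 == "boolean" {
      n++
      if (n == 1) print "is_authorized=" $2
      if (n == 2) { print "is_challenge=" $2; exit }
    }
  '
}

# Usage (commented out; paste the "method return" section of the capture):
# polkit_result < reply.txt
```

Lines such as `variant             boolean true` from the StartTransientUnit call are ignored, because their first field is `variant`, so only the reply section should be fed in to get a meaningful result.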

So I'm guessing that there's some difference in /usr/share/polkit-1/rules.d/ between your system where it works and mine where it doesn't. I'm on Debian testing, what about you?

gVisor's userspace network stack is known not to work in rootless containers; see https://github.com/google/gvisor/issues/10359 for why. You can still use host network stack passthrough (--network=host), but this reduces the level of isolation gVisor gives you for network-related system calls.

Thanks for the pointer!

dseomn avatar Mar 13 '25 18:03 dseomn

Does this command show any matching rules on your system?

$ LC_ALL=C.UTF-8 sudo grep -r 'org\.freedesktop\.systemd1\.' /etc/polkit-1/rules.d /run/polkit-1/rules.d /usr/local/share/polkit-1/rules.d /usr/share/polkit-1/rules.d
grep: /run/polkit-1/rules.d: No such file or directory
grep: /usr/local/share/polkit-1/rules.d: No such file or directory

dseomn avatar Mar 13 '25 19:03 dseomn

I'm on Debian testing, what about you?

Corporate Google workstation, based on Debian testing, but I would not be surprised if there were many invasive tweaks to polkit.

$ LC_ALL=C.UTF-8 sudo grep -r 'org\.freedesktop\.systemd1\.' /etc/polkit-1/rules.d /run/polkit-1/rules.d /usr/local/share/polkit-1/rules.d /usr/share/polkit-1/rules.d
grep: /run/polkit-1/rules.d: No such file or directory
grep: /usr/local/share/polkit-1/rules.d: No such file or directory

I see no attempt to use dbus to talk to anything (strace -ff runsc --rootless do echo hi 2>&1 | grep -i dbus returns empty).

This suggests that runsc gives up on cgroup setup if it gets EACCES:

$ strace -ff runsc --debug=true --alsologtostderr --rootless do echo hi 2>&1 | grep -i cgroup
[pid 2213169] write(2, "D0313 17:18:15.870963  2213169 c"..., 93D0313 17:18:15.870963  2213169 config.go:456] Config.IgnoreCgroups (--ignore-cgroups): false
[pid 2213169] write(2, "D0313 17:18:15.870988  2213169 c"..., 93D0313 17:18:15.870988  2213169 config.go:456] Config.SystemdCgroup (--systemd-cgroup): false
[pid 2213175] write(2, "D0313 17:18:15.921426  2213175 c"..., 93D0313 17:18:15.921426  2213175 config.go:456] Config.IgnoreCgroups (--ignore-cgroups): false
[pid 2213175] write(2, "D0313 17:18:15.921447  2213175 c"..., 93D0313 17:18:15.921447  2213175 config.go:456] Config.SystemdCgroup (--systemd-cgroup): false
      "MEMORY_PRESSURE_WATCH=/sys/fs/cgroup/user.slice/user-210638.slice/user@210638.service/session.slice/dbus.service/memory.pressure",
[pid 2213175] statfs("/sys/fs/cgroup", {f_type=CGROUP2_SUPER_MAGIC, f_bsize=4096, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0xaac7401f, 0x894c16af]}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_NOSUID|ST_NODEV|ST_NOEXEC|ST_RELATIME}) = 0
[pid 2213175] openat(AT_FDCWD, "/sys/fs/cgroup/cgroup.controllers", O_RDONLY|O_CLOEXEC) = 11
[pid 2213175] write(2, "D0313 17:18:15.923539  2213175 c"..., 203D0313 17:18:15.923539  2213175 cgroup.go:428] New cgroup for pid: self, *cgroup.cgroupV2: &{Mountpoint:/sys/fs/cgroup Path:/runsc-060597 Controllers:[cpuset cpu io memory hugetlb pids rdma misc] Own:[]}
[pid 2213175] write(2, "D0313 17:18:15.923576  2213175 c"..., 102D0313 17:18:15.923576  2213175 cgroup_v2.go:132] Installing cgroup path "/sys/fs/cgroup/runsc-060597"
[pid 2213175] openat(AT_FDCWD, "/sys/fs/cgroup/cgroup.subtree_control", O_WRONLY|O_TRUNC|O_CLOEXEC) = -1 EACCES (Permission denied)
[pid 2213175] write(2, "D0313 17:18:15.923622  2213175 c"..., 95D0313 17:18:15.923622  2213175 cgroup_v2.go:177] Deleting cgroup "/sys/fs/cgroup/runsc-060597"
[pid 2213175] write(2, "W0313 17:18:15.923667  2213175 c"..., 160W0313 17:18:15.923667  2213175 container.go:1770] Skipping cgroup configuration in rootless mode: open /sys/fs/cgroup/cgroup.subtree_control: permission denied
[...]

This is coming from here: https://github.com/google/gvisor/blob/b01944883bfc3c0a0fa56565c197b3612401f9bc/runsc/container/container.go#L1824-L1834

So I think this code needs to broaden the way it identifies errors so that polkit-based failures are treated the same way, effectively turning off cgroup configuration in this case.

EtiennePerot avatar Mar 14 '25 00:03 EtiennePerot

I see no attempt to use dbus to talk to anything (strace -ff runsc --rootless do echo hi 2>&1 | grep -i dbus returns empty).

I don't think this is relevant, but the strace line I shared before was from podman (following forks), I think.

This suggests that runsc gives up on cgroup setup if it gets EACCES:

I see Config.SystemdCgroup (--systemd-cgroup): false in your logs, but in mine:

D0311 19:53:25.555206  2642085 config.go:436] Config.SystemdCgroup (--systemd-cgroup): true
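
A quick way to compare this setting between two systems is to pull it out of the debug log. This is a rough sketch: the function name `systemd_cgroup_setting` is hypothetical, and it assumes the log line format stays the same as in the two excerpts in this thread.

```shell
#!/bin/bash
# Hypothetical helper: read a runsc debug log (or strace output containing it)
# on stdin and print the last value logged for Config.SystemdCgroup.
systemd_cgroup_setting() {
  sed -n 's/.*Config\.SystemdCgroup (--systemd-cgroup): \([a-z]*\).*/\1/p' | tail -n 1
}

# Usage (commented out; the log path depends on how debug logging is configured):
# systemd_cgroup_setting < /path/to/runsc.create.log
```

If the log never mentions the setting, nothing is printed.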

I'm not sure about this, but I think it was systemd, not runsc, sending the dbus messages to polkit.

So I think this code needs to broaden the way it identifies errors so that polkit-based failures are treated the same way, effectively turning off cgroup configuration in this case.

If my understanding above is right, then I think runsc would see it as a failure from systemd, not polkit, but I'm not sure. Either way, it would be nice if it worked by default.

dseomn avatar Mar 14 '25 00:03 dseomn

In case it's relevant to how podman calls runsc, here's part of the output of podman info on my system:

host:
...
  cgroupControllers:
  - cpu
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2

dseomn avatar Mar 14 '25 00:03 dseomn

I have the same issue. podman only works with --runtime-flag=ignore-cgroups unless running as root.

Let me know if you'd like any info from my system.

igor-borisoglebski avatar Mar 27 '25 11:03 igor-borisoglebski

“Interactive authentication required” is a red herring. The actual problem is that runsc is talking to the system service manager when it should be talking to the user service manager. Unprivileged users should not have the ability to create new system-wide units, but they can (and do) have the ability to create per-user units!
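
One way to picture the distinction is by which D-Bus socket each case would use. This is a sketch only, not how runsc picks its connection: the function name `manager_bus_socket` is hypothetical, the system socket path is the one from the strace earlier in the thread, and the per-user path assumes the usual logind layout where `XDG_RUNTIME_DIR` is `/run/user/<uid>`.

```shell
#!/bin/bash
# Hypothetical helper: print the D-Bus socket a given UID would conventionally
# use to reach its service manager. Takes a UID as $1 (defaults to current user).
manager_bus_socket() {
  local uid="${1:-$(id -u)}"
  if [ "$uid" -eq 0 ]; then
    # System service manager (the socket seen in the strace above).
    printf '%s\n' "/run/dbus/system_bus_socket"
  else
    # Per-user manager; assumes XDG_RUNTIME_DIR is /run/user/<uid>.
    printf '%s\n' "/run/user/${uid}/bus"
  fi
}

# Usage (commented out so that sourcing this file has no side effects):
# manager_bus_socket "$(id -u)"
```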

DemiMarie avatar Jul 09 '25 05:07 DemiMarie