
No such file or directory - unprivileged_userns_clone

Open matthewparkinsondes opened this issue 1 year ago • 7 comments

Am seeing the following error when attempting to install Sysbox on an RKE2 worker node.

sysctl: cannot stat /proc/sys/kernel/unprivileged_userns_clone: No such file or directory


Using the latest version of all system components, including the latest "longterm" version of the Linux kernel as per https://kernel.org.

os=Ubuntu 22.04.4 LTS, kernel=6.6.17-060617-generic, sysbox=0.6.3 CE, rke2=v1.28.6+rke2r1, rancher=v2.8.2


Here is a summary of results when using various kernel versions.

  • 6.8-rc4 (mainline) - fails
  • 6.7.5 (stable) - fails
  • 6.6.17 (longterm) - fails
  • 6.5.18 (from the mainline) - succeeds
  • 6.5.18 (Ubuntu LTS HWE) - succeeds

Here are the Sysbox logs.

Detected Kubernetes version v1.28
2024-02-18T14:06:20.436036364+10:00 Adding K8s taint "sysbox-runtime=not-running:NoSchedule" to node ...
2024-02-18T14:06:20.502085520+10:00 node/w6 modified
2024-02-18T14:06:20.506327963+10:00 Adding K8s label "crio-runtime=installing" to node ...
node/w6 not labeled
2024-02-18T14:06:20.571532866+10:00 Deploying CRI-O installer agent on the host (v1.28) ...
2024-02-18T14:06:21.002092200+10:00 Running CRI-O installer agent on the host (may take several seconds) ...
2024-02-18T14:06:23.395783002+10:00 Removing CRI-O installer agent from the host ...
2024-02-18T14:06:23.908565022+10:00 Configuring CRI-O ...
2024-02-18T14:06:25.114476735+10:00 Adding K8s label "sysbox-runtime=installing" to node ...
2024-02-18T14:06:25.175698518+10:00 node/w6 not labeled
2024-02-18T14:06:25.194719353+10:00 Installing Sysbox dependencies on host ...
2024-02-18T14:06:25.205312310+10:00 Copying shiftfs sources to host ...
2024-02-18T14:06:25.206274620+10:00 Kernel version 6.6, which is above the max required for shiftfs (6.2); skipping shiftfs installation.
2024-02-18T14:06:25.206287839+10:00 Deploying Sysbox installer helper on the host ...
2024-02-18T14:06:25.499708961+10:00 Running Sysbox installer helper on the host (may take several seconds) ...
2024-02-18T14:06:28.052809028+10:00 Stopping the Sysbox installer helper on the host ...
2024-02-18T14:06:28.404940170+10:00 Removing Sysbox installer helper from the host ...
2024-02-18T14:06:28.704056754+10:00 Installing Sysbox on host ...
Configuring host sysctls ...
2024-02-18T14:06:29.921906091+10:00 sysctl: cannot stat /proc/sys/kernel/unprivileged_userns_clone: No such file or directory
2024-02-18T14:06:29.922058926+10:00 fs.inotify.max_queued_events = 1048576
2024-02-18T14:06:29.922064366+10:00 fs.inotify.max_user_watches = 1048576
fs.inotify.max_user_instances = 1048576
2024-02-18T14:06:29.922069990+10:00 kernel.keys.maxkeys = 20000
2024-02-18T14:06:29.922072582+10:00 kernel.keys.maxbytes = 1400000
2024-02-18T14:06:29.922075222+10:00 kernel.pid_max = 4194304

matthewparkinsondes avatar Feb 18 '24 04:02 matthewparkinsondes

Also noticed the following vulnerability can be mitigated by disabling unprivileged user namespaces.

  • https://coder.com/blog/statement-on-the-recent-cve-2022-0185-vulnerability
  • https://ubuntu.com/security/CVE-2022-0185

sysctl -w kernel.unprivileged_userns_clone=0

Am assuming the Sysbox error encountered above relates to attempting to disable unprivileged user namespaces with sysctl.

matthewparkinsondes avatar Feb 19 '24 08:02 matthewparkinsondes

Found the following reference in the Sysbox user guide.

  • https://github.com/nestybox/sysbox/blob/release_v0.6.3/docs/user-guide/troubleshoot.md#unprivileged-user-namespace-creation-error

Along with the following fix.

sudo sh -c "echo 1 > /proc/sys/kernel/unprivileged_userns_clone"

This appears to allow the vulnerability identified above (as opposed to preventing it).

Attempting to execute the fix produces the following error.

sh: 1: cannot create /proc/sys/kernel/unprivileged_userns_clone: Directory nonexistent

Performing the following command shows that the /proc/sys/kernel directory exists.

ls -l /proc/sys/kernel
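
Since the directory exists, the "Directory nonexistent" message most likely refers to the sysctl file itself being absent. A minimal sketch for checking that one file directly is shown here; the function name `userns_clone_status` and its optional proc-root argument are illustrative only and are not part of Sysbox or any existing tool.

```shell
# Hypothetical helper: report whether the unprivileged_userns_clone
# sysctl file exists, and print its value if it does.
# The optional first argument (default /proc) is only there so the
# check can be tried against a scratch directory.
userns_clone_status() {
    proc_root="${1:-/proc}"
    f="$proc_root/sys/kernel/unprivileged_userns_clone"
    if [ -e "$f" ]; then
        echo "present: $(cat "$f")"
    else
        echo "absent"
    fi
}
```

Calling `userns_clone_status` with no arguments on the affected node should print `absent` if the kernel does not provide this sysctl.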

matthewparkinsondes avatar Feb 19 '24 11:02 matthewparkinsondes

Hi @matthewparkinsondes,

Thanks for reporting the issue.

Context

Normally, Sysbox does not require the /proc/sys/kernel/unprivileged_userns_clone sysctl to be set, except in some scenarios (see below). That's because this sysctl controls whether unprivileged users may create user-namespaces. Since Sysbox runs as a privileged process on the host, the sysctl does not apply to Sysbox per se.

Having said that, inside a Sysbox container, all processes are unprivileged at host level since they run inside a user-namespace. If a process inside the Sysbox container wishes to create a user-namespace (e.g., when running Docker with userns-remap enabled), that will fail unless /proc/sys/kernel/unprivileged_userns_clone is set at host level.

This is why the Sysbox K8s installer tries to set /proc/sys/kernel/unprivileged_userns_clone to 1. And sysbox-runc also checks for this when creating a container.

Though we could relax this so that Sysbox does not require it, in general having /proc/sys/kernel/unprivileged_userns_clone set to 1 is not a security problem, except when there are Linux kernel user-namespace vulnerabilities / CVEs (such as CVE-2022-0185) that allow an unprivileged process to escalate to root by creating a user-namespace.
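
As a rough illustration of what a relaxed setup step could look like, the sketch below only writes the sysctl when the file is present and writable, instead of failing with "cannot stat". This is not the actual Sysbox installer code; the function name `enable_userns_clone` and the proc-root argument are assumptions made for the example.

```shell
# Hypothetical helper: set unprivileged_userns_clone to 1 only if the
# kernel exposes it; otherwise skip without raising an error.
# The optional first argument (default /proc) exists so the logic can
# be exercised against a scratch directory.
enable_userns_clone() {
    proc_root="${1:-/proc}"
    f="$proc_root/sys/kernel/unprivileged_userns_clone"
    if [ -w "$f" ]; then
        echo 1 > "$f"
        echo "set"
    else
        # File is missing (or not writable): nothing to configure here.
        echo "skipped"
    fi
}
```

On a real host this would need to run as root, for example `sudo sh -c '. ./helper.sh; enable_userns_clone'`, where `helper.sh` is a placeholder file name.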

The issue you reported

Here is a summary of results when using various kernel versions.

  • 6.8-rc4 (mainline) - fails
  • 6.7.5 (stable) - fails
  • 6.6.17 (longterm) - fails
  • 6.5.18 (from the mainline) - succeeds
  • 6.5.18 (Ubuntu LTS HWE) - succeeds

This is interesting; normally Ubuntu/Debian distros have the /proc/sys/kernel/unprivileged_userns_clone sysctl, so something is wrong in 6.6.17 and above.

For the failing cases, what do you see under /proc/sys/kernel?

ctalledo avatar Feb 28 '24 03:02 ctalledo

Hi @ctalledo,

Thanks, here are the contents of /proc/sys/kernel for the 6.5.21 passing case.

6.5.21 image

The 6.6.17 failing case removes the following from /proc/sys/kernel.

  • apparmor_restrict_unprivileged_io_uring
  • apparmor_restrict_unprivileged_unconfined
  • apparmor_restrict_unprivileged_userns
  • apparmor_restrict_unprivileged_userns_complain
  • apparmor_restrict_unprivileged_userns_force
  • unprivileged_userns_clone

6.6.17 image

Compared with 6.6.17, the /proc/sys/kernel directory in the 6.7.6 failing case adds the following.

  • apparmor_restrict_unprivileged_unconfined

And it removes the following.

  • sched_child_runs_first

6.7.6 image

The /proc/sys/kernel directory in the 6.8-rc6 failing case is identical to that of 6.7.6.

6.8-rc6 image
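
For anyone wanting to reproduce this comparison as text rather than screenshots, here is a small sketch. It assumes the directory listing was saved on each kernel beforehand (for example with `ls /proc/sys/kernel > kernel-6.5.21.txt`); the file names and the `diff_listings` function are made up for illustration.

```shell
# Hypothetical helper: compare two saved listings of /proc/sys/kernel
# and print which entries were removed or added between them.
# Usage: diff_listings OLD_LISTING NEW_LISTING
diff_listings() {
    old_sorted=$(mktemp)
    new_sorted=$(mktemp)
    # comm expects sorted input
    sort "$1" > "$old_sorted"
    sort "$2" > "$new_sorted"
    # Lines only in the old listing
    comm -23 "$old_sorted" "$new_sorted" | sed 's/^/removed: /'
    # Lines only in the new listing
    comm -13 "$old_sorted" "$new_sorted" | sed 's/^/added: /'
    rm -f "$old_sorted" "$new_sorted"
}
```

For example, `diff_listings kernel-6.5.21.txt kernel-6.6.17.txt` would list each entry that disappeared in the newer kernel with a `removed:` prefix.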

matthewparkinsondes avatar Feb 28 '24 08:02 matthewparkinsondes

Thanks @matthewparkinsondes for the detailed response. Let me double check on my side; if confirmed, it looks like we will need to change the Sysbox check for /proc/sys/kernel/unprivileged_userns_clone in the newer kernels.

Also, in the newer kernels, do you see /proc/sys/user/max_user_namespaces?

Thanks!

ctalledo avatar Mar 06 '24 20:03 ctalledo

thanks @ctalledo.

Also, in the newer kernels, do you see /proc/sys/user/max_user_namespaces?

yes ... this is present in all of the newer kernels ... 6.6.21, 6.7.9 and 6.8-rc7

matthewparkinsondes avatar Mar 06 '24 22:03 matthewparkinsondes

yes ... this is present in all of the newer kernels ... 6.6.21, 6.7.9 and 6.8-rc7

Ah, that likely means that starting with those newer kernels, Ubuntu distros are behaving as Fedora-based distros do, which use /proc/sys/user/max_user_namespaces as a way to enable unprivileged user-ns. If true, this is good news as it creates more consistency across the distros, though we need to update Sysbox to detect the kernel version and then check for the right file.
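
A rough sketch of such a check follows. Note that it selects by which files are present rather than by kernel version, which is a simplification of the approach described above, and it is not the actual Sysbox implementation. The function name `userns_status` and the proc-root argument are assumptions for the example. It treats a value other than 1 in unprivileged_userns_clone, or a value of 0 in max_user_namespaces, as "disabled".

```shell
# Hypothetical helper: report whether unprivileged user-namespace
# creation appears to be enabled, using whichever sysctl files exist.
# Prints one of: enabled, disabled, unknown.
# The optional first argument (default /proc) allows trying the logic
# against a scratch directory.
userns_status() {
    proc_root="${1:-/proc}"
    clone="$proc_root/sys/kernel/unprivileged_userns_clone"
    max="$proc_root/sys/user/max_user_namespaces"

    # Neither file is readable: cannot tell.
    if [ ! -r "$clone" ] && [ ! -r "$max" ]; then
        echo "unknown"
        return 0
    fi
    # Debian/Ubuntu-style switch, when the kernel provides it.
    if [ -r "$clone" ] && [ "$(cat "$clone")" != "1" ]; then
        echo "disabled"
        return 0
    fi
    # A limit of 0 user namespaces means none can be created.
    if [ -r "$max" ] && [ "$(cat "$max")" = "0" ]; then
        echo "disabled"
        return 0
    fi
    echo "enabled"
}
```

Running `userns_status` with no arguments on one of the newer kernels would rely only on /proc/sys/user/max_user_namespaces, since the other file is absent there.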

ctalledo avatar Mar 07 '24 00:03 ctalledo