for-linux Allow FUSE functionality by default

[ ] This is a bug report
[x] This is a feature request
[x] I searched existing issues before opening this one

Expected behavior

Mounting FUSE filesystems should work out of the box, because it is safe. It fits within the idea of a containerized app.

Actual behavior

An attempt to mount a FUSE filesystem fails with either:

fuse: device not found, try 'modprobe fuse' first

or:

fuse: failed to exec fusermount: No such file or directory

The only way to fix it is to run the container with additional permissions:

--cap-add SYS_ADMIN --device /dev/fuse

This makes it very difficult to run FUSE inside Docker because it is often all but impossible to run with additional flags in a managed environment.

Steps to reproduce the behavior

git clone https://github.com/rustyx/fuse-hello.git
docker build fuse-hello -t hello
docker run -it hello
docker run -it --device /dev/fuse hello
docker run -it --cap-add SYS_ADMIN --device /dev/fuse hello

Output of docker version:

Client:
 Version:       18.01.0-ce
 API version:   1.35
 Go version:    go1.9.2
 Git commit:    03596f51b1
 Built: Thu Jan 11 22:29:41 2018
 OS/Arch:       windows/amd64
 Experimental:  false
 Orchestrator:  swarm

Server:
 Engine:
  Version:      18.05.0-ce
  API version:  1.37 (minimum version 1.12)
  Go version:   go1.10.1
  Git commit:   f150324
  Built:        Wed May  9 22:20:42 2018
  OS/Arch:      linux/amd64
  Experimental: false

May 29 '18 09:05 rustyx

Strongly agree this would be a great feature. It's fairly common to abstract various services via a FUSE driver. If mounting one requires root-like capabilities it encourages lax security.

Jan 04 '19 20:01 andersjohansenange

The kernel requires SYS_ADMIN; we can't change this.

Mar 01 '19 13:03 justincormack

@justincormack What about this: "FUSE Gets User Namespace Support With Linux 4.18"?

Below is a short note on how it currently fails.

Ubuntu 18.04 with stock hwe kernel 4.18.0-18, docker 18.09.5.

docker run --rm -it --device=/dev/fuse ubuntu:18.04
apt update
apt install -y fuseiso wget
adduser --disabled-password --gecos '' test
cd /home/test
su test
mkdir mnt
wget https://cdn.openbsd.org/pub/OpenBSD/6.5/amd64/cd65.iso
fuseiso cd65.iso mnt
>>>>>>> fusermount: mount failed: Operation not permitted
exit
addgroup fuse
usermod -aG fuse test
su test
fuseiso cd65.iso mnt
>>>>>>> fusermount: mount failed: Operation not permitted
exit
fuseiso cd65.iso mnt
>>>>>>> fusermount: mount failed: Operation not permitted

So at the moment it doesn't work for:

root
regular user
regular user added to fuse group (just in case)

Apr 30 '19 13:04 dandelionred

Someone correct me if I am wrong; I'm trying to wrap my head around the limitations here.

The user namespace means we could do the current method more securely, perhaps without adding the SYS_ADMIN capabilities, but would still require the fuse device to be passed through.

When any mount occurs in a container it is also modifying the host mounts, hence the need for host cooperation. This prevents containers with FUSE from being used on Windows and OSX hosts.

If a container's OS were modified to intercept file system calls to emulate its own FUSE, then those FUSE mounts would not be accessible from the host. Is this even possible?

Jul 21 '19 15:07 zbyte64

> This prevents containers with FUSE from being used on Windows and OSX hosts.

FUSE mounting inside containers works just fine with Docker for Windows when passing the same flags: --cap-add SYS_ADMIN --device /dev/fuse.

I think the parent poster would want it to just work without any flags?

In my opinion SYS_ADMIN is the one we shouldn't need. Ideally, only --device /dev/fuse would be required.

Jul 21 '19 22:07 Ciantic

@zbyte64

> When any mount occurs in a container it is also modifying the host mounts, hence the need for host cooperation. This prevents containers with FUSE from being used on Windows and OSX hosts.

Well, as of 4.18, you have user namespace mounts for fuse which means you shouldn't need to change the host mounts and thus wouldn't need SYS_ADMIN.

Jul 22 '19 01:07 omeid
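A minimal sketch for checking whether the running kernel is at least the 4.18 release mentioned above. The helper names are hypothetical and not part of any Docker tooling. A new enough version is necessary but, as later comments in this thread show, not sufficient: the seccomp profile, capabilities and user namespace setup matter too.

```python
import platform
import re

# Version from which, per this thread, FUSE mounts are allowed inside
# unprivileged user namespaces.
MIN_KERNEL = (4, 18)


def kernel_version(release: str) -> tuple:
    """Extract (major, minor) from a release string like '4.18.0-18-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def new_enough_for_userns_fuse(release: str) -> bool:
    """True if the release is 4.18 or later (necessary, not sufficient)."""
    return kernel_version(release) >= MIN_KERNEL


if __name__ == "__main__" and platform.system() == "Linux":
    release = platform.release()
    print(release, "is 4.18+:", new_enough_for_userns_fuse(release))
```

Tuple comparison keeps the check correct across major versions (5.4 counts as newer than 4.18 even though 4 < 18).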

@omeid, what do you mean by 4.18? The latest version of blobfuse is 1.0.3. Which version of blobfuse are you using to run as non-root?

Aug 01 '19 07:08 cometta

@cometta He means this https://github.com/docker/for-linux/issues/321#issuecomment-487955090

Aug 01 '19 11:08 dandelionred

+1 on this. Requiring SYS_ADMIN is basically a non-starter for us, though the extra device shouldn't be an issue (assuming 4.18+ kernels). Can this get triaged?

Dec 13 '19 22:12 miketzian

The ability to run FUSE without SYS_ADMIN has been available since August 2018, and yet there hasn't been much traction on this ticket. Running in privileged mode in production should scare most security teams! Is there anything we can do to get more traction on this story?

Dec 24 '19 05:12 ryanlamore

SYS_ADMIN is quite a powerful capability. If there is a way to mount without it, that could avoid a lot of risk.

Feb 24 '20 08:02 1zg12

I think it's all about the Linux kernel, which needs to provide the ability to mount without the SYS_ADMIN capability; that isn't in the scope of Docker.

Mar 16 '20 08:03 yyb196

> I think it's all about the Linux kernel, which needs to provide the ability to mount without the SYS_ADMIN capability; that isn't in the scope of Docker.

Check out this earlier comment: the Linux kernel appears to have added user namespace support for FUSE in 4.18.

Mar 17 '20 16:03 norpol

Has anybody actually tried to do this? I've added mount to my seccomp allow list and still get permission denied on mount:

/bin/fusermount: mount failed: Operation not permitted
panic: fusermount exited with code 256


goroutine 1 [running]:
main.main()
	/Users/cpuguy83/go/src/github.com/cpuguy83/tarfs/cmd/tarfsd/main.go:46 +0x697
root@6bd1a24bcd1a:/# uname -a
Linux 6bd1a24bcd1a 4.19.76-linuxkit #1 SMP Thu Oct 17 19:31:58 UTC 2019 armv7l GNU/Linux

Something tells me there is much more to this than just allowing mount without CAP_SYS_ADMIN

Mar 18 '20 23:03 cpuguy83

@cpuguy83 Make sure you have the unprivileged_userns_clone kernel parameter set.

Mar 21 '20 01:03 omeid

@omeid That's a Debian-specific kernel parameter for enabling (or rather disabling?) userns for unprivileged users, I think?

Mar 23 '20 17:03 cpuguy83

Debian and Arch Linux, too. Check your kernel documentation, and also make sure it is compiled with CONFIG_USER_NS.

Mar 24 '20 03:03 omeid
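The two runtime settings discussed above can be inspected from /proc/sys. This sketch assumes the Debian/Arch-style kernel/unprivileged_userns_clone file, which only exists on kernels carrying that patch, and the upstream user/max_user_namespaces limit; the function names are hypothetical. CONFIG_USER_NS itself is a build-time option and is not checked here.

```python
from pathlib import Path


def read_sysctl(path: Path):
    """Return the integer in a /proc/sys file, or None if the file does not exist."""
    try:
        return int(path.read_text().strip())
    except FileNotFoundError:
        return None


def userns_settings(proc_sys: Path = Path("/proc/sys")) -> dict:
    """Collect the user-namespace knobs discussed in the thread."""
    return {
        # Debian/Arch-style switch: 0 forbids unprivileged users from creating
        # user namespaces; the file is absent on kernels without that patch.
        "unprivileged_userns_clone": read_sysctl(
            proc_sys / "kernel" / "unprivileged_userns_clone"
        ),
        # Upstream per-user limit on user namespaces; 0 disables creating new ones.
        "max_user_namespaces": read_sysctl(proc_sys / "user" / "max_user_namespaces"),
    }


if __name__ == "__main__":
    for name, value in userns_settings().items():
        print(f"{name}: {'not present' if value is None else value}")
```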

@omeid I can create a userns just fine, what I can't do is mount in the userns w/o CAP_SYS_ADMIN. I'm attempting to do this by taking the default seccomp profile and adding unshare and mount to the allow list.

Mar 24 '20 16:03 cpuguy83

Any updates on this since?

Jun 06 '20 22:06 pmjohann

I need this as well, and giving my containers SYS_ADMIN permissions just for FUSE is not an option.

Jun 16 '20 11:06 skaldesh

The following steps work for me on Fedora 32 to use FUSE in a Docker container without --privileged or --cap-add SYS_ADMIN:

1. Patch the seccomp profile to drop the restriction on clone(2) namespace flags and allow mount(2) and umount(2): https://gist.github.com/juergbi/44b0e7aaa50742f996eed0693e053cda This is a patch for profiles/seccomp/default.json as available in the docker/moby repositories.
2. Ensure the fuse module is loaded.
3. Run the Docker container with the options --device /dev/fuse --security-opt seccomp=/path/to/fuse.json
4. In the Docker container, run unshare -c --keep-caps -m to open a shell in new unprivileged user and mount namespaces.
5. In that new shell it's then possible to mount and use FUSE, e.g., sshfs user@host:directory /mnt

Depending on the uid mapping Docker uses, this can be considered secure as long as you trust the kernel implementation of unprivileged user namespaces (and FUSE). It would be great if this was supported by default or at least as an easy-to-use alternative profile.

Side-note: To allow mounting tmpfs in the user namespace in the container, Fedora additionally requires --security-opt label:type:container_userns_t (SELinux).

Aug 20 '20 15:08 juergbi
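The gist linked above is not reproduced here. As a rough, hypothetical equivalent of step 1, the following sketch rewrites a copy of Docker's profiles/seccomp/default.json: it removes clone from any rule that constrains clone's arguments (the rule that blocks namespace flags) and appends an unconditional allow rule for clone, mount, umount, umount2 and unshare. It assumes the profile has a top-level "syscalls" list whose rules carry "names", "action" and optional "args" keys. Adding umount2 is an assumption beyond the syscalls named in the comment, and the sketch does not deduplicate against capability-conditional rules.

```python
import copy
import json
import sys

# Syscalls to allow unconditionally. clone/unshare let `unshare` create the new
# user and mount namespaces; mount/umount/umount2 allow FUSE mounts inside them.
# umount2 is an assumption: most libc umount() wrappers call it.
EXTRA_SYSCALLS = ["clone", "mount", "umount", "umount2", "unshare"]


def patch_profile(profile: dict) -> dict:
    """Return a relaxed copy of a Docker seccomp profile (sketch, not the gist)."""
    patched = copy.deepcopy(profile)
    rules = []
    for rule in patched.get("syscalls", []):
        if rule.get("args") and "clone" in rule.get("names", []):
            # This rule only allows clone() with certain flag bits, i.e. it
            # blocks namespace creation. Keep it for any other syscalls it lists.
            rule["names"] = [n for n in rule["names"] if n != "clone"]
            if not rule["names"]:
                continue
        rules.append(rule)
    rules.append({"names": EXTRA_SYSCALLS, "action": "SCMP_ACT_ALLOW"})
    patched["syscalls"] = rules
    return patched


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python3 patch_seccomp.py default.json > fuse.json
    with open(sys.argv[1]) as f:
        profile = json.load(f)
    json.dump(patch_profile(profile), sys.stdout, indent=2)
```

The resulting file would then be passed with --security-opt seccomp=/path/to/fuse.json as in step 3. Comparing its output against the gist's diff before relying on it would be prudent.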

@juergbi, why can't you use sshfs directly in Docker? What's stopping you from doing so after patching the seccomp profile? I'm trying to understand the issue and also need FUSE in Docker.

Dec 17 '20 23:12 JuniorJPDJ

> why can't you use sshfs directly in docker? what's stopping you from doing so after patching seccomp profile?

I'm not sure I understand your question. (The answer seems too obvious to me, so I must be misinterpreting it.)

FUSE is the kernel API that sshfs is built on top of, and Docker doesn't run a second kernel inside the container, so the fuse module must be loaded, access to /dev/fuse is necessary for the sshfs binary to communicate with the kernel, and anything that interferes with sshfs's ability to perform the mount operation must be disabled.

Dec 18 '20 00:12 ssokolow

> @juergbi why can't you use sshfs directly in docker? what's stopping you from doing so after patching seccomp profile? I'm trying to understand the issue and also need FUSE in docker.

Mounting anything (FUSE and other filesystems) requires CAP_SYS_ADMIN privileges even without seccomp restrictions. Outside Docker, unprivileged users can run sshfs with the help of the setuid-root helper binary fusermount. However, in a Docker container setuid fusermount is not supported and hence, sshfs fails unless the Docker container is privileged.

The mentioned unshare command grants CAP_SYS_ADMIN privileges in new user and mount namespaces. This doesn't provide any additional access to the host system, however, it allows mount operations in that new mount namespace. With Linux 4.18 and later, FUSE mounts are allowed in that new mount namespace as well. So sshfs can work inside the new namespaces.

Other container engines may create an unprivileged user namespace as part of container startup, which may allow mounts without the extra unshare step. However, Docker doesn't work that way with its system daemon.

Dec 18 '20 09:12 juergbi
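To see the effect described above, one can compare the effective capability set before and after running unshare. The sketch below (hypothetical helper names) reads the CapEff line from /proc/self/status and tests bit 21, which is CAP_SYS_ADMIN in linux/capability.h. In a container started with Docker's default capabilities it should report False; inside the shell opened by unshare -c --keep-caps -m it would be expected to report True. Capabilities gained this way only apply to resources owned by the new user namespace, which is why they grant no extra access to the host.

```python
import os

CAP_SYS_ADMIN = 21  # bit number from linux/capability.h


def effective_caps(status_text: str) -> int:
    """Return the CapEff bitmask from the contents of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split()[1], 16)
    raise ValueError("no CapEff line found")


def has_cap(caps: int, cap: int) -> bool:
    """True if capability number `cap` is set in the bitmask `caps`."""
    return bool((caps >> cap) & 1)


if __name__ == "__main__" and os.path.exists("/proc/self/status"):
    with open("/proc/self/status") as f:
        caps = effective_caps(f.read())
    # The answer is relative to the current user namespace.
    print("CAP_SYS_ADMIN in this user namespace:", has_cap(caps, CAP_SYS_ADMIN))
```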

> > why can't you use sshfs directly in docker? what's stopping you from doing so after patching seccomp profile?
>
> I'm not sure I understand your question. [...]

I know. I meant: he runs sshfs inside an unshare'd namespace inside Docker rather than directly.

Dec 18 '20 12:12 JuniorJPDJ

> I'm trying to understand the issue and also need FUSE in docker.

Slightly unrelated, but I also use FUSE in Docker to mount ISO files as a non-root user.

Dec 18 '20 18:12 chalbersma

> The following steps work for me on Fedora 32 to use FUSE in a Docker container without --privileged or --cap-add SYS_ADMIN: [...]

@juergbi: I was able to replicate this setup on Ubuntu 18.04. I used -r instead of -c because the util-linux shipped in Ubuntu 18.04 does not have -c.

FUSE works :) But I want to be able to install Ubuntu packages in the unshared shell (apt-get install foo). I get this error:

W: chown to _apt:root of directory /var/cache/apt/archives/partial failed - SetupAPTPartialDirectory (22: Invalid argument)
W: chmod 0700 of directory /var/cache/apt/archives/partial failed - SetupAPTPartialDirectory (1: Operation not permitted)
W: chown to _apt:root of directory /var/lib/apt/lists/auxfiles failed - SetupAPTPartialDirectory (22: Invalid argument)
W: chmod 0700 of directory /var/lib/apt/lists/auxfiles failed - SetupAPTPartialDirectory (1: Operation not permitted)

Do you have any suggestions to work around this?

Apr 02 '21 18:04 dotslash
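One plausible reading of these warnings, offered as an assumption rather than a confirmed diagnosis: unshare -r writes a uid_map with a single entry that maps the caller to root, so the _apt user's uid has no mapping in the new user namespace, and chown(2) to an unmapped uid fails with EINVAL, which matches the "22: Invalid argument" lines. The hypothetical helpers below parse /proc/self/uid_map and check whether a given uid is mapped; the _apt lookup uses the standard pwd module.

```python
import os
import pwd


def parse_id_map(text: str) -> list:
    """Parse /proc/<pid>/uid_map into (inside_start, outside_start, length) tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3:
            entries.append(tuple(int(field) for field in fields))
    return entries


def is_mapped(uid: int, entries: list) -> bool:
    """True if `uid` (as seen inside the namespace) falls in some mapped range."""
    return any(start <= uid < start + length for start, _outside, length in entries)


if __name__ == "__main__" and os.path.exists("/proc/self/uid_map"):
    with open("/proc/self/uid_map") as f:
        entries = parse_id_map(f.read())
    print("uid_map entries:", entries)
    try:
        apt_uid = pwd.getpwnam("_apt").pw_uid
        print(f"_apt uid {apt_uid} mapped:", is_mapped(apt_uid, entries))
    except KeyError:
        print("no _apt user in this image")
```

Run inside the unshared shell, a single-entry map would report _apt as unmapped, whereas a map covering the full uid range (as in the container's own namespace) would not.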

Any progress on this?

Jul 10 '22 21:07 Nicba1010

> The following steps work for me on Fedora 32 to use FUSE in a Docker container without --privileged or --cap-add SYS_ADMIN: [...]

Can we possibly get a Docker image of this config?

Jan 15 '23 23:01 acidjazz

> The following steps work for me on Fedora 32 to use FUSE in a Docker container without --privileged or --cap-add SYS_ADMIN: [...]

Are there any security implications of doing so? I want to allow untrusted users to access FUSE for rclone mount, but it would be great if they couldn't access the host's filesystem.

Feb 08 '23 09:02 quantumsheep
