for-linux Allow setns() in container, or add flag to allow it specifically

The docker update from 1.11.x to 1.12.x seems to have broken setns() calls inside containers. setns() is used by Chrome for creating namespaces. I figured this out after reading this SO post.

The only solution right now is to run chrome with --no-sandbox but that's way way less than ideal. Another "solution" is to run the container with --cap-add=SYS_ADMIN -- which is a rather broad thing to do.

[X] This is a bug report
[X] This is a feature request
[X] I searched existing issues before opening this one

Expected behavior

I expect to EITHER have a flag to enable setns() in the container (so that Chrome can run securely), OR allow setns() in docker containers.

Actual behavior

Right now, the whole world is effectively using --no-sandbox to run Chrome in containers. Seriously.

Steps to reproduce the behavior

Create a docker container with Chrome in it
Try to run Chrome
Try again with --no-sandbox

Output of docker version:

    Client:
     Version:      1.13.1
     API version:  1.26
     Go version:   go1.8.3
     Git commit:   092cba3
     Built:        Thu Oct 12 22:34:44 2017
     OS/Arch:      linux/amd64

    Server:
     Version:      1.13.1
     API version:  1.26 (minimum version 1.12)
     Go version:   go1.8.3
     Git commit:   092cba3
     Built:        Thu Oct 12 22:34:44 2017
     OS/Arch:      linux/amd64
     Experimental: false

Output of docker info:

 
Containers: 1
 Running: 1
 Paused: 0
 Stopped: 0
Images: 23
Server Version: 1.13.1
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 23
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins: 
 Volume: local
 Network: bridge host macvlan null overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: aa8187dbd3b7ad67d8e5e3a15115d3eef43a7ed1
runc version: 9df8b306d01f59d3a8029be411de015b7304dd8f
init version: N/A (expected: 949e6facb77383876aeff8a6944dde66b3089574)
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.13.0-46-generic
Operating System: Ubuntu 17.10
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 7.68 GiB
Name: merc-B250M-D3H
ID: 5VQF:HZG3:ULIM:TQOZ:ITG2:SUGX:HFZ2:QBZH:HJR6:GABW:COXR:CY3E
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

Nov 22 '18 01:11 mercmobily

Most likely this is the seccomp policy blocking setns. You can supply a custom seccomp policy.

But also, you are already running chrome in a container, what is the need to do another "setns"?

Nov 22 '18 03:11 cpuguy83

I can see this page that explains how to set seccomp profiles for Docker, and especially the option --security-opt seccomp=/path/to/seccomp/profile.json. What would the profile.json have to contain to just whitelist setns()?

As for your second question, imagine a complex application that needs Chrome in headless mode, to run some client-side testing or to generate PDFs -- or whatever. In this case, running Chrome without a sandbox would imply that a hacker could exploit some of Chrome's vulnerabilities to gain access to the instance.

If you let me know, and I see that it works, I will make sure I tell pretty much everybody online (that would include the Selenium people, but also countless people out there on SO and various forums) that there is a solution other than disabling the sandbox, which in many cases is a really bad idea.

Nov 22 '18 21:11 mercmobily

You may be interested in @jessfraz's Dockerfile for chrome https://github.com/jessfraz/dockerfiles/blob/master/chrome/stable/Dockerfile

and the corresponding seccomp profile https://github.com/jessfraz/dotfiles/blob/master/etc/docker/seccomp/chrome.json

Nov 22 '18 21:11 thaJeztah

Ah, I even looked into those repos a lot in order to figure out what was going on... but never together, and without knowing about the seccomp flag. So... The chrome json file seems to be listing a lot of calls to allow. I guess it's because it basically overrides the full default seccomp setting. Is there no way to make this future-proof, and say "apply the default setting, with this difference" so to speak?

Nov 22 '18 21:11 mercmobily

@jessfraz I saw a few tickets on your repos where people asked you about the Chrome issue; you referred them to your dotfiles. However, I believe a more detailed explanation would help when this problem arises. Just my humble 2c -- thank you for everything!

Nov 22 '18 22:11 mercmobily

Is there no way to make this future-proof, and say "apply the default setting, with this difference" so to speak?

Yes, the seccomp profile is unfortunately quite verbose. This was by design, because the default profile is configurable on the daemon, so if a container would only specify the "diff" (assuming the daemon runs the default profile), the result would be unpredictable. So for that reason, the seccomp profile requires you to specify exactly what the profile should look like.

Perhaps it would be a fun "pet" project to create a seccomp-profile generator, i.e. something like;

seccomp-bake \
  --default-profile=profiles/seccomp/default.json \
  --whitelist-add=foo,bar,baz \
  -o ./my-profile.json

(although probably could be done with, e.g., jq)
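
A minimal sketch of that generator idea in Python, for illustration only. It assumes the base profile follows the layout of Docker's default.json (a top-level "syscalls" list of rules carrying "names" and "action"). The function name bake_profile and the file names in the usage note are hypothetical, and seccomp-bake itself is not an existing tool.

```python
import copy
import json


def _is_unconditional_allow(rule):
    """True for an allow rule with no argument filters or include/exclude conditions."""
    return (
        rule.get("action") == "SCMP_ACT_ALLOW"
        and not rule.get("args")
        and not rule.get("includes")
        and not rule.get("excludes")
    )


def _rule_names(rule):
    """Syscall names in a rule; also accepts the older single-"name" form."""
    names = list(rule.get("names") or [])
    if rule.get("name"):
        names.append(rule["name"])
    return names


def bake_profile(base_profile, extra_syscalls):
    """Return a copy of base_profile that additionally allows extra_syscalls.

    Syscalls already covered by an unconditional allow rule are skipped;
    the remaining ones are appended as one new allow rule.
    """
    profile = copy.deepcopy(base_profile)
    rules = profile.setdefault("syscalls", [])

    allowed = set()
    for rule in rules:
        if _is_unconditional_allow(rule):
            allowed.update(_rule_names(rule))

    # Keep the caller's order and drop duplicates.
    missing = list(dict.fromkeys(n for n in extra_syscalls if n not in allowed))
    if missing:
        rules.append({"names": missing, "action": "SCMP_ACT_ALLOW", "args": []})
    return profile


if __name__ == "__main__":
    # Tiny stand-in for a real default profile, just to show the shape.
    base = {
        "defaultAction": "SCMP_ACT_ERRNO",
        "syscalls": [
            {"names": ["read", "write"], "action": "SCMP_ACT_ALLOW", "args": []}
        ],
    }
    print(json.dumps(bake_profile(base, ["setns", "read"]), indent=2))
```

With a real profile, one would load default.json with json.load, call bake_profile, write the result with json.dump, and pass that file to --security-opt seccomp=/path/to/seccomp/profile.json. Because the output is still a complete profile, it keeps the "specify exactly what the profile should look like" property described above.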

Another improvement would be this proposal; https://github.com/moby/moby/issues/32801 (adding "entitlements"), which would make setting security options more user-friendly

Nov 22 '18 22:11 thaJeztah

Hi,

alright... I will test this out on my own machine (mainly making sure that setns() is the only thing Chrome needs, and if it isn't, wrestling permissions till I get it right, possibly checking @jessfraz's settings in Chrome) and will then proceed to mass-answering people with the same issue in the gazillion places I've found (probably just pointing to this issue, which right now is pure gold to a lot of people out there)

Nov 22 '18 22:11 mercmobily

Hi,

So, it's not just setns -- as I imagined. After cutting and sorting and diffing, here is the list of calls that are NOT whitelisted in the default config file but are listed in @jessfraz's Chrome config file.

    > arch_prctl
    > chroot
    > clone
    > fanotify_init
    > name_to_handle_at
    > open_by_handle_at
    > setdomainname
    > sethostname
    > syslog
    > unshare
    > vhangup
    > setns
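
For anyone who wants to repeat the comparison, the cutting, sorting and diffing could be sketched in Python roughly like this. It is an illustration under assumptions, not the exact procedure used for the list above: it treats a syscall as whitelisted only when it appears in an allow rule without argument filters or include/exclude conditions, and it accepts both the "names" list form and the older single "name" form. The function names are made up for this example.

```python
def allowed_syscalls(profile, unconditional_only=True):
    """Collect the syscall names a seccomp profile allows.

    With unconditional_only=True, rules that carry argument filters or
    include/exclude conditions are ignored.
    """
    names = set()
    for rule in profile.get("syscalls", []):
        if rule.get("action") != "SCMP_ACT_ALLOW":
            continue
        conditional = bool(
            rule.get("args") or rule.get("includes") or rule.get("excludes")
        )
        if unconditional_only and conditional:
            continue
        names.update(rule.get("names") or [])
        if rule.get("name"):
            names.add(rule["name"])
    return names


def extra_syscalls(default_profile, custom_profile):
    """Syscalls allowed by custom_profile but not by default_profile, sorted."""
    return sorted(allowed_syscalls(custom_profile) - allowed_syscalls(default_profile))
```

Both arguments are profiles already parsed with json.load. Passing unconditional_only=False to allowed_syscalls gives a looser notion of "allowed", which will generally produce a shorter difference.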

I frankly don't know if all of them are needed. I assume @jessfraz would have straced chrome and checked which calls were called... maybe?

So, at this stage if somebody wants to run Chrome in a docker container, they can basically:

Get the default seccomp config file
Add the calls above in the whitelist at the top, the one that starts with:

    "syscalls": [ { "names": [

Enjoy a safe Chrome.

I think the Selenium people are the first one that must be warned, since right now basically anybody running Travis/Selenium, is running an insecure sandbox-less Chrome. That's planet-wise.

Before I go out there and tell everybody, may I ask: I realise that the list above is the full list that will make it work with Chrome. But... can it be shortened? How was it worked out? Trial & error? Strace? Grepping Chrome's source?

I guess the best person to answer would be @jessfraz -- any hints?

Nov 23 '18 03:11 mercmobily

Almost certainly strace

Nov 23 '18 03:11 cpuguy83

@cpuguy83 If that is the case, there is no point in trying to shorten it.

Do you think it's worthwhile trying my luck, and see if the Docker people would accept a pull request adding CHROME the same way CAP_SYS_ADMIN is?

This would be to help out all those people out there trying to get headless chrome to do software testing in a container...

Nov 23 '18 03:11 mercmobily

Sorry no. Those capabilities are actual Linux capabilities

Nov 23 '18 03:11 cpuguy83

I think the Selenium people are the first one that must be warned, since right now basically anybody running Travis/Selenium, is running an insecure sandbox-less Chrome. That's planet-wise.

Chrome will be sandboxed as a whole by the container; if those containers are minimal (only contain chrome, and the bare minimum required), and follow best practices (such as running as a non-privileged user, running with a read-only filesystem, having --security-opt=no-new-privileges set, as well as memory and CPU constraints), no damage could be done beyond what's inside the container. Possibly, the profile could be tightened further, as the default profile is a "generic" profile for common use.
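
As a rough illustration of those practices, here is a small Python helper that assembles a docker run command line. It is only a sketch: the helper name, the image name, the user ID and the resource limits are placeholders, and it does not check whether Chrome actually starts under these settings. A later comment in this thread reports that no-new-privileges prevents Chromium's setuid sandbox from starting, so that flag can be switched off.

```python
import shlex


def hardened_run_args(
    image,
    seccomp_profile=None,
    user="1000:1000",
    memory="1g",
    cpus="1.0",
    no_new_privileges=True,
):
    """Build a `docker run` argument list with common hardening flags.

    All defaults are illustrative placeholders, not recommended values.
    """
    args = [
        "docker", "run", "--rm",
        "--user", user,        # run as a non-privileged user
        "--read-only",         # mount the root filesystem read-only
        "--memory", memory,    # memory constraint
        "--cpus", cpus,        # CPU constraint
    ]
    if no_new_privileges:
        args.append("--security-opt=no-new-privileges")
    if seccomp_profile:
        # Path to a custom seccomp profile on the Docker host.
        args.append(f"--security-opt=seccomp={seccomp_profile}")
    args.append(image)
    return args


if __name__ == "__main__":
    # "example/chrome" and the profile path are hypothetical.
    print(shlex.join(hardened_run_args("example/chrome", "/path/to/chrome.json")))
```

Returning a list rather than a single string means the arguments can be handed to subprocess.run without shell quoting concerns.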

Note that @jessfraz's Dockerfile (and seccomp profile) is targeted at desktop / interactive use of the Chrome container, and therefore may be more permissive than required for your use case (running Selenium tests in headless mode).

Given that more syscalls are whitelisted in the Chrome seccomp profile, that actually means the profile is less restrictive than the default, thus introducing more risks if the container gets compromised.

Nov 23 '18 10:11 thaJeztah

@thaJeztah Yours is a compelling argument. However, if the container must, for example, be able to connect to a database server, a non-sandboxed chrome might become the gateway to gain read-access to the database and get credentials. If a shell is obtained, the intruder will be able to reach hosts that would normally be unreachable. So, while it's true that a malicious user exploiting a Chrome vulnerability would "only" be able to access the container, there are many cases where access to that container's data -- and even just having a shell in that container -- might be a bigger problem than expected. You can surely think of several dangerous scenarios if you have an application server that needs to run headless Chrome (to create PDFs, for example).

Your comment on the possibility of headless Chrome not needing all of these:

> arch_prctl
> chroot
> clone
> fanotify_init
> name_to_handle_at
> open_by_handle_at
> setdomainname
> sethostname
> syslog
> unshare
> vhangup
> setns

Is interesting; by looking at them, I doubt headless Chrome would need much less. But, it would need investigation for sure.

Nov 23 '18 11:11 mercmobily

@thaJeztah Any thoughts? I don't want to recommend anything to anybody unless it's sound advice, and your message cast some doubts on my reasoning. When you write "if those containers are minimal (only contain chrome, and the bare minimum required)", I think that those "minimum requirements" for server-side Chrome will inevitably have to report the results to another host, possibly have access to hosts otherwise protected, and have some privileges to do so. On the other hand, do you think the syscalls above have security implications? (arch_prctl jumps to mind)

Nov 26 '18 02:11 mercmobily

I think that those "minimum requirements" for server-side Chrome will inevitably have to report the results to another host, possibly have access to hosts otherwise protected, and have some privileges to do so.

If this is in a CI environment, you should assume the content you're running is compromised, and configure what the container is allowed to access based on that assumption. In a Docker setup, that could also mean; connect the container to a network that only allows it to connect to those services/containers that you want it to be able to reach. (If this is about "results", and you don't want it to be able to "push" those changes, perhaps writing to a file, and collect those changes would be an option). That said; I don't have a lot of experience with setting up Selenium, so not sure I can give more advice on that part 😅

On the other hand, do you think the syscalls above have security implications? (arch_prctl jumps to mind)

I'll defer that one to @justincormack and @jessfraz, who are probably better at answering that.

Nov 26 '18 10:11 thaJeztah

My use case is not actually Selenium/CI. That's just a common use case that requires Chrome to run. In my specific case, my server uses headless Chrome to create PDF files. I realise it's not very common, but there are cases where headless Chrome needs to run as part of a complex server application.

Nov 26 '18 11:11 mercmobily

My use case is running Chrome (headful) with AWS Fargate, where neither --cap-add nor --security-opt can be used, does this mean I can only run Chrome with --no-sandbox?

Jan 09 '19 02:01 lucifer1004

My use case is running Chrome (headful) with AWS Fargate, where neither --cap-add nor --security-opt can be used, does this mean I can only run Chrome with --no-sandbox?

If there's no option to customize those (or the daemon configuration), then probably: yes.

If a dedicated option was added for this, then you'd probably also not be able to configure that in that case, so it may be better to open a feature request with AWS

Jan 09 '19 21:01 thaJeztah

I think the Selenium people are the first one that must be warned, since right now basically anybody running Travis/Selenium, is running an insecure sandbox-less Chrome. That's planet-wise.

Chrome will be sandboxed as a whole by the container; if those containers are minimal (only contain chrome, and the bare minimum required), and follow best practices (such as running as a non-privileged user, running with a read-only filesystem, having --security-opt=no-new-privileges set, as well as memory and CPU constraints), no damage could be done beyond what's inside the container. Possibly, the profile could be tightened further, as the default profile is a "generic" profile for common use.

Note that @jessfraz's Dockerfile (and seccomp profile) is targeted at desktop / interactive use of the Chrome container, and therefore may be more permissive than required for your use case (running Selenium tests in headless mode).

Given that more syscalls are whitelisted in the Chrome seccomp profile, that actually means the profile is less restrictive than the default, thus introducing more risks if the container gets compromised.

Just want to point out that when I run dockerized chromium with --security-opt=no-new-privileges, I get the following error:

       The setuid sandbox is not running as root. Common causes:
         * A parent process set prctl(PR_SET_NO_NEW_PRIVS, ...)
       Failed to move to new namespace: PID namespaces supported, Network namespace supported, but failed: errno = Operation not permitted

Workaround we're using is leaving out no-new-privileges.

Feb 17 '23 19:02 nick-kang

@thaJeztah Yours is a compelling argument. However, if the container must, for example, be able to connect to a database server, a non-sandboxed chrome might become the gateway to gain read-access to the database and get credentials. If a shell is obtained, the intruder will be able to reach hosts that would normally be unreachable. So, while it's true that a malicious user exploiting a Chrome vulnerability would "only" be able to access the container, there are many cases where access to that container's data -- and even just having a shell in that container -- might be a bigger problem than expected. You can surely think of several dangerous scenarios if you have an application server that needs to run headless Chrome (to create PDFs, for example).

Your comment on the possibility of headless Chrome not needing all of these:
> arch_prctl
> chroot
> clone
> fanotify_init
> name_to_handle_at
> open_by_handle_at
> setdomainname
> sethostname
> syslog
> unshare
> vhangup
> setns
Is interesting; by looking at them, I doubt headless Chrome would need much less. But, it would need investigation for sure.

What I'm unsure about is how the no-sandbox option entirely bypasses the need for some of these system calls. I can understand how setns would be needed to create the boundaries between chrome processes/windows, but how about all the rest of these calls? Wouldn't chromium still need a call like setdomainname even without sandboxing?

Mar 08 '23 03:03 CryptoKiddies

Currently, I am able to run Chromium in a container if I pass either "--no-sandbox" or "--security-opt=seccomp=unconfined" command line arguments. I, however, would prefer to have sandboxes working properly as they appear necessary for Chromium's "Site Isolation" Design. Site Isolation "helps defend against... UXSS and fully compromised renderer processes."

There are two (short) Chromium design documents that I found helpful, as I just started researching this. Links to these follow, in the hopes that they may be helpful to others: https://github.com/chromium/chromium/blob/main/docs/linux/sandboxing.md

Linux Sandboxing Chromium uses a multiprocess model, which allows to give different privileges and restrictions to different parts of the browser. For instance, we want renderers to run with a limited set of privileges since they process untrusted input and are likely to be compromised. Renderers will use an IPC mechanism to request access to resource from a more privileged (browser process). ...

https://sites.google.com/a/chromium.org/dev/developers/design-documents/site-isolation

Chrome's multi-process architecture provides many benefits for speed, stability, and security. It allows web pages in unrelated tabs to run in parallel, and it allows users to continue using the browser and other tabs when a renderer process crashes. Because the renderer processes don't require direct access to disk, network, or devices, Chrome can also run them inside a restricted sandbox. This limits the damage that attackers can cause if they exploit a vulnerability in the renderer, including making it difficult for attackers to access the user's filesystem or devices, as well as privileged pages (e.g., settings or extensions) and pages in other profiles (e.g., Incognito mode)....

My goal is to learn how to create a seccomp profile that is optimal for Chromium, where neither Docker nor the browser are less secure as a result. I should probably start with something current. Does anyone know where the source is located for the official "builtin" profile reported by "docker info" command?

Security Options:
  apparmor
  seccomp
   Profile: builtin
  cgroupns

Jun 07 '23 08:06 dusty-1

Code to generate the default (builtin) profile is found here; https://github.com/moby/moby/blob/v24.0.2/profiles/seccomp/default_linux.go

That code generates the default.json file that's in the same directory; https://github.com/moby/moby/tree/v24.0.2/profiles/seccomp

Jun 07 '23 08:06 thaJeztah