Alertmanager cannot access its own data with the latest Docker image
I upgraded to the latest Docker image (from an unknown earlier version, maybe 6 months old) and have now discovered that Alertmanager can no longer access its own data.
level=error ts=2019-02-05T10:08:55.171604943Z caller=main.go:221 err="open /alertmanager/nflog: permission denied"
The config and data are stored in a directory owned by root:root, accessible only to root. The data directory is mapped to /alertmanager/ in the container. The container is started without explicitly specifying a user, which I believe means "use whatever the Dockerfile says, defaulting to root".
From host, the permissions of the directories are:
drwxrwx--- 2 root root 4096 Feb 5 10:57 config
drwxrwx--- 2 root root 4096 Feb 5 10:40 data
And the contents of the data directory are:
-rw-rw---- 1 root root 156 Feb 5 10:40 nflog
-rw-rw---- 1 root root 501 Feb 5 10:40 silences
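For reference, the denial follows from the ordinary owner/group/other permission check. If the image runs as a non-root user (the replies below say nobody; treating it as UID 65534, the usual value on busybox-based images, is an assumption here), that user is neither the owner root nor in the group root, so only the "other" bits apply, and for drwxrwx--- they are all zero. A minimal sketch of that check, using a hypothetical helper name can_access and assuming GNU coreutils stat on a Linux host:

```shell
#!/bin/sh
# can_access PATH UID GID WANT  (hypothetical helper, not part of Alertmanager)
#   Approximates the classic POSIX owner/group/other check for one path.
#   WANT is a permission bit: 4 = read, 2 = write, 1 = execute/search.
#   Assumptions: GNU coreutils `stat` (Linux host). Supplementary groups,
#   ACLs and capabilities are ignored, except that UID 0 is always allowed.
can_access() {
  path=$1 uid=$2 gid=$3 want=$4
  # Numeric owner, numeric group and octal mode, e.g. "0 0 770".
  info=$(stat -c '%u %g %a' "$path") || return 2
  set -- $info
  owner=$1 group=$2
  mode=$((0$3))                   # leading 0 makes the shell read it as octal
  if [ "$uid" -eq 0 ]; then
    return 0                      # root bypasses the mode bits (approximation)
  fi
  if [ "$uid" -eq "$owner" ]; then
    bits=$(( (mode >> 6) & 7 ))   # owner triplet
  elif [ "$gid" -eq "$group" ]; then
    bits=$(( (mode >> 3) & 7 ))   # group triplet
  else
    bits=$(( mode & 7 ))          # "other" triplet
  fi
  [ $(( bits & want )) -ne 0 ]
}
```

For example, can_access data 65534 65534 1 fails for the data directory listed above (nobody may not even enter it), while the same call with UID 0 succeeds.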
Possibly related to #1585 and #1586?
I am not a Linux access control expert, but I consider it VERY dubious to do any permission or user management inside a container. How permissions are assigned to mapped volumes and their contents, and which user account a container runs as, are decisions for the operator who runs the container. The software inside the container should not be touching permissions or user accounts at all. Alertmanager is not the only thing using the filesystem, and it cannot possibly know in advance what security configuration I want on my server's filesystem.
If this is an intentional feature, please describe the exact usage pattern that allows me to provide root:root-owned config and data directories to the Alertmanager container. Right now I cannot find a correct way to start Alertmanager without giving non-root users more access to my data directories than I want.
I am open to adjusting my filesystem setup. My goal is to have the config and data accessible only to Alertmanager and root/sudoers. This used to work when my files were simply restricted to root:root, but apparently not anymore.
I can work around this by adding -u root to my docker run command. I guess it overrides the configuration in the Dockerfile to some extent? It seems like a hack that should not be needed, though. Please provide a mechanism that does not require this. If you think this is the right solution, please document it. Doing a chown seems wrong in any case - the container does not own my filesystem.
(Yes, the same problem affects latest Prometheus version)
The image runs as nobody by default. You can override that and run the container as any user you wish:
mkdir /alertmanager
chown 999:999 /alertmanager
docker run -u 999:999 -v /alertmanager:/alertmanager prom/alertmanager
Does it solve your problem?
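Building on that answer, a small helper can prepare such a directory so that only the chosen container UID/GID and root can use it, which matches the goal of keeping config and data private. This is a sketch: the function name prepare_private_dir is hypothetical, and 999:999 is only the example ID pair from the commands above.

```shell
#!/bin/sh
# prepare_private_dir DIR UID GID  (hypothetical helper)
#   Creates DIR if needed, hands it (and any existing files inside, such as
#   nflog and silences) to UID:GID, and restricts DIR itself to mode 0770.
#   Afterwards only that user, members of that group, and root can use it.
#   UID:GID should match what you pass to `docker run -u UID:GID`.
#   Changing ownership to a UID other than your own normally requires root.
prepare_private_dir() {
  dir=$1 uid=$2 gid=$3
  mkdir -p "$dir" || return 1
  chown -R "$uid:$gid" "$dir" || return 1
  chmod 0770 "$dir" || return 1
}
```

For example, run prepare_private_dir /alertmanager 999 999 as root, then start the container with docker run -u 999:999 -v /alertmanager:/alertmanager prom/alertmanager as in the answer above.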
I use a similar workaround, yes, although the chown I also see in the Dockerfile makes me dislike this approach.
However, I did not really file this issue to solve my problem. It is to provide feedback that such a problem should not exist at all. I do not think it is good design to have the software in the container modify filesystem ACLs - it does not own my filesystem and only I can know which users I want to have access to which files on my server.
> Although I also see a chown in the Dockerfile that makes me dislike this approach.
I am not sure we are on the same page here. The chown only acts on the internal filesystem of the container image at build time. It does not touch the permissions of the volume you mount into the container at runtime.
> allows me to provide root:root owned config and data directories to the Alertmanager container.
That would require you to run as root inside the container, which is very risky given the amount of privileges the container would have at that point.
I am curious what you think. Thanks for your feedback. We are open to concrete change suggestions to our Dockerfile.
> I am not sure we are on the same page here. The chown only acts on the internal filesystem of the container image at build time. It does not touch the permissions of the volume you mount into the container at runtime.
You are right, of course - my thoughts were running off on the wrong track there. I was thinking of the Dockerfile command as also acting on startup, which is silly now that you point it out.
> That would require you to run as root inside the container, which is very risky given the amount of privileges the container would have at that point.
From my (definitely imperfect) understanding of container runtimes, the "root" user inside the container would still only be able to affect the world within that container, at least as long as the container is not started in privileged mode and not too much of the host filesystem is mapped into it.
Running Docker containers as root is the default configuration and it is what users expect. It is the common way to use Docker, as far as I know.
My biggest concern here is that I am trying to encourage my colleagues to use the Prometheus stack, and I know they will run into this issue. If they set it up "as they always set up containers" and immediately get permission errors, they will conclude that the Prometheus stack requires far too much manual work to configure. The fact is that for most users, dare I say even 90%+, running every container as root is perfectly fine. I do not mind going further myself, but that should not make the experience more difficult for other users.
My concrete suggestion for the Dockerfile would be to remove the USER declaration and leave it at the default of root. The operator of the server can then choose to start the container under a more constrained user account if they consider it necessary and the benefits outweigh the costs.
I think the current approach (e.g. docker run -u ...) is the correct one, security-wise. It has also been discussed before, see https://github.com/prometheus/alertmanager/issues/39 and https://github.com/prometheus/prometheus/issues/1637. I understand that it makes things a little harder to set up, and it would probably be worth highlighting it somewhere in the documentation.
And CVE-2019-5736 for runc is just one recent example of why you shouldn't run your container as root: https://kubernetes.io/blog/2019/02/11/runc-and-cve-2019-5736/
Similar to passing --config.file=, I'm passing --storage.path=/home, which is owned by nobody in the container image.
We observed this where the backing storage was an AWS EBS volume and the mounted volume was owned by root instead of the container user. It affected one of three Alertmanager containers (on OpenShift). In our case, it was resolved by replacing the volume entirely.
Is there any news on this issue?