elastic-ci-stack-for-aws
User namespace remapping breaks file permissions in containers with non-privileged users
I had a go at upgrading our Elastic CI Stack from v2.3.5 to v3.0.0-rc1, but unfortunately our build broke due to user namespace remapping.
We're running our builds in containers with an unprivileged user (the buildkite-agent user), via the Docker plugin (currently using my branch so that we can --group-add the docker group for ~Docker-in-Docker support~ access to the bind-mounted Docker socket).
On the host we have

```shell
$ id
uid=501(buildkite-agent) gid=502(buildkite-agent) groups=502(buildkite-agent),501(docker)
```

so we've configured the plugin such that our `docker run` command has

```shell
--user 501:502 --group-add 501
```
and therefore we have access to both the bind-mounted workdir and Docker socket from inside the container.
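Putting that together, the invocation looks roughly like this (a sketch only; the image name, mount paths, and command are placeholders, not our real configuration):

```shell
# Sketch of the docker run invocation (image and paths are placeholders).
# --user matches the host buildkite-agent uid:gid; --group-add grants the
# host docker group so the bind-mounted socket is readable.
docker run \
  --user 501:502 \
  --group-add 501 \
  --volume /var/lib/buildkite-agent/builds/my-pipeline:/workdir \
  --volume /var/run/docker.sock:/var/run/docker.sock \
  --workdir /workdir \
  my-build-image \
  ./run-tests.sh
```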
However, with user namespace remapping, we no longer have permission to access those bind-mounts (for example, the workdir is now owned by root:nogroup inside the container).
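For reference, this is roughly what daemon-level remapping looks like; the `dockremap` name and ranges below are illustrative defaults, not values taken from the stack:

```shell
# Illustrative only: enabling remapping for every container on the daemon.
$ cat /etc/docker/daemon.json
{ "userns-remap": "default" }

# The daemon allocates a subordinate uid/gid range for the remap user, e.g.:
$ cat /etc/subuid
dockremap:231072:65536

# Container uid 0 now maps to a host uid in that range. Host uid 501 has no
# reverse mapping, so files it owns on bind mounts show up inside the
# container with an unmapped owner.
```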
I looked at the possibility of adding --userns=host to the docker run arguments, because I like having the secure default of enabling remapping on the daemon and opting out on a per-container basis.
Sadly, this didn't work; in this case the file permissions are ok on the bind-mounts, but are screwed up on all the other files in the container (everything under / that should be owned by root becomes owned by buildkite-agent!), which I think is due to moby/moby#27775.
How would you feel about reintroducing the EnableDockerUserNamespaceRemap parameter?
The default could still be true, but it'd allow users to opt out if they're not going to be running as root in their containers.
Hi @haines, yup, I think we'll consider bringing it back. I'd be keen to know a bit more about what you are trying to accomplish though; I don't quite understand why you need to set a user and group on some builds.
Our plan is to offer an agent bootstrap that runs all jobs inside a docker container, with optional docker socket access via https://github.com/buildkite/sockguard. Might that be a viable alternative security model to the one you are currently going for with Docker-in-Docker? My experience with DIND is that it's almost always extremely painful on a lot of levels!
Awesome, thanks @lox.
Essentially all we want to achieve is:

- all command steps run inside a container (with image pulled from ECR)
- the user in that container isn't root
To make this work, we needed to match the uid:gid of the user in the container with those of the user on the host (because the workdir is bind-mounted and keeps its ownership and permissions from the host, so if we run as a different non-privileged user, we won't have access to write to the workdir).
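To avoid hard-coding the ids, the flags can be derived from the host user at runtime. This is a hypothetical helper of our own, not something the Docker plugin provides:

```shell
# Hypothetical helper (not part of the Docker plugin): build docker run flags
# that make the container user match the host user.
uid="$(id -u)"
gid="$(id -g)"
# Look up the host docker group's gid; fall back to 0 if it doesn't exist.
docker_gid="$(getent group docker 2>/dev/null | cut -d: -f3)"
docker_gid="${docker_gid:-0}"
flags="--user ${uid}:${gid} --group-add ${docker_gid}"
echo "$flags"
```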
Re: Docker-in-Docker, I've gotten my wires crossed there; we aren't doing actual Docker-in-Docker, but rather just running the docker binary inside the container and connecting to the socket that we bind-mount from the host. The reason we need to --group-add the docker group from the host is so that our unprivileged user in the container has access to the socket. Sorry for the confusion, I had forgotten that DinD was something different!
The net result is that the user inside the container is identical to the user on the host - the same uid, gid, and additional groups, so has all the same permissions and can access the bind-mounts.
An agent bootstrap that allowed us to run every job inside a container without the need to bind-mount things from the host would be really great! If we had that, we could just add steps to create a non-privileged user in the Dockerfile rather than passing one in at runtime.
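In that world the user could be baked into the image instead, along these lines (a sketch; the base image, user name, and ids are arbitrary):

```dockerfile
# Sketch: create a non-privileged user at build time instead of passing
# --user at runtime (base image, name, and ids are arbitrary).
FROM ubuntu:18.04
RUN groupadd --gid 502 ci && \
    useradd --uid 501 --gid 502 --create-home ci
USER ci
WORKDIR /home/ci
```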
Based on our Slack discussion, it sounds like having a non-privileged docker daemon might address a lot of the concerns you were originally trying to solve?
It was intended that v3.0.0 would be a major version change around the userns stuff.
We are going to add back a way to turn off userns-remapping :(
We've brought back EnableDockerUserNamespaceRemap in #410. You can use this in https://s3.amazonaws.com/buildkite-aws-stack/master/aws-stack.json and we'll be cutting a 3.1.0 release soon.
I've re-opened this, as it's still an issue. It appears that containers with non-root users in them aren't being correctly mapped back to something that buildkite-agent can read on the host.
I came across the same problem too. So far my solution has been to disable user namespace remapping for specific trusted containers, but I'm far from happy with it. If someone has a better solution, I'd be interested ;)