runner icon indicating copy to clipboard operation
runner copied to clipboard

Self-hosted runner with Docker step creates files that trip up the checkout step

Open j3parker opened this issue 5 years ago • 62 comments

Describe the bug When using self-hosted runners, git checkouts are cached between runs (this is nice, because it greatly speeds up our builds).

However, if a docker-based step writes a file to the workspace it will (possibly) be owned by the root user. If the permissions don't give the (non-root) action runner user +w permission then a checkout step in a future workflow run will fail to remove this file. The first time, the error will look like this:

##[group]Cleaning the repository
[command]/usr/bin/git clean -ffdx
warning: could not open directory 'foo/': Permission denied
warning: failed to remove foo/: Directory not empty
##[endgroup]
##[warning]Unable to clean or reset the repository. The repository will be recreated instead.
Deleting the contents of '/home/jparker/actions-runner/_work/self-hosted-runner-permissions-issue-repro/self-hosted-runner-permissions-issue-repro'
##[error]Command failed: rm -rf "/home/jparker/actions-runner/_work/self-hosted-runner-permissions-issue-repro/self-hosted-runner-permissions-issue-repro/foo"
rm: cannot remove '/home/jparker/actions-runner/_work/self-hosted-runner-permissions-issue-repro/self-hosted-runner-permissions-issue-repro/foo': Permission denied

So git clean -ffdx tried to stat() this foo/ directory (created via a container in a previous build) but failed. It was then unable to remove the directory because it wasn't empty. It tried to fall back to rm -rf which failed for the same reasons.

In future builds it goes straight to rm -rf because the .git folder did get cleaned up. It continues to fail in the same way for all future builds. Here's a screenshot:

image

To Reproduce

I've created a repo that reproduces the error: https://github.com/Brightspace/self-hosted-runner-permissions-issue-repro

Here's an example of a workflow failing: https://github.com/Brightspace/self-hosted-runner-permissions-issue-repro/runs/596011452?check_suite_focus=true

Expected behavior

I guess I'd expect all the files to be owned by the runner user... in a perfect world. Maybe that can be done with user namespace maps? Documentation. Not sure what that would entail though or if it makes sense for what the runner is doing.

I think this is not an issue with the checkout action because I don't think there is anything they could do about it - it'd impact other actions too, checkout was just the first one I hit the issue with.

Runner Version and Platform

Ubuntu 18.04, runner version 2.168.0

These are org-level runners but I imagine it's not specific to that.

j3parker avatar Apr 17 '20 17:04 j3parker

@j3parker We should document this, basically, if you are using any container feature of GitHub Actions (container step or job container) you probably should run the runner as ROOT to avoid any potential file permission issue caused by files created by the container.

@chrispat thoughts?

TingluoHuang avatar Apr 17 '20 17:04 TingluoHuang

@j3parker thanks for reporting this. I am also facing the same issue. @TingluoHuang I thought to start the runner as root but the utility run.sh has a check to not start runner as root. https://github.com/actions/runner/blob/master/src/Misc/layoutroot/run.sh#L3

# Validate not sudo
user_id=`id -u`
if [ $user_id -eq 0 -a -z "$RUNNER_ALLOW_RUNASROOT" ]; then
    echo "Must not run interactively with sudo"
    exit 1
fi

EDIT : @j3parker I think you can simply achieve this by exporting the variable RUNNER_ALLOW_RUNASROOT=1. Check here. Thank you @TingluoHuang

karancode avatar Apr 21 '20 07:04 karancode

If you are going to use container based actions I would recommend you use a job container as well. Mixing container and host environments does not work very well. The other option is to add a step to change permissions of files after the container action runs.

chrispat avatar Apr 21 '20 13:04 chrispat

Also, I don't think we can say must run runner as root since many folks run as systemd service and I don't think that will allow us to run as root.

bryanmacfarlane avatar Apr 21 '20 13:04 bryanmacfarlane

@karancode the reason that's an envvar is there's many scenarios (service) where it won't work as the service ends up running as a user (the user configured or one specified). but there are some scenarios (like putting the runner in a container) where you want / need to explicitly run as root and that's why the override is there. We should formalize the run as root more and doc it better.

We need to keep this open and think a bit more on what the right solution is. If you want to do run as root for now and you don't run the runner as a systemd service, then make sure however you launch it, it calls runsvc.sh and not run.sh so it doesn't exit on updates.

bryanmacfarlane avatar Apr 21 '20 13:04 bryanmacfarlane

@chrispat @bryanmacfarlane Thanks. In my case, I am running runner inside containers, so I had no other option but to run it as root.. (I tried to change permissions of files after container actions but that way isn't feasible as there could be many different actions generating multiple files.. I also tried running it as a systemd service but it was just uglier).

I also faced the issue with run.sh exiting on updates(observed with minor version update but not with patch). Thanks for the tip, I will check for runsvc.sh and how to use it. PS : If there's any doc please share, would love you contrib. Thanks!

karancode avatar Apr 21 '20 14:04 karancode

If you are going to use container based actions I would recommend you use a job container as well. Mixing container and host environments does not work very well.

Cool - when you say job containers are you referring to this? I hadn't seen that before... I'll definitely try that!

We need to keep this open and think a bit more on what the right solution is.

Thanks! It looks like container: will mitigate this for us for now. If the runner itself messed around with user namespaces that might be able to solve this (they can be nested) but it might be a bunch of work...

If container: works for us we'd be interested in being able to configure our runners to only accept containerized jobs. I can open a feature request for this after testing it out.

j3parker avatar Apr 21 '20 14:04 j3parker

Another option here is that when running inside a container that has the user as root, you can use the jobs.<jobid>.container.options directive to provide the --user uid:gid value of the user and group that the self-hosted action runner is running as.

The downside to this is that details of the actions runner environment is starting to leak into the workflows of your projects, which is less than ideal in larger companies.

peter-murray avatar Jun 16 '20 15:06 peter-murray

This is hitting me as well, at first it was easier to avoid, but now that we have more and more actions it is getting harder.

Is there a way to have the runner clean up the work directory after the workflow is finished. If the runner isn’t running as root, then it probably can’t delete the directory. But it would be able to do that if it was running in a container.

Maybe there is a way, to turn on a clean up job that will run after the workflow is complete to delete the files. If that job is run in a container it should have access to delete everything.

This only works if you have Docker installed, but that is fine because I don’t think the issue happens without docker.

I guess I could add this to all of my workflows, but that doesn’t seem ideal, getting it added at the runner level makes things cleaner. Just an idea, not sure if it is a good one, but figured I would add it here and see what people think.

kencochrane avatar Jul 02 '20 23:07 kencochrane

For all my private actions, I ended up putting USER 1000:1000 in all the Docker actions. Since there is only ever root and the user I created, the only valid options are 0 and 1000. That way, any files created within the Docker action that are persisted by the self-hosted runner, they have the correct user.

The other solution is to just run the runner as root by setting RUNNER_ALLOW_RUNASROOT=1.

jef avatar Jul 28 '20 02:07 jef

in your workflow file use this, e.g:

  build:
    runs-on: self-hosted
    needs: [clone-repository]
    container:
      image: gradle:5.5.1-jdk11
      options: --user 1000
    ...
    ...

merou avatar Aug 03 '20 20:08 merou

I suppose that will only work if you're using container vs a Docker action.

I don't believe an action has those concepts unless specifically written by the action. In which case, it would need to get updated in every action that uses Docker (extreme example). Unless there is a universal environment variable that masks files or sets file creation permissions. (I suppose I'm thinking something similar to UMASK here; not sure really.)

jef avatar Aug 03 '20 21:08 jef

Yes for Docker actions (private or public) - User will need to modify the corresponding Dockerfile by indeed having USER 1000:1000 like you mentioned earlier.

The nice thing is that github actions are indirectly enforcing good practices - i.e do not run a Container as root. I personally think that it is the user responsibility to set the right permissions there.

Concerning container to avoid the options: --user 1000 repetition, it would be nice to be able to define it like this:

defaults:
  runs-on: self-hosted
  container-user-id: 1000

merou avatar Aug 04 '20 10:08 merou

I found this temporary workaround for Docker action: Fix for self-hosted runner

P.S: I updated script and moved it to repository README.md

xanantis avatar Aug 08 '20 13:08 xanantis

You might not be able to run with sudo, but you can add user to the root group. That worked for me :)

sudo usermod -a -G root <USER_NAME>

clementohNZ avatar Aug 12 '20 12:08 clementohNZ

You might not be able to run with sudo, but you can add user to the root group. That worked for me :)

sudo usermod -a -G root <USER_NAME>

Strange... I did this before and still gave me problems. I can try again in the future.

Thanks!

jef avatar Aug 12 '20 15:08 jef

@jef You may try my solution. It is hacky, but it does work for me.

xanantis avatar Aug 12 '20 16:08 xanantis

@jef I rewrote my script. Now it is possible to set docker's user option via env vars. You can find more details here: https://github.com/xanantis/docker-file-ownership-fix#to-fix-all-those-above-for-self-hosted-runner

xanantis avatar Aug 12 '20 23:08 xanantis

Hey! I faced with the same proble. Have you guys found easy solution?

imurashka avatar Nov 30 '20 16:11 imurashka

Hey! I faced with the same proble. Have you guys found easy solution?

This worked for me...

I had to remove manually the conflicted files, and then adding the user ID of the user we use for automation did the trick.

malkam03 avatar Nov 30 '20 16:11 malkam03

Setting the uid is not always possible depending upon the container, how it was built and internal permissions inside the container. It is a good approach most of the time, but there are edge cases and in some use cases where there are separate teams managing the runners and the workflows complications can arise.

I have created an action that can "reset" permissions on the workspace directories and files that would trip up a consecutive run and break at checkout; https://github.com/peter-murray/reset-workspace-ownership-action

It is still not a perfect solution can can be appended to the end of the workflow with minimal impact and overhead to the workflow until there is a longer term fix available.

peter-murray avatar Dec 02 '20 10:12 peter-murray

Also, I don't think we can say must run runner as root since many folks run as systemd service and I don't think that will allow us to run as root.

Ran into this myself today due to running python in docker containers during a test step, creating root-owned pycache files. These files then break the next build when the runner attempts removal during the checkout step like OP.

Are there any issues with running the service as root utilizing the [user] param for install? I'm hosting a runner on ubuntu 20.04.1 and running:

sudo ./svc.sh install root
sudo ./svc.sh start

works well for me as a workaround for now.

MuchToKnow avatar Dec 02 '20 22:12 MuchToKnow

+1. This has also come up where users are using github/codeql-action (for Code Scanning) or other Actions that write to runner.temp. In that case it's possible for the Action to write data to the temp directory that fails to be cleaned up at the start of a next run, because the container user doesn't have permission to delete it. Documenting the right practice of getting the users to match would be a good way to help identify and prevent this.

adityasharad avatar Dec 08 '20 19:12 adityasharad

+1 and the documentation for actions explicitly state that containers should be run as root: https://docs.github.com/en/free-pro-team@latest/actions/creating-actions/dockerfile-support-for-github-actions . Yet if you follow this, the builds will break, unless you're running they GitHub runner itself as root I suppose.

albgus avatar Jan 07 '21 16:01 albgus

Stating that containers should be run as root is unbelievable insecure and lazy and violates the entire concept of process separation. If I don't carefully audit all thirdparty actions I have a strong possibility of opening my network for unknown damage. That's simply unacceptable under any circumstances.

Setting the uid is not always possible depending upon the container, how it was built and internal permissions inside the container. It is a good approach most of the time, but there are edge cases and in some use cases where there are separate teams managing the runners and the workflows complications can arise.

I have created an action that can "reset" permissions on the workspace directories and files that would trip up a consecutive run and break at checkout; https://github.com/peter-murray/reset-workspace-ownership-action

It is still not a perfect solution can can be appended to the end of the workflow with minimal impact and overhead to the workflow until there is a longer term fix available.

Thank you, this workaround action helped solve this problem for us.

mkrakowitzer avatar Jan 18 '21 15:01 mkrakowitzer

Created a few weeks ago an example to run on ubuntu with rootless docker. Still testing the set up but it should avoid the root problem, since the docker mapping is fixed.

npalm avatar Jan 18 '21 16:01 npalm

I created a guide based on @npalm 's example on how to run the github actions runner with rootless docker: https://stackoverflow.com/questions/66137419/how-to-enable-non-docker-actions-to-access-docker-created-files-on-my-self-hoste

Frederik-Baetens avatar Feb 10 '21 14:02 Frederik-Baetens

I actually stumbled upon another error, it mostly seems to work, hence the guide, but the new v2 docker build push action which uses buildx fails with

buildx call failed with: error: Error response from daemon: OCI runtime create failed: container_linux.go:370: starting container process caused: process_linux.go:459: container init caused: write sysctl key net.ipv4.ping_group_range: write /proc/sys/net/ipv4/ping_group_range: invalid argument: unknown

Edit: filed issue here https://github.com/docker/build-push-action/issues/292

Frederik-Baetens avatar Feb 10 '21 16:02 Frederik-Baetens

Managed to solve that by adding driver: docker:

      - uses: docker/setup-buildx-action@v1
        with:
          driver: docker

Everything now works just as on the ubuntu-latest runners for me! again thanks to npalm

Frederik-Baetens avatar Feb 10 '21 23:02 Frederik-Baetens