Cannot create a new builder instance in [Set up Docker Buildx]

Open nmiculinic opened this issue 4 years ago • 17 comments

Describe the bug

  • An action that works correctly on GitHub-hosted runners does not work on self-hosted runners

Checks

  • [x] My actions-runner-controller version (v0.x.y) does support the feature
  • [ ] I'm using an unreleased version of the controller I built from HEAD of the default branch

To Reproduce

      - name: Checkout
        uses: actions/checkout@v2
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v1
  /usr/local/bin/docker buildx create --name builder-3367d142-667f-46da-9e5a-56a8706f3c86 --driver docker-container --buildkitd-flags --allow-insecure-entitlement security.insecure --allow-insecure-entitlement network.host --use
  error: could not create a builder instance with TLS data loaded from environment. Please use `docker context create <context-name>` to create a context for current environment and then create a builder instance with `docker buildx create <context-name>`
  Error: The process '/usr/local/bin/docker' failed with exit code 1
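
For reference, the error message above names two commands: create a named Docker context, then create the builder on that context. A minimal sketch of that sequence follows. The `CONTEXT_NAME` and `DRY_RUN` variables and the `run` and `setup_builder` helpers are hypothetical names used only for illustration; by default the script only prints the commands instead of running them.

```shell
#!/usr/bin/env sh
# Sketch only: CONTEXT_NAME and DRY_RUN are hypothetical knobs, not part of
# actions-runner-controller or docker/setup-buildx-action.
set -eu

CONTEXT_NAME="${CONTEXT_NAME:-builders}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

setup_builder() {
  # Step 1: create a named context for the current environment.
  run docker context create "$1"
  # Step 2: create a builder on that context and select it.
  run docker buildx create --use "$1"
}

setup_builder "$CONTEXT_NAME"
```

Running it with `DRY_RUN=0` on a runner would execute both commands. Whether that alone is enough depends on the runner setup, as the rest of this thread shows.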

Expected behavior

It should work the same as on GitHub-hosted runners.

Environment (please complete the following information):

  • Controller Version [e.g. 0.18.2] app.kubernetes.io/version=0.20.2
  • Deployment Method [e.g. Helm and Kustomize ]: helm
  • Helm Chart Version [e.g. 0.11.0, if applicable]: helm.sh/chart=actions-runner-controller-0.13.2

Helm values yaml:

# helm upgrade --install --namespace actions-runner-system --create-namespace actions-runner-controller actions-runner-controller/actions-runner-controller -f ~/Desktop/grid/infra/staging/gh.yaml
authSecret:
  create: true
  <redacted>

scope:
  singleNamespace: true

githubWebhookServer:
  enabled: true
  secret:
    create: true
    name: "github-webhook-server"
    github_webhook_secret_token: "<redacted>"

metrics:
  serviceMonitor: true

nmiculinic avatar Oct 14 '21 15:10 nmiculinic

@nmiculinic Hey! Could you read this?

I don't know what the latest situation is, but when I last checked I had to patch the action or set up buildx with my own command (without using any premade action).

In other words, I have no idea how we could fix this on our end. Apparently, it isn't that easy and straightforward to keep parity with hosted github actions runners.

mumoshu avatar Oct 14 '21 23:10 mumoshu

Thanks for the link!

I've used this https://github.com/mumoshu/actions-runner-controller-ci/commit/e91c8c0f6ca82aa7618010c6d2f417aa46c4a4bf and got it working.

Can't you expose some environment variables to make it work seamlessly?

nmiculinic avatar Oct 15 '21 10:10 nmiculinic

This would be great to document too, since it's a pretty common use case for self-hosted runners.

nmiculinic avatar Oct 15 '21 10:10 nmiculinic

Can't you expose some environment variables to make it work seamlessly?

@nmiculinic Hey! What do you mean, exactly? Do you think we can improve anything other than documentation on our end to enhance the user experience here?

If you're talking about a potential enhancement to docker/setup-buildx-action, I think you'd better file an issue there.

mumoshu avatar Oct 23 '21 08:10 mumoshu

@nmiculinic A documentation improvement would definitely be welcomed! I would review it if you could send a PR for that.

mumoshu avatar Oct 23 '21 08:10 mumoshu

I tried adding the step listed in https://github.com/actions-runner-controller/actions-runner-controller/issues/893#issuecomment-944202747

but I'm running into a problem where the setup-buildx-action is just hanging... I don't know how to debug. The runner logs in k8s don't tell me anything further about what's going on.

      - name: Set up QEMU
        uses: docker/setup-qemu-action@v1
      - name: Set up Docker Context for Buildx
        id: buildx-context
        run: |
          docker context create builders
      - name: Set up Docker Buildx
        id: buildx
        uses: docker/setup-buildx-action@v1
        with:
          version: latest
          endpoint: builders

ghostsquad avatar Dec 02 '21 03:12 ghostsquad

oh.. I just realized, this might be related to this: https://github.com/docker/setup-buildx-action/issues/117

ghostsquad avatar Dec 02 '21 03:12 ghostsquad

I would like to be able to switch workflows from GitHub-hosted runners to self-hosted runners without any modifications. Unfortunately, this issue prevents that, because the Docker build steps need to be updated as mentioned in this thread. The reason is that the runner's default Docker context has the value tcp://localhost:2376, and running the following creates a new context with the value unix:///var/run/docker.sock and uses that new context.

      - name: Set up Docker Context for Buildx
        id: buildx-context
        run: |
          docker context create builders

      - name: Set up Docker Buildx
        id: buildx
        uses: docker/setup-buildx-action@v1
        with:
          version: latest
          endpoint: builders

The following code indicates that, when a new runner is created, the controller injects environment variables, one of which is DOCKER_HOST=tcp://localhost:2376. I am not sure why this is needed, and I believe that removing this environment variable would fix the issue. https://github.com/actions-runner-controller/actions-runner-controller/blob/master/controllers/runner_controller.go#L1034
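
To check what a given runner actually sees, a step can print the Docker connection variables before buildx runs. The sketch below lists the standard Docker CLI variables (`DOCKER_HOST`, `DOCKER_TLS_VERIFY`, `DOCKER_CERT_PATH`, `DOCKER_CONTEXT`) that are non-empty. The helper names `print_if_set` and `docker_env_overrides` are hypothetical, and which of these variables are set in a particular runner pod is something to verify rather than assume.

```shell
#!/usr/bin/env sh
# Print NAME=value only when the value is non-empty.
print_if_set() {
  if [ -n "$2" ]; then
    printf '%s=%s\n' "$1" "$2"
  fi
}

# List the Docker CLI connection variables that are currently set.
docker_env_overrides() {
  print_if_set DOCKER_HOST "${DOCKER_HOST:-}"
  print_if_set DOCKER_TLS_VERIFY "${DOCKER_TLS_VERIFY:-}"
  print_if_set DOCKER_CERT_PATH "${DOCKER_CERT_PATH:-}"
  print_if_set DOCKER_CONTEXT "${DOCKER_CONTEXT:-}"
}

docker_env_overrides
```

Empty output means none of these variables is set, so the Docker CLI falls back to its own defaults.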

cdivitotawela avatar Feb 24 '22 07:02 cdivitotawela

DOCKER_HOST is only set to tcp://localhost:2376 when DIND is run, so I don't think my suggestion is correct. I'm still searching for what I need to do to make the same Docker image build work on both GitHub-hosted and self-hosted runners. :(

cdivitotawela avatar Feb 24 '22 10:02 cdivitotawela

I am using a self-hosted runner and building a Docker image with summerwind/actions-runner:latest as the base image, but I needed to install the Docker plugins buildx and docker compose. During the workflow, I use the steps below and everything works fine.

- run: docker context create builders

- uses: docker/setup-buildx-action@v1
  with:
    version: latest
    endpoint: builders

rlinstorres avatar May 17 '22 08:05 rlinstorres

I am running into the same issue @ghostsquad is facing where the Set up Docker Buildx step is hanging. Below is the workflow I am running on self-hosted runners in Kubernetes that I believe is using the Docker Image summerwind/actions-runner:latest. I am also unable to see any logs even when including the flag option buildkitd-flags: --debug.

I have tried the following solutions and am still facing the issue:

  • Disabled auto updating of self hosted runners in the deployment
  • Updated “Set up Docker Buildx” stage to v2
  • Updated “Set up Docker Buildx” to leverage both docker-container and kubernetes drivers
  • Updated “Set up Docker Buildx” to use custom commands for buildx such as using curl to download

Any other suggestions? Thanks!

name: GitHub Actions Demo
on: [push]
jobs:
  Explore-GitHub-Actions:
    runs-on: [self-hosted, linux]
    steps:
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v1

Logs (includes manually cancelling it due to hang):

Download and install buildx
  ##[debug]Release v0.8.2 found
  ##[debug]isExplicit: 0.8.2
  ##[debug]explicit? true
  ##[debug]checking cache: /opt/hostedtoolcache/buildx/0.8.2/x64
  ##[debug]not found
  Downloading https://github.com/docker/buildx/releases/download/v0.8.2/buildx-v0.8.2.linux-amd64
  ##[debug]Downloading https://github.com/docker/buildx/releases/download/v0.8.2/buildx-v0.8.2.linux-amd64
  ##[debug]Destination /runner/_work/_temp/e56573ea-bb4e-46e4-a3e3-136a0b3b2001
  ##[debug]download complete
  ##[debug]Downloaded to /runner/_work/_temp/e56573ea-bb4e-46e4-a3e3-136a0b3b2001
  ##[debug]Caching tool buildx 0.8.2 x64
  ##[debug]source file: /runner/_work/_temp/e56573ea-bb4e-46e4-a3e3-136a0b3b2001
  ##[debug]destination /opt/hostedtoolcache/buildx/0.8.2/x64
  ##[debug]destination file /opt/hostedtoolcache/buildx/0.8.2/x64/docker-buildx
  ##[debug]finished caching tool
  Docker plugin mode
  ##[debug]Plugins dir is /home/runner/.docker/cli-plugins
  ##[debug]Plugin path is /home/runner/.docker/cli-plugins/docker-buildx
  ##[debug]Re-evaluate condition on job cancellation for step: 'Set up Docker Buildx'.
  Error: The operation was canceled.
  ##[debug]System.OperationCanceledException: The operation was canceled.
  ##[debug]   at System.Threading.CancellationToken.ThrowOperationCanceledException()
  ##[debug]   at GitHub.Runner.Sdk.ProcessInvoker.ExecuteAsync(String workingDirectory, String fileName, String arguments, IDictionary`2 environment, Boolean requireExitCodeZero, Encoding outputEncoding, Boolean killProcessOnCancel, Channel`1 redirectStandardIn, Boolean inheritConsoleHandler, Boolean keepStandardInOpen, Boolean highPriorityProcess, CancellationToken cancellationToken)
  ##[debug]   at GitHub.Runner.Common.ProcessInvokerWrapper.ExecuteAsync(String workingDirectory, String fileName, String arguments, IDictionary`2 environment, Boolean requireExitCodeZero, Encoding outputEncoding, Boolean killProcessOnCancel, Channel`1 redirectStandardIn, Boolean inheritConsoleHandler, Boolean keepStandardInOpen, Boolean highPriorityProcess, CancellationToken cancellationToken)
  ##[debug]   at GitHub.Runner.Worker.Handlers.DefaultStepHost.ExecuteAsync(String workingDirectory, String fileName, String arguments, IDictionary`2 environment, Boolean requireExitCodeZero, Encoding outputEncoding, Boolean killProcessOnCancel, Boolean inheritConsoleHandler, CancellationToken cancellationToken)
  ##[debug]   at GitHub.Runner.Worker.Handlers.NodeScriptActionHandler.RunAsync(ActionRunStage stage)
  ##[debug]   at GitHub.Runner.Worker.ActionRunner.RunAsync()
  ##[debug]   at GitHub.Runner.Worker.StepsRunner.RunStepAsync(IStep step, CancellationToken jobCancellationToken)
  ##[debug]Finishing: Set up Docker Buildx
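
One way to get more signal from a hang like the one in the log above is to run the Docker command under a time limit, so the step fails quickly instead of waiting to be cancelled. The sketch below assumes GNU coreutils `timeout`, which exits with status 124 when the limit is hit. `with_timeout` is a hypothetical helper name, and the `docker buildx inspect --bootstrap` call in the comment is only an example of a command to wrap.

```shell
#!/usr/bin/env sh
# with_timeout SECONDS COMMAND [ARGS...]
# Runs COMMAND under GNU timeout and reports when the limit was hit.
with_timeout() {
  secs=$1
  shift
  status=0
  timeout "$secs" "$@" || status=$?
  if [ "$status" -eq 124 ]; then
    echo "timed out after ${secs}s: $*" >&2
  fi
  return "$status"
}

# Example (not executed here, requires a Docker daemon):
#   with_timeout 120 docker buildx inspect --bootstrap
```

GitHub Actions also supports `timeout-minutes` on a step, which may be the simpler option when the hanging step is an action rather than a `run` command.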

john-yacuta-submittable avatar May 20 '22 17:05 john-yacuta-submittable

@rlinstorres are you running the self-hosted runners in Kubernetes? I tried this solution as well and got the same result.

john-yacuta-submittable avatar May 24 '22 17:05 john-yacuta-submittable

FWIW, what worked for me was:

    - run: docker context create mycontext
    - run: docker context use mycontext
    - name: Set up Docker Buildx
      uses: docker/setup-buildx-action@v1
      with:
        buildkitd-flags: --debug
        endpoint: mycontext

Perhaps the key difference is that I had docker context use mycontext? 🤔

mumoshu avatar May 25 '22 01:05 mumoshu

Hi @john-yacuta-submittable, let me send you more information about my environment to clarify and also help you!

  • A snippet of my Dockerfile:
FROM summerwind/actions-runner:latest

ENV BUILDX_VERSION=v0.8.2
ENV DOCKER_COMPOSE_VERSION=v2.5.1

# Docker Plugins
RUN mkdir -p "${HOME}/.docker/cli-plugins" \
  && curl -SsL "https://github.com/docker/buildx/releases/download/${BUILDX_VERSION}/buildx-${BUILDX_VERSION}.linux-amd64" -o "${HOME}/.docker/cli-plugins/docker-buildx" \
  && curl -SsL "https://github.com/docker/compose/releases/download/${DOCKER_COMPOSE_VERSION}/docker-compose-linux-x86_64" -o "${HOME}/.docker/cli-plugins/docker-compose" \
  && chmod +x "${HOME}/.docker/cli-plugins/docker-buildx" \
  && chmod +x "${HOME}/.docker/cli-plugins/docker-compose"
  • EKS version: v1.21.9 (--enable-docker-bridge true --container-runtime containerd)
  • actions-runner-controller helm chart version 0.17.3
  • RunnerDeployment and HorizontalRunnerAutoscaler manifest files using my docker image
  • A snippet of my workflow:
jobs:
  build:
    name: Build
    runs-on: fh-ubuntu-small-prod
    steps:
      - name: Checkout
        uses: actions/checkout@v3
      - name: Set up Docker Context for Buildx
        run: docker context create builders
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v1
        with:
          version: latest
          endpoint: builders

I hope this information can help you solve your problem.

rlinstorres avatar May 25 '22 08:05 rlinstorres

Thanks @rlinstorres! I managed to resolve my issue. It was an interesting case where I redeployed the node groups in the cluster. After redeployment, they worked just fine. Perhaps it could work for someone else too.

I typically don't like this kind of solution, but we did see that the CI step was getting stuck at the file system/kernel level, so it is possible that the hosts the self-hosted runner pods were running on (in this case, the nodes) were running too hot.

My CI step for "Set up Docker Buildx":

      - name: Set up QEMU
        uses: docker/setup-qemu-action@v1
      
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v1
        with:
          driver: docker

john-yacuta-submittable avatar May 25 '22 15:05 john-yacuta-submittable

~~I found that DOCKER_CONTEXT: default environment variable resolves this issue too. We can add this env to RunnerDeployment.spec.template.spec.env~~

~~Maybe we can add this value to ARC itself.~~

Sorry, I was wrong. This workaround doesn't work. I will look into it further.

yuanying avatar Aug 22 '22 06:08 yuanying

@yuanying Hey! Thanks a lot for sharing. Still curious, but what does your workflow definition look like?

Does it look like the below?

      - name: Set up QEMU
        uses: docker/setup-qemu-action@v1
      
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v1

Then the key takeaway here might be that the default docker context is somehow invisible to the setup-buildx-action and therefore we have to explicitly specify it via either DOCKER_CONTEXT or the endpoint option? 🤔

mumoshu avatar Aug 22 '22 06:08 mumoshu

Hello, first of all, thank you for sharing this topic, because it affects me too. I have the same problem as you, but I can't use the workaround you mention in this post.

Here is how I use my pipeline:

    name: Build and push latest tag from devel and on new commits
    steps:
      - name: Checkout
        uses: actions/checkout@v3

      - name: Set up QEMU
        uses: docker/setup-qemu-action@v1

      - name: Set up Docker Context for Buildx
        shell: bash
        id: buildx-context
        run: |
          docker context create buildx-context || true

      - name: Use Docker Context for Buildx
        shell: bash
        id: use-buildx-context
        run: |
          docker context use buildx-context || true

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v1
        with:
          buildkitd-flags: --debug
          endpoint: buildx-context

The pipeline gets stuck at Set up Docker Buildx.

metabsd avatar Feb 13 '23 22:02 metabsd

Exactly the same setup and issue here. Did you get it to work?

philipsabri avatar Feb 28 '23 15:02 philipsabri

At this point in time, shouldn't this now be done via the buildx kubernetes driver?

My question is, what Kubernetes RBAC permissions do the self-hosted runners have by default, are they sufficient to launch builder nodes, and if not, how do we change that? @mumoshu ?

Nuru avatar Mar 17 '23 05:03 Nuru

@mumoshu #2324 fixes this - were you interested in that change?

If not, would you accept a change to runner/entrypoint.sh to automatically initialize and activate a Docker context? I think that should unblock it working with buildx, but there's no need for that AND #2324, so I'm checking in before I make another PR.
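
To make the entrypoint idea concrete, here is a rough sketch of what "initialize and activate a Docker context" could look like as a shell helper. It is not the actual runner/entrypoint.sh and not the change in #2324. The function name `ensure_docker_context` and the context name `builders` are hypothetical. It only uses `docker context inspect`, `docker context create`, and `docker context use`.

```shell
#!/usr/bin/env sh
# Hypothetical helper: create the named context if it is missing, then select it.
ensure_docker_context() {
  ctx_name=$1
  # `docker context inspect` fails when the context does not exist yet.
  if ! docker context inspect "$ctx_name" >/dev/null 2>&1; then
    docker context create "$ctx_name"
  fi
  docker context use "$ctx_name"
}

# Example (requires the Docker CLI):
#   ensure_docker_context builders
```

Unlike the `|| true` pattern used in a workflow earlier in this thread, this skips creation only when the context already exists and still surfaces a real failure from `create` or `use`.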

milas avatar Mar 18 '23 20:03 milas