Rootless DIND runner set does not work following official documentation

Open shodanwashere opened this issue 1 year ago • 9 comments

Checks

  • [X] I've already read https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/troubleshooting-actions-runner-controller-errors and I'm sure my issue is not covered in the troubleshooting guide.
  • [X] I am using charts that are officially provided

Controller Version

0.9.3

Deployment Method

Helm

Checks

  • [X] This isn't a question or user support case (For Q&A and community support, go to Discussions).
  • [X] I've read the Changelog before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes

To Reproduce

1. Go to https://docs.github.com/en/enterprise-cloud@latest/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/deploying-runner-scale-sets-with-actions-runner-controller#example-running-dind-rootless
2. Use the supplied snippet and use the runner set chart to create a runner set using rootless DIND.
3. Set `minRunners` to 1 so that a single runner pod is spawned for testing.
4. The Pod should initialize but the DIND container will fail with an Error.

Describe the bug

The runner pods don't become Ready. When analyzing the logs of the DIND container, I get errors reporting "Permissions denied".

Describe the expected behavior

I expected the runner pod to become Ready and run pipeline jobs.

Additional Context

githubConfigUrl: "https://github.com/enterprises/REDACTED"

githubConfigSecret:
  github_token: "ghp_REDACTED"

maxRunners: 8

minRunners: 2

template:
  spec:
    initContainers:
    - name: init-dind-externals
      image: ghcr.io/actions/actions-runner:latest
      command: ["cp", "-r", "-v", "/home/runner/externals/.", "/home/runner/tmpDir/"]
      volumeMounts:
      - name: dind-externals
        mountPath: /home/runner/tmpDir
      securityContext:
        runAsUser: 0
    - name: init-dind-rootless
      image: docker:dind-rootless
      command:
        - sh
        - -c
        - |
          set -x
          cp -a /etc/. /dind-etc/
          echo 'runner:x:1001:1001:runner:/home/runner:/bin/ash' >> /dind-etc/passwd
          echo 'runner:x:1001:' >> /dind-etc/group
          echo 'runner:100000:65536' >> /dind-etc/subgid
          echo 'runner:100000:65536' >>  /dind-etc/subuid
          chmod 755 /dind-etc;
          chmod u=rwx,g=rx+s,o=rx /dind-home
          chown 1001:1001 /dind-home
      securityContext:
        runAsUser: 0
        privileged: true
      volumeMounts:
        - mountPath: /dind-etc
          name: dind-etc
        - mountPath: /dind-home
          name: dind-home
    containers:
    - name: runner
      image: ghcr.io/actions/actions-runner:latest
      command: ["/home/runner/run.sh"]
      securityContext:
        privileged: true
        runAsUser: 1001
        runAsGroup: 1001
      env:
      - name: DOCKER_HOST
        value: unix:///var/run/docker.sock
      volumeMounts:
      - name: work
        mountPath: /home/runner/_work
      - name: dind-sock
        mountPath: /var/run
    - name: dind
      image: docker:dind-rootless
      args:
      - dockerd
      - --host=unix:///var/run/docker.sock
      securityContext:
        privileged: true
        runAsUser: 1001
        runAsGroup: 1001
      volumeMounts:
      - name: work
        mountPath: /home/runner/_work
      - name: dind-sock
        mountPath: /var/run
      - name: dind-externals
        mountPath: /home/runner/externals
      - name: dind-etc
        mountPath: /etc
      - name: dind-home
        mountPath: /home/runner
    volumes:
    - name: work
      ephemeral:
        volumeClaimTemplate:
          spec:
            accessModes: ["ReadWriteOnce"]
            storageClassName: "gp2"
            resources:
              requests:
                storage: 50Gi
    - name: dind-sock
      ephemeral:
        volumeClaimTemplate:
          spec:
            accessModes: ["ReadWriteOnce"]
            storageClassName: "gp2"
            resources:
              requests:
                storage: 5Gi
    - name: dind-externals
      ephemeral:
        volumeClaimTemplate:
          spec:
            accessModes: ["ReadWriteOnce"]
            storageClassName: "gp2"
            resources:
              requests:
                storage: 15Gi
    - name: dind-etc
      ephemeral:
        volumeClaimTemplate:
          spec:
            accessModes: ["ReadWriteOnce"]
            storageClassName: "gp2"
            resources:
              requests:
                storage: 5Gi
    - name: dind-home
      ephemeral:
        volumeClaimTemplate:
          spec:
            accessModes: ["ReadWriteOnce"]
            storageClassName: "gp2"
            resources:
              requests:
                storage: 20Gi
    fsGroup: 1001
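
For readers unfamiliar with the entries that the init-dind-rootless container appends to /dind-etc, here is a minimal Python sketch that decodes them. The helper names (`parse_passwd`, `parse_subid`) are made up for illustration; the field layouts are the standard passwd and subuid/subgid formats, and the input strings are copied from the snippet above. It only shows what the entries mean and is not a diagnosis of the error.

```python
from typing import NamedTuple


class PasswdEntry(NamedTuple):
    name: str
    uid: int
    gid: int
    home: str
    shell: str


class SubIdRange(NamedTuple):
    name: str
    first: int
    count: int

    @property
    def last(self) -> int:
        # Last subordinate ID in the range (inclusive).
        return self.first + self.count - 1


def parse_passwd(line: str) -> PasswdEntry:
    # passwd format: name:password:uid:gid:gecos:home:shell
    name, _pw, uid, gid, _gecos, home, shell = line.strip().split(":")
    return PasswdEntry(name, int(uid), int(gid), home, shell)


def parse_subid(line: str) -> SubIdRange:
    # subuid/subgid format: name:first_id:count
    name, first, count = line.strip().split(":")
    return SubIdRange(name, int(first), int(count))


# Strings taken verbatim from the init container script above.
passwd = parse_passwd("runner:x:1001:1001:runner:/home/runner:/bin/ash")
subuid = parse_subid("runner:100000:65536")

print(passwd)
print(f"subordinate IDs {subuid.first}..{subuid.last} for {subuid.name}")
```

The UID and GID decoded here (1001) are the same values used for `runAsUser` and `runAsGroup` on the runner and dind containers in the config.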

Controller Logs

https://gist.github.com/shodanwashere/72412fbeb9cae702847bf00c5e5037db

Runner Pod Logs

Describe: https://gist.github.com/shodanwashere/bac9e93a65084b4bb6e9efc5325cff99
Logs (dind): https://gist.github.com/shodanwashere/ab60c36e8166d35ed5847ba81e45ca62

shodanwashere avatar Jul 05 '24 16:07 shodanwashere

I'm also following the official example exactly, but I'm running into a different issue where the listener pod is stuck in a Terminating/Running loop with the following error message:

2024-07-08T13:21:21Z    INFO    listener-app    app initialized
2024-07-08T13:21:21Z    INFO    listener-app    Starting metrics server
2024-07-08T13:21:21Z    INFO    listener-app    Starting listener
2024-07-08T13:21:21Z    INFO    listener-app    refreshing token        {"githubConfigUrl": "https://github.com/myorg"}
2024-07-08T13:21:21Z    INFO    listener-app    getting access token for GitHub App auth        {"accessTokenURL": "https://api.github.com/app/installations/52488617/access_tokens"}
2024-07-08T13:21:21Z    INFO    listener-app    getting runner registration token       {"registrationTokenURL": "https://api.github.com/orgs/myorg/actions/runners/registration-token"}
2024-07-08T13:21:21Z    INFO    listener-app    getting Actions tenant URL and JWT      {"registrationURL": "https://api.github.com/actions/runner-registration"}
2024-07-08T13:21:22Z    INFO    listener-app.listener   Current runner scale set statistics.    {"statistics": "{\"totalAvailableJobs\":0,\"totalAcquiredJobs\":0,\"totalAssignedJobs\":0,\"totalRunningJobs\":0,\"totalRegisteredRunners\":5,\"totalBusyRunners\":0,\"totalIdleRunners\":5}"}
2024-07-08T13:21:22Z    INFO    listener-app.worker.kubernetesworker    Calculated target runner count  {"assigned job": 0, "decision": 5, "min": 5, "max": 10, "currentRunnerCount": 5, "jobsCompleted": 0}
2024-07-08T13:21:22Z    INFO    listener-app.worker.kubernetesworker    Compare {"original": "{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":-1,\"patchID\":-1,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}", "patch": "{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":5,\"patchID\":0,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}"}
2024-07-08T13:21:22Z    INFO    listener-app.worker.kubernetesworker    Preparing EphemeralRunnerSet update     {"json": "{\"spec\":{\"patchID\":0,\"replicas\":5}}"}
2024-07-08T13:21:22Z    INFO    listener-app.worker.kubernetesworker    Ephemeral runner set scaled.    {"namespace": "arc-runners", "name": "arc-runner-set-test-vlbt4", "replicas": 5}
2024-07-08T13:21:22Z    INFO    listener-app.listener   Getting next message    {"lastMessageID": 0}
2024-07-08T13:21:24Z    ERROR   listener-app    Retryable client error  {"error": "Get \"https://pipelinesghubeus8.actions.githubusercontent.com/ID1/_apis/runtime/runnerscalesets/10/messages?sessionId=ID2&api-version=6.0-preview\": context canceled", "method": "GET", "url": "https://pipelinesghubeus8.actions.githubusercontent.com/ID1/_apis/runtime/runnerscalesets/10/messages?sessionId=ID2&api-version=6.0-preview", "error": "request failed"}
github.com/actions/actions-runner-controller/github/actions.(*clientLogger).Error
        github.com/actions/actions-runner-controller/github/actions/client.go:76
github.com/hashicorp/go-retryablehttp.(*Client).Do
        github.com/hashicorp/go-retryablehttp@<version>/client.go:718
github.com/hashicorp/go-retryablehttp.(*RoundTripper).RoundTrip
        github.com/hashicorp/go-retryablehttp@<version>/roundtripper.go:47
net/http.send
        net/http/client.go:259
net/http.(*Client).send
        net/http/client.go:180
net/http.(*Client).do
        net/http/client.go:724
net/http.(*Client).Do
        net/http/client.go:590
github.com/actions/actions-runner-controller/github/actions.(*Client).Do
        github.com/actions/actions-runner-controller/github/actions/client.go:273
github.com/actions/actions-runner-controller/github/actions.(*Client).GetMessage
        github.com/actions/actions-runner-controller/github/actions/client.go:577
github.com/actions/actions-runner-controller/cmd/ghalistener/listener.(*Listener).getMessage
        github.com/actions/actions-runner-controller/cmd/ghalistener/listener/listener.go:272
github.com/actions/actions-runner-controller/cmd/ghalistener/listener.(*Listener).Listen
        github.com/actions/actions-runner-controller/cmd/ghalistener/listener/listener.go:163
github.com/actions/actions-runner-controller/cmd/ghalistener/app.(*App).Run.func1
        github.com/actions/actions-runner-controller/cmd/ghalistener/app/app.go:124
golang.org/x/sync/errgroup.(*Group).Go.func1
        golang.org/x/sync@<version>/errgroup/errgroup.go:78
2024-07-08T13:21:24Z    INFO    listener-app.listener   Deleting message session
2024/07/08 13:21:24 Application returned an error: http: Server closed

Opening the link in the error shows the following:

{
    "$id": "1",
    "innerException": null,
    "message": "The user 'System:PublicAccess;aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa' is not authorized to access this resource.",
    "typeName": "Microsoft.TeamFoundation.Framework.Server.UnauthorizedRequestException, Microsoft.TeamFoundation.Framework.Server",
    "typeKey": "UnauthorizedRequestException",
    "errorCode": 0,
    "eventId": 3000
}

Everything is also at 0.9.3 and the GitHub application has "Metadata:Read-only" and "Self-hosted runners:Read and write" permissions. Kubernetes is at 1.30.
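
When comparing listener logs like the one above across reports, it can help to pull out the level, logger, message, and JSON fields of each line. The following is a minimal Python sketch, assuming the line layout seen in the log above (timestamp, level, logger name, message, optional trailing JSON object). The function name `parse_listener_line` is hypothetical and not part of ARC.

```python
import json
import re

# Layout assumed from the log above: "<timestamp> <LEVEL> <logger> <message> [<json>]"
LINE_RE = re.compile(r"^(\S+)\s+(INFO|ERROR|WARN|DEBUG)\s+(\S+)\s+(.*)$")


def parse_listener_line(line):
    """Return a dict for a structured log line, or None for anything else
    (stack-trace frames, the final 'Application returned an error' line)."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    ts, level, logger, rest = match.groups()
    brace = rest.find("{")
    if brace == -1:
        return {"ts": ts, "level": level, "logger": logger,
                "msg": rest.strip(), "fields": {}}
    try:
        fields = json.loads(rest[brace:])
    except json.JSONDecodeError:
        fields = {}  # keep the line even if the payload is not valid JSON
    return {"ts": ts, "level": level, "logger": logger,
            "msg": rest[:brace].strip(), "fields": fields}


# Sample lines copied from the log above.
scaled = parse_listener_line(
    '2024-07-08T13:21:22Z    INFO    listener-app.worker.kubernetesworker    '
    'Ephemeral runner set scaled.    '
    '{"namespace": "arc-runners", "name": "arc-runner-set-test-vlbt4", "replicas": 5}'
)
session = parse_listener_line(
    "2024-07-08T13:21:24Z    INFO    listener-app.listener   Deleting message session"
)
frame = parse_listener_line("net/http.send")

print(scaled["msg"], scaled["fields"])
print(session["msg"])
print(frame)
```

Note that the ERROR line in the log carries the key "error" twice; `json.loads` keeps the last value, so that line is better read by eye.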

umaasik avatar Jul 08 '24 13:07 umaasik

I have the same issue on EKS 1.27 with containerMode.type: "dind".

2024-07-09T07:38:10.595179313Z stdout F [RUNNER 2024-07-09 07:38:10Z ERR  JobDispatcher] Catch exception during renew runner jobrequest 79794.
2024-07-09T07:38:11.246446746Z stdout F [RUNNER 2024-07-09 07:38:11Z ERR  JobDispatcher] System.TimeoutException: The HTTP request timed out after 00:01:00.
2024-07-09T07:38:11.246461895Z stdout F [RUNNER 2024-07-09 07:38:11Z ERR  JobDispatcher]  ---> System.Threading.Tasks.TaskCanceledException: A task was canceled.
2024-07-09T07:38:11.246467637Z stdout F [RUNNER 2024-07-09 07:38:11Z ERR  JobDispatcher]    at System.Threading.Tasks.TaskCompletionSourceWithCancellation`1.WaitWithCancellationAsync(CancellationToken cancellationToken)
2024-07-09T07:38:11.246487934Z stdout F [RUNNER 2024-07-09 07:38:11Z ERR  JobDispatcher]    at System.Net.Http.HttpConnectionPool.GetHttp11ConnectionAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
2024-07-09T07:38:11.246493242Z stdout F [RUNNER 2024-07-09 07:38:11Z ERR  JobDispatcher]    at System.Net.Http.HttpConnectionPool.SendWithVersionDetectionAndRetryAsync(HttpRequestMessage request, Boolean async, Boolean doRequestAuth, CancellationToken cancellationToken)
2024-07-09T07:38:11.246497368Z stdout F [RUNNER 2024-07-09 07:38:11Z ERR  JobDispatcher]    at System.Net.Http.AuthenticationHelper.SendWithAuthAsync(HttpRequestMessage request, Uri authUri, Boolean async, ICredentials credentials, Boolean preAuthenticate, Boolean isProxyAuth, Boolean doRequestAuth, HttpConnectionPool pool, CancellationToken cancellationToken)
2024-07-09T07:38:11.246500393Z stdout F [RUNNER 2024-07-09 07:38:11Z ERR  JobDispatcher]    at System.Net.Http.DecompressionHandler.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
2024-07-09T07:38:11.246503597Z stdout F [RUNNER 2024-07-09 07:38:11Z ERR  JobDispatcher]    at GitHub.Services.Common.VssHttpMessageHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
2024-07-09T07:38:11.246506861Z stdout F [RUNNER 2024-07-09 07:38:11Z ERR  JobDispatcher]    --- End of inner exception stack trace ---
2024-07-09T07:38:11.246509682Z stdout F [RUNNER 2024-07-09 07:38:11Z ERR  JobDispatcher]    at GitHub.Services.Common.VssHttpMessageHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
2024-07-09T07:38:11.246512302Z stdout F [RUNNER 2024-07-09 07:38:11Z ERR  JobDispatcher]    at GitHub.Services.Common.VssHttpRetryMessageHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
2024-07-09T07:38:11.246515406Z stdout F [RUNNER 2024-07-09 07:38:11Z ERR  JobDispatcher]    at System.Net.Http.HttpClient.<SendAsync>g__Core|83_0(HttpRequestMessage request, HttpCompletionOption completionOption, CancellationTokenSource cts, Boolean disposeCts, CancellationTokenSource pendingRequestsCts, CancellationToken originalCancellationToken)
2024-07-09T07:38:11.246518675Z stdout F [RUNNER 2024-07-09 07:38:11Z ERR  JobDispatcher]    at GitHub.Services.WebApi.VssHttpClientBase.SendAsync(HttpRequestMessage message, HttpCompletionOption completionOption, Object userState, CancellationToken cancellationToken)
2024-07-09T07:38:11.246521713Z stdout F [RUNNER 2024-07-09 07:38:11Z ERR  JobDispatcher]    at GitHub.DistributedTask.WebApi.TaskAgentHttpClient.SendAsync[T](HttpRequestMessage message, Object userState, CancellationToken cancellationToken, Func`3 processResponse)
2024-07-09T07:38:11.246530257Z stdout F [RUNNER 2024-07-09 07:38:11Z ERR  JobDispatcher]    at GitHub.DistributedTask.WebApi.TaskAgentHttpClient.SendAsync[T](HttpMethod method, IEnumerable`1 additionalHeaders, Guid locationId, Object routeValues, ApiResourceVersion version, HttpContent content, IEnumerable`1 queryParameters, Object userState, CancellationToken cancellationToken, Func`3 processResponse)
2024-07-09T07:38:11.246533658Z stdout F [RUNNER 2024-07-09 07:38:11Z ERR  JobDispatcher]    at GitHub.Runner.Listener.JobDispatcher.RenewJobRequestAsync(IRunnerServer runnerServer, Int32 poolId, Int64 requestId, Guid lockToken, String orchestrationId, TaskCompletionSource`1 firstJobRequestRenewed, CancellationToken token)
2024-07-09T07:38:11.246537357Z stdout F [RUNNER 2024-07-09 07:38:11Z ERR  JobDispatcher] #####################################################
2024-07-09T07:38:11.246539905Z stdout F [RUNNER 2024-07-09 07:38:11Z ERR  JobDispatcher] System.Threading.Tasks.TaskCanceledException: A task was canceled.
2024-07-09T07:38:11.246542555Z stdout F [RUNNER 2024-07-09 07:38:11Z ERR  JobDispatcher]    at System.Threading.Tasks.TaskCompletionSourceWithCancellation`1.WaitWithCancellationAsync(CancellationToken cancellationToken)
2024-07-09T07:38:11.246545359Z stdout F [RUNNER 2024-07-09 07:38:11Z ERR  JobDispatcher]    at System.Net.Http.HttpConnectionPool.GetHttp11ConnectionAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
2024-07-09T07:38:11.246556749Z stdout F [RUNNER 2024-07-09 07:38:11Z ERR  JobDispatcher]    at System.Net.Http.HttpConnectionPool.SendWithVersionDetectionAndRetryAsync(HttpRequestMessage request, Boolean async, Boolean doRequestAuth, CancellationToken cancellationToken)
2024-07-09T07:38:11.246559539Z stdout F [RUNNER 2024-07-09 07:38:11Z ERR  JobDispatcher]    at System.Net.Http.AuthenticationHelper.SendWithAuthAsync(HttpRequestMessage request, Uri authUri, Boolean async, ICredentials credentials, Boolean preAuthenticate, Boolean isProxyAuth, Boolean doRequestAuth, HttpConnectionPool pool, CancellationToken cancellationToken)
2024-07-09T07:38:11.246562315Z stdout F [RUNNER 2024-07-09 07:38:11Z ERR  JobDispatcher]    at System.Net.Http.DecompressionHandler.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
2024-07-09T07:38:11.246565222Z stdout F [RUNNER 2024-07-09 07:38:11Z ERR  JobDispatcher]    at GitHub.Services.Common.VssHttpMessageHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
2024-07-09T07:38:11.246568123Z stdout F [RUNNER 2024-07-09 07:38:11Z INFO JobDispatcher] Retrying lock renewal for jobrequest 79794. Job is valid until 07/09/2024 07:45:56.
2024-07-09T07:38:11.263080643Z stdout F [RUNNER 2024-07-09 07:38:11Z INFO RunnerServer] Refresh JobRequest VssConnection to get on a different AFD node.
2024-07-09T07:38:11.263095261Z stdout F [RUNNER 2024-07-09 07:38:11Z INFO RunnerServer] EstablishVssConnection
2024-07-09T07:38:11.26309838Z stdout F [RUNNER 2024-07-09 07:38:11Z INFO RunnerServer] Establish connection with 30 seconds timeout.
2024-07-09T07:38:11.338495992Z stdout F [RUNNER 2024-07-09 07:38:11Z INFO GitHubActionsService] Starting operation Location.GetConnectionData

==> test-qrg55-runner-xpbcx_gha-runner_runner-d190265e0bbcee2b286569cd99773b3fc37d5db6cd5949101cd5e0e69fa0e76c.log <==
2024-07-09T07:38:11.271259732Z stdout F [RUNNER 2024-07-09 07:38:11Z ERR  MessageListener] System.TimeoutException: The HTTP request timed out after 00:01:00.
2024-07-09T07:38:11.299984221Z stdout F [RUNNER 2024-07-09 07:38:11Z ERR  MessageListener]  ---> System.Threading.Tasks.TaskCanceledException: The operation was canceled.
2024-07-09T07:38:11.299993176Z stdout F [RUNNER 2024-07-09 07:38:11Z ERR  MessageListener]  ---> System.IO.IOException: Unable to read data from the transport connection: Operation canceled.
2024-07-09T07:38:11.299995401Z stdout F [RUNNER 2024-07-09 07:38:11Z ERR  MessageListener]  ---> System.Net.Sockets.SocketException (125): Operation canceled
2024-07-09T07:38:11.299997642Z stdout F [RUNNER 2024-07-09 07:38:11Z ERR  MessageListener]    --- End of inner exception stack trace ---
2024-07-09T07:38:11.299999709Z stdout F [RUNNER 2024-07-09 07:38:11Z ERR  MessageListener]    at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error, CancellationToken cancellationToken)
2024-07-09T07:38:11.300002274Z stdout F [RUNNER 2024-07-09 07:38:11Z ERR  MessageListener]    at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.System.Threading.Tasks.Sources.IValueTaskSource<System.Int32>.GetResult(Int16 token)
2024-07-09T07:38:11.300004372Z stdout F [RUNNER 2024-07-09 07:38:11Z ERR  MessageListener]    at System.Net.Security.SslStream.EnsureFullTlsFrameAsync[TIOAdapter](TIOAdapter adapter)
2024-07-09T07:38:11.300006737Z stdout F [RUNNER 2024-07-09 07:38:11Z ERR  MessageListener]    at System.Net.Security.SslStream.ReadAsyncInternal[TIOAdapter](TIOAdapter adapter, Memory`1 buffer)
2024-07-09T07:38:11.300009195Z stdout F [RUNNER 2024-07-09 07:38:11Z ERR  MessageListener]    at System.Net.Http.HttpConnection.SendAsyncCore(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
2024-07-09T07:38:11.300011656Z stdout F [RUNNER 2024-07-09 07:38:11Z ERR  MessageListener]    --- End of inner exception stack trace ---
2024-07-09T07:38:11.300014109Z stdout F [RUNNER 2024-07-09 07:38:11Z ERR  MessageListener]    at System.Net.Http.HttpConnection.SendAsyncCore(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
2024-07-09T07:38:11.300016727Z stdout F [RUNNER 2024-07-09 07:38:11Z ERR  MessageListener]    at System.Net.Http.AuthenticationHelper.SendWithNtAuthAsync(HttpRequestMessage request, Uri authUri, Boolean async, ICredentials credentials, Boolean isProxyAuth, HttpConnection connection, HttpConnectionPool connectionPool, CancellationToken cancellationToken)
2024-07-09T07:38:11.300019463Z stdout F [RUNNER 2024-07-09 07:38:11Z ERR  MessageListener]    at System.Net.Http.HttpConnectionPool.SendWithVersionDetectionAndRetryAsync(HttpRequestMessage request, Boolean async, Boolean doRequestAuth, CancellationToken cancellationToken)
2024-07-09T07:38:11.300022352Z stdout F [RUNNER 2024-07-09 07:38:11Z ERR  MessageListener]    at System.Net.Http.AuthenticationHelper.SendWithAuthAsync(HttpRequestMessage request, Uri authUri, Boolean async, ICredentials credentials, Boolean preAuthenticate, Boolean isProxyAuth, Boolean doRequestAuth, HttpConnectionPool pool, CancellationToken cancellationToken)
2024-07-09T07:38:11.300044553Z stdout F [RUNNER 2024-07-09 07:38:11Z ERR  MessageListener]    at System.Net.Http.DecompressionHandler.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
2024-07-09T07:38:11.300047823Z stdout F [RUNNER 2024-07-09 07:38:11Z ERR  MessageListener]    at GitHub.Services.Common.VssHttpMessageHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
2024-07-09T07:38:11.300050151Z stdout F [RUNNER 2024-07-09 07:38:11Z ERR  MessageListener]    --- End of inner exception stack trace ---
2024-07-09T07:38:11.300052354Z stdout F [RUNNER 2024-07-09 07:38:11Z ERR  MessageListener]    at GitHub.Services.Common.VssHttpMessageHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
2024-07-09T07:38:11.300054895Z stdout F [RUNNER 2024-07-09 07:38:11Z ERR  MessageListener]    at GitHub.Services.Common.VssHttpRetryMessageHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
2024-07-09T07:38:11.300057337Z stdout F [RUNNER 2024-07-09 07:38:11Z ERR  MessageListener]    at System.Net.Http.HttpClient.<SendAsync>g__Core|83_0(HttpRequestMessage request, HttpCompletionOption completionOption, CancellationTokenSource cts, Boolean disposeCts, CancellationTokenSource pendingRequestsCts, CancellationToken originalCancellationToken)
2024-07-09T07:38:11.300059662Z stdout F [RUNNER 2024-07-09 07:38:11Z ERR  MessageListener]    at GitHub.Services.WebApi.VssHttpClientBase.SendAsync(HttpRequestMessage message, HttpCompletionOption completionOption, Object userState, CancellationToken cancellationToken)
2024-07-09T07:38:11.300061921Z stdout F [RUNNER 2024-07-09 07:38:11Z ERR  MessageListener]    at GitHub.Services.WebApi.VssHttpClientBase.SendAsync[T](HttpRequestMessage message, Object userState, CancellationToken cancellationToken)
2024-07-09T07:38:11.30006677Z stdout F [RUNNER 2024-07-09 07:38:11Z ERR  MessageListener]    at GitHub.Services.WebApi.VssHttpClientBase.SendAsync[T](HttpMethod method, IEnumerable`1 additionalHeaders, Guid locationId, Object routeValues, ApiResourceVersion version, HttpContent content, IEnumerable`1 queryParameters, Object userState, CancellationToken cancellationToken)
2024-07-09T07:38:11.300069372Z stdout F [RUNNER 2024-07-09 07:38:11Z ERR  MessageListener]    at GitHub.Runner.Listener.MessageListener.GetNextMessageAsync(CancellationToken token)
2024-07-09T07:38:11.300071974Z stdout F [RUNNER 2024-07-09 07:38:11Z ERR  MessageListener] #####################################################
2024-07-09T07:38:11.300074526Z stdout F [RUNNER 2024-07-09 07:38:11Z ERR  MessageListener] System.Threading.Tasks.TaskCanceledException: The operation was canceled.
2024-07-09T07:38:11.300077009Z stdout F [RUNNER 2024-07-09 07:38:11Z ERR  MessageListener]  ---> System.IO.IOException: Unable to read data from the transport connection: Operation canceled.
2024-07-09T07:38:11.300079781Z stdout F [RUNNER 2024-07-09 07:38:11Z ERR  MessageListener]  ---> System.Net.Sockets.SocketException (125): Operation canceled
2024-07-09T07:38:11.300082378Z stdout F [RUNNER 2024-07-09 07:38:11Z ERR  MessageListener]    --- End of inner exception stack trace ---
2024-07-09T07:38:11.300084805Z stdout F [RUNNER 2024-07-09 07:38:11Z ERR  MessageListener]    at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error, CancellationToken cancellationToken)
2024-07-09T07:38:11.300087555Z stdout F [RUNNER 2024-07-09 07:38:11Z ERR  MessageListener]    at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.System.Threading.Tasks.Sources.IValueTaskSource<System.Int32>.GetResult(Int16 token)
2024-07-09T07:38:11.300089964Z stdout F [RUNNER 2024-07-09 07:38:11Z ERR  MessageListener]    at System.Net.Security.SslStream.EnsureFullTlsFrameAsync[TIOAdapter](TIOAdapter adapter)
2024-07-09T07:38:11.300094473Z stdout F [RUNNER 2024-07-09 07:38:11Z ERR  MessageListener]    at System.Net.Security.SslStream.ReadAsyncInternal[TIOAdapter](TIOAdapter adapter, Memory`1 buffer)
2024-07-09T07:38:11.300096911Z stdout F [RUNNER 2024-07-09 07:38:11Z ERR  MessageListener]    at System.Net.Http.HttpConnection.SendAsyncCore(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
2024-07-09T07:38:11.300099741Z stdout F [RUNNER 2024-07-09 07:38:11Z ERR  MessageListener]    --- End of inner exception stack trace ---
2024-07-09T07:38:11.300102172Z stdout F [RUNNER 2024-07-09 07:38:11Z ERR  MessageListener]    at System.Net.Http.HttpConnection.SendAsyncCore(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
2024-07-09T07:38:11.300104583Z stdout F [RUNNER 2024-07-09 07:38:11Z ERR  MessageListener]    at System.Net.Http.AuthenticationHelper.SendWithNtAuthAsync(HttpRequestMessage request, Uri authUri, Boolean async, ICredentials credentials, Boolean isProxyAuth, HttpConnection connection, HttpConnectionPool connectionPool, CancellationToken cancellationToken)
2024-07-09T07:38:11.300106415Z stdout F [RUNNER 2024-07-09 07:38:11Z ERR  MessageListener]    at System.Net.Http.HttpConnectionPool.SendWithVersionDetectionAndRetryAsync(HttpRequestMessage request, Boolean async, Boolean doRequestAuth, CancellationToken cancellationToken)
2024-07-09T07:38:11.30010809Z stdout F [RUNNER 2024-07-09 07:38:11Z ERR  MessageListener]    at System.Net.Http.AuthenticationHelper.SendWithAuthAsync(HttpRequestMessage request, Uri authUri, Boolean async, ICredentials credentials, Boolean preAuthenticate, Boolean isProxyAuth, Boolean doRequestAuth, HttpConnectionPool pool, CancellationToken cancellationToken)
2024-07-09T07:38:11.300109838Z stdout F [RUNNER 2024-07-09 07:38:11Z ERR  MessageListener]    at System.Net.Http.DecompressionHandler.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
2024-07-09T07:38:11.30011151Z stdout F [RUNNER 2024-07-09 07:38:11Z ERR  MessageListener]    at GitHub.Services.Common.VssHttpMessageHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
2024-07-09T07:38:11.300113531Z stdout F [RUNNER 2024-07-09 07:38:11Z ERR  MessageListener] #####################################################
2024-07-09T07:38:11.300115232Z stdout F [RUNNER 2024-07-09 07:38:11Z ERR  MessageListener] System.IO.IOException: Unable to read data from the transport connection: Operation canceled.
2024-07-09T07:38:11.300116915Z stdout F [RUNNER 2024-07-09 07:38:11Z ERR  MessageListener]  ---> System.Net.Sockets.SocketException (125): Operation canceled
2024-07-09T07:38:11.300118599Z stdout F [RUNNER 2024-07-09 07:38:11Z ERR  MessageListener]    --- End of inner exception stack trace ---
2024-07-09T07:38:11.300120643Z stdout F [RUNNER 2024-07-09 07:38:11Z ERR  MessageListener]    at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error, CancellationToken cancellationToken)
2024-07-09T07:38:11.300134146Z stdout F [RUNNER 2024-07-09 07:38:11Z ERR  MessageListener]    at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.System.Threading.Tasks.Sources.IValueTaskSource<System.Int32>.GetResult(Int16 token)
2024-07-09T07:38:11.300137117Z stdout F [RUNNER 2024-07-09 07:38:11Z ERR  MessageListener]    at System.Net.Security.SslStream.EnsureFullTlsFrameAsync[TIOAdapter](TIOAdapter adapter)
2024-07-09T07:38:11.300139457Z stdout F [RUNNER 2024-07-09 07:38:11Z ERR  MessageListener]    at System.Net.Security.SslStream.ReadAsyncInternal[TIOAdapter](TIOAdapter adapter, Memory`1 buffer)
2024-07-09T07:38:11.300141325Z stdout F [RUNNER 2024-07-09 07:38:11Z ERR  MessageListener]    at System.Net.Http.HttpConnection.SendAsyncCore(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
2024-07-09T07:38:11.300145483Z stdout F [RUNNER 2024-07-09 07:38:11Z ERR  MessageListener] #####################################################
2024-07-09T07:38:11.30014886Z stdout F [RUNNER 2024-07-09 07:38:11Z ERR  MessageListener] System.Net.Sockets.SocketException (125): Operation canceled
2024-07-09T07:38:11.300150535Z stdout F [RUNNER 2024-07-09 07:38:11Z INFO MessageListener] Retriable exception: The HTTP request timed out after 00:01:00.
2024-07-09T07:38:11.311101566Z stdout F [RUNNER 2024-07-09 07:38:11Z ERR  Terminal] WRITE ERROR: 2024-07-09 07:38:11Z: Runner connect error: The HTTP request timed out after 00:01:00.. Retrying until reconnected.
2024-07-09T07:38:11.331699233Z stderr F 2024-07-09 07:38:11Z: Runner connect error: The HTTP request timed out after 00:01:00.. Retrying until reconnected.
2024-07-09T07:38:11.334061151Z stdout F [RUNNER 2024-07-09 07:38:11Z INFO RunnerServer] Refresh MessageQueue VssConnection to get on a different AFD node.
2024-07-09T07:38:11.334073685Z stdout F [RUNNER 2024-07-09 07:38:11Z INFO RunnerServer] EstablishVssConnection
2024-07-09T07:38:11.334077469Z stdout F [RUNNER 2024-07-09 07:38:11Z INFO RunnerServer] Establish connection with 60 seconds timeout.
2024-07-09T07:38:11.338946708Z stdout F [RUNNER 2024-07-09 07:38:11Z INFO GitHubActionsService] Starting operation Location.GetConnectionData
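
Because most of the log above is repeated stack frames, a per-component tally can make it easier to compare with other reports. This is a minimal Python sketch, assuming the line shape visible above (container runtime timestamp, stream, `F`, then a `[RUNNER <date> <time> <LEVEL> <Component>]` prefix). The function name `tally_runner_log` is hypothetical.

```python
import re
from collections import Counter

# Shape assumed from the log above:
# "<ts> stdout F [RUNNER <date> <time> <LEVEL> <Component>] <message>"
RUNNER_RE = re.compile(
    r"^\S+ (?:stdout|stderr) [FP] \[RUNNER \S+ \S+ (\w+)\s+(\w+)\] (.*)$"
)


def tally_runner_log(lines):
    """Count log lines per (level, component); lines without the
    [RUNNER ...] prefix are skipped."""
    counts = Counter()
    for line in lines:
        match = RUNNER_RE.match(line)
        if match:
            level, component, _message = match.groups()
            counts[(level, component)] += 1
    return counts


# Sample lines copied from the log above.
sample = [
    "2024-07-09T07:38:10.595179313Z stdout F [RUNNER 2024-07-09 07:38:10Z ERR  "
    "JobDispatcher] Catch exception during renew runner jobrequest 79794.",
    "2024-07-09T07:38:11.246446746Z stdout F [RUNNER 2024-07-09 07:38:11Z ERR  "
    "JobDispatcher] System.TimeoutException: The HTTP request timed out after 00:01:00.",
    "2024-07-09T07:38:11.300150535Z stdout F [RUNNER 2024-07-09 07:38:11Z INFO "
    "MessageListener] Retriable exception: The HTTP request timed out after 00:01:00.",
    "2024-07-09T07:38:11.331699233Z stderr F 2024-07-09 07:38:11Z: Runner connect "
    "error: The HTTP request timed out after 00:01:00.. Retrying until reconnected.",
]

counts = tally_runner_log(sample)
for (level, component), n in sorted(counts.items()):
    print(level, component, n)
```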

noamgreen avatar Jul 09 '24 07:07 noamgreen

Same issue here on k8s v1.29.2, running 0.9.3, showing: "The user 'System:PublicAccess;aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa' is not authorized to access this resource."

This leads to the runner no longer working after some time.

steffenmllr avatar Jul 11 '24 09:07 steffenmllr

@nikola-jokic Any thoughts here? We recently updated to ARC 0.9.3 and have been hit with listener restarts and the above messaging in the runners as well. FWIW this started when we updated to 0.9.3 and we did not observe this behavior in 0.7.0. The error messaging appears to be intermittent and lasts ~15 minutes.

EKS Version: 1.29.0
GHES Version: 3.9.15
ARC Version: 0.9.3
Runner Version: 2.317.0

Runner exception:

[RUNNER 2024-07-15 06:11:33Z ERR  MessageListener] Catch exception during get next message.
[RUNNER 2024-07-15 06:11:33Z ERR  MessageListener] System.TimeoutException: The HTTP request timed out after 00:01:00.
[RUNNER 2024-07-15 06:11:33Z ERR  MessageListener]  ---> System.Threading.Tasks.TaskCanceledException: The operation was canceled.
[RUNNER 2024-07-15 06:11:33Z ERR  MessageListener]  ---> System.IO.IOException: Unable to read data from the transport connection: Operation canceled.
[RUNNER 2024-07-15 06:11:33Z ERR  MessageListener]  ---> System.Net.Sockets.SocketException (125): Operation canceled

Listener Exception:

{
  "error": "Get \"https://<GHES URL>/_services/pipelines/<UID>/_apis/runtime/runnerscalesets/8/messages?sessionId=<UID>&api-version=6.0-preview\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)",
  "method": "GET",
  "url": "https://<GHES URL>/_services/pipelines/<UID>/_apis/runtime/runnerscalesets/8/messages?sessionId=<UID>&api-version=6.0-preview"
}

Listener stack trace:

github.com/actions/actions-runner-controller/github/actions.(*clientLogger).Error
	github.com/actions/actions-runner-controller/github/actions/client.go:76
github.com/hashicorp/go-retryablehttp.(*Client).Do
	github.com/hashicorp/go-retryablehttp@<version>/client.go:718
github.com/hashicorp/go-retryablehttp.(*RoundTripper).RoundTrip
	github.com/hashicorp/go-retryablehttp@<version>/roundtripper.go:47
net/http.send
	net/http/client.go:259
net/http.(*Client).send
	net/http/client.go:180
net/http.(*Client).do
	net/http/client.go:724
net/http.(*Client).Do
	net/http/client.go:590
github.com/actions/actions-runner-controller/github/actions.(*Client).Do
	github.com/actions/actions-runner-controller/github/actions/client.go:273
github.com/actions/actions-runner-controller/github/actions.(*Client).GetMessage
	github.com/actions/actions-runner-controller/github/actions/client.go:577
github.com/actions/actions-runner-controller/cmd/ghalistener/listener.(*Listener).getMessage
	github.com/actions/actions-runner-controller/cmd/ghalistener/listener/listener.go:272
github.com/actions/actions-runner-controller/cmd/ghalistener/listener.(*Listener).Listen
	github.com/actions/actions-runner-controller/cmd/ghalistener/listener/listener.go:163
github.com/actions/actions-runner-controller/cmd/ghalistener/app.(*App).Run.func1
	github.com/actions/actions-runner-controller/cmd/ghalistener/app/app.go:124
golang.org/x/sync/errgroup.(*Group).Go.func1
	golang.org/x/sync@<version>/errgroup/errgroup.go:78

jb-2020 avatar Jul 15 '24 17:07 jb-2020

Thanks to everyone who has also brought up these issues. I'd like to clarify, however, that these issues we've had in 0.9.3 have also been present in prior versions of the GHARC, at least since version 0.9.1 by my testing.

shodanwashere avatar Jul 23 '24 14:07 shodanwashere

Getting the same issue in a k3s cluster.

gregkonush avatar Aug 23 '24 06:08 gregkonush

Bumping this as I'm getting the same exact issue @jb-2020 has on EKS. Do we need to use previous versions?

CodechCFA avatar Sep 25 '24 14:09 CodechCFA

We tried our luck with AKS and it never worked. We then switched to Buildah rootless DinD, which also didn't work. Hoping to find a solution for this issue.

sudhakarinka avatar Oct 01 '24 13:10 sudhakarinka

Same issue here, I added the template.spec snippet recommended here to try and set requests/limits for the runner container, but the same error reported above happens. When reverting back to containerMode.type: "dind" the listener starts working again.

fbscarel-bl avatar Nov 17 '24 15:11 fbscarel-bl

Hello, same problem here. Any news?

tarrinho avatar Jan 16 '25 17:01 tarrinho

Closing since the documentation has been updated. Thank you for reporting this issue!

nikola-jokic avatar Mar 18 '25 16:03 nikola-jokic