Self-Hosted Runner does not pick up queued jobs
Describe the bug I have a workflow with several jobs that have to be run one by one by the runner, but the runner only picks up the first one or two jobs and leaves the rest in the queue. It goes idle while there are still multiple jobs queued; the only way to get the runner to pick them up and finish the workflow is to manually cancel and re-run the failed jobs.
This was not happening a couple of days ago; it started out of nowhere, without any significant change to the workflow or the runner.
Expected behavior The self-hosted runner should finish all the jobs in the queue.
Runner Version and Platform
Version of your runner? 2.319.1
OS of the machine running the runner? Linux (Ubuntu 24.04.1 LTS)
I am having the same issue; the self-hosted runner is not picking up queued jobs (version 2.319.1, Linux arm64). It shows up as idle on the GitHub org-level runners page. I have tried recreating the runner multiple times and still hit the issue.
Have you tried restarting the listener? Does that work? I seem to have the same issue: Actions did not pick up the job, but after restarting the listener it did.
It's not a long-term solution, but I have no idea how to solve this issue yet. Actions seems to have had this problem for several versions, up to the latest one.
@hikouki-gumo Restarting the GitHub Actions runner did not solve the problem for me. My runner was created at the organization level, but what worked was creating a repo-level runner instead. The repo-level runner was able to pick up jobs. Not sure why the org-level runner broke.
To create a repo-level GitHub Actions runner: MyRepo -> Settings -> Actions -> Runners -> New self-hosted runner.
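If you prefer doing it from the shell, re-registering an existing runner against the repo looks roughly like this. OWNER/REPO and the tokens are placeholders; the registration token comes from that "New self-hosted runner" page:
# Run inside the runner install directory; tokens below are placeholders.
./config.sh remove --token <ORG_REMOVAL_TOKEN>                          # detach the runner from the org first
./config.sh --url https://github.com/<OWNER>/<REPO> --token <REPO_REGISTRATION_TOKEN>
./run.sh                                                                # or re-install the service with sudo ./svc.sh install && sudo ./svc.sh start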
I found a resolution for my issue.
By default the self-hosted runner was being created at the org level. I had to delete it and create one from the repo settings instead.
Thank you!! After hours of troubleshooting, removing the runner from the org settings and adding it from the repo settings worked like a charm. Much appreciated.
We had a similar issue. In our case we can't have GitHub runners at the repository level; we need to provision them at the org level.
We are running self-hosted GitHub runners on Kubernetes using runner scale sets (ARC).
We had an issue where the scale set wasn't scaling up with the demand from workflow runs: there were 50 workflow runs pending for a particular scale set listener, but it was only running 10-15 of them while the other 35-40 runners were sitting idle.
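For context, our scale set is installed roughly like the sketch below. The release name, namespace, secret name and sizes are placeholders for our setup, not a recommendation; the point is that maxRunners has to be at least as large as the burst of jobs you expect:
# Rough shape of a gha-runner-scale-set install; names and sizes are placeholders.
helm upgrade --install my-runner-set \
  oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set \
  --namespace arc-runners --create-namespace \
  --set githubConfigUrl="https://github.com/<your-org>" \
  --set githubConfigSecret=<pre-created-github-secret> \
  --set minRunners=0 \
  --set maxRunners=50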
When we started troubleshooting, we found the following container logs in one of the runner pods:
[RUNNER 2024-11-14 13:57:12Z INFO GitHubActionsService] AAD Correlation ID for this token request: Unknown
[RUNNER 2024-11-14 13:57:13Z ERR GitHubActionsService] GET request to https://broker.actions.githubusercontent.com/message?sessionId=5244b719-b408-4da3-993c-c03a3fb81b1
[RUNNER 2024-11-14 13:57:13Z ERR BrokerServer] Catch exception during request
[RUNNER 2024-11-14 13:57:13Z ERR BrokerServer] System.Exception: Failed to get job message. Request to https://broker.actions.githubusercontent.com/message failed with
[RUNNER 2024-11-14 13:57:13Z ERR BrokerServer] at GitHub.Actions.RunService.WebApi.BrokerHttpClient.GetRunnerMessageAsync(Nullable`1 sessionId, String runnerVersion,
[RUNNER 2024-11-14 13:57:13Z ERR BrokerServer] at GitHub.Runner.Common.BrokerServer.<>c__DisplayClass7_0.<<GetRunnerMessageAsync>b__0>d.MoveNext()
[RUNNER 2024-11-14 13:57:13Z ERR BrokerServer] --- End of stack trace from previous location ---
[RUNNER 2024-11-14 13:57:13Z ERR BrokerServer] at GitHub.Runner.Common.RunnerService.RetryRequest[T](Func`1 func, CancellationToken cancellationToken, Int32 maxRetry
[RUNNER 2024-11-14 13:57:13Z WARN BrokerServer] Back off 12.599 seconds before next retry. 4 attempt left.
[RUNNER 2024-11-14 13:57:25Z ERR GitHubActionsService] GET request to https://broker.actions.githubusercontent.com/message?sessionId=5244b719-b408-4da3-993c-c03a3fb81b1
[RUNNER 2024-11-14 13:57:25Z ERR BrokerServer] Catch exception during request
[RUNNER 2024-11-14 13:57:25Z ERR BrokerServer] System.Exception: Failed to get job message. Request to https://broker.actions.githubusercontent.com/message failed with
[RUNNER 2024-11-14 13:57:25Z ERR BrokerServer] at GitHub.Actions.RunService.WebApi.BrokerHttpClient.GetRunnerMessageAsync(Nullable`1 sessionId, String runnerVersion,
[RUNNER 2024-11-14 13:57:25Z ERR BrokerServer] at GitHub.Runner.Common.BrokerServer.<>c__DisplayClass7_0.<<GetRunnerMessageAsync>b__0>d.MoveNext()
[RUNNER 2024-11-14 13:57:25Z ERR BrokerServer] --- End of stack trace from previous location ---
[RUNNER 2024-11-14 13:57:25Z ERR BrokerServer] at GitHub.Runner.Common.RunnerService.RetryRequest[T](Func`1 func, CancellationToken cancellationToken, Int32 maxRetry
[RUNNER 2024-11-14 13:57:25Z WARN BrokerServer] Back off 7.012 seconds before next retry. 3 attempt left.
[RUNNER 2024-11-14 13:57:33Z ERR GitHubActionsService] GET request to https://broker.actions.githubusercontent.com/message?sessionId=5244b719-b408-4da3-993c-c03a3fb81b1
[RUNNER 2024-11-14 13:57:33Z ERR BrokerServer] Catch exception during request
[RUNNER 2024-11-14 13:57:33Z ERR BrokerServer] System.Exception: Failed to get job message. Request to https://broker.actions.githubusercontent.com/message failed with
[RUNNER 2024-11-14 13:57:33Z ERR BrokerServer] at GitHub.Actions.RunService.WebApi.BrokerHttpClient.GetRunnerMessageAsync(Nullable`1 sessionId, String runnerVersion,
[RUNNER 2024-11-14 13:57:33Z ERR BrokerServer] at GitHub.Runner.Common.BrokerServer.<>c__DisplayClass7_0.<<GetRunnerMessageAsync>b__0>d.MoveNext()
[RUNNER 2024-11-14 13:57:33Z ERR BrokerServer] --- End of stack trace from previous location ---
[RUNNER 2024-11-14 13:57:33Z ERR BrokerServer] at GitHub.Runner.Common.RunnerService.RetryRequest[T](Func`1 func, CancellationToken cancellationToken, Int32 maxRetry
[RUNNER 2024-11-14 13:57:33Z WARN BrokerServer] Back off 9.356 seconds before next retry. 2 attempt left.
[RUNNER 2024-11-14 13:58:33Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[RUNNER 2024-11-14 13:59:23Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[RUNNER 2024-11-14 14:00:13Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
Many of the pods that were sitting idle while there was demand for them have the following logs:
[2024-11-14 11:43:31Z INFO GitHubActionsService] AAD Correlation ID for this token request: Unknown
[2024-11-14 11:43:31Z INFO MessageListener] Session created.
[2024-11-14 11:43:31Z INFO Terminal] WRITE LINE: Current runner version: '2.317.0'
[2024-11-14 11:43:31Z INFO Terminal] WRITE LINE: 2024-11-14 11:43:31Z: Listening for Jobs
[2024-11-14 11:43:31Z INFO JobDispatcher] Set runner/worker IPC timeout to 30 seconds.
[2024-11-14 11:43:31Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-11-14 11:43:31Z INFO RSAFileKeyManager] Loading RSA key parameters from file /home/bittide/Downloads/actions-runner-1/.credentials_rsaparams
[2024-11-14 11:43:31Z INFO GitHubActionsService] AAD Correlation ID for this token request: Unknown
[2024-11-14 11:44:22Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-11-14 11:45:12Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-11-14 11:46:02Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-11-14 11:46:53Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-11-14 11:47:43Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-11-14 11:48:33Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-11-14 11:49:23Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-11-14 11:50:13Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-11-14 11:51:04Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
The runner scale set controller emits metrics that we scrape with our Prometheus server. This is what we observe on the Grafana dashboard:
There is a gap between the number of assigned jobs and the number of Kubernetes runner pods.
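A rough way to see that gap from the CLI, independent of Grafana; the repo, namespace and scale set name below are placeholders for our setup:
# Jobs waiting on the GitHub side (needs an authenticated gh CLI).
gh run list --repo <org>/<repo> --status queued --limit 100
# Runner pods / ephemeral runners that actually exist on the cluster side.
kubectl get pods -n arc-runners
kubectl get ephemeralrunners -n arc-runners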
We experience the same issue: with 2 jobs in a workflow, the second job always stays pending forever.
Same issue. I have to re-run and immediately cancel some ancient old runs so that the recent/actual queued jobs actually start instead of hanging while waiting for a runner to pick them up. Moving the runner from org to repo level has not fixed a thing, and it's still happening on almost every single action.
This issue started again yesterday without any changes on our side.
Randomly, after running a few jobs, the runner stays idle but nothing gets assigned to it.
We have to manually kill ./run.sh and start it again, and then jobs start.
Is there any option to see detailed logs from run.sh?
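The console output of run.sh is minimal, but the listener and worker write detailed diagnostics next to the binary; something like this shows them (paths are relative to the runner install directory):
ls _diag/                          # one Runner_*.log per listener session, one Worker_*.log per job
tail -f _diag/Runner_*.log         # session creation, broker polling, job assignment
tail -f _diag/Worker_*.log         # per-job execution details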
Same issue here... are there any updates?
Same for me. It seems to be intermittent. It queues until I reboot the server, then it only picks up a few jobs and gets stuck queuing again. Screenshots show before and after rebooting the server.
Same here; it seems to happen after running 1 or 2 jobs. Only a service restart fixes the issue (we're using repo-level runners).
My problem only occurs with the runner running inside Docker.
For me it runs one workflow (one or more jobs) and that's it; it will only pick up a new job if a job is already queued while I restart the runner (restart sketch after the logs below). I'm using the official Docker image and the compose setup described here: https://github.com/dorinclisu/devops-setup/tree/main/github_runner
github-runner-1 | [RUNNER 2025-03-13 05:53:04Z INFO ProcessInvokerWrapper] Finished process 35 with exit code 100, and elapsed time 00:00:10.6715035.
github-runner-1 | [RUNNER 2025-03-13 05:53:04Z INFO JobDispatcher] Worker finished for job 222fe9af-eb80-5509-9f7a-53ef6b651d8a. Code: 100
github-runner-1 | [RUNNER 2025-03-13 05:53:04Z INFO JobDispatcher] finish job request for job 222fe9af-eb80-5509-9f7a-53ef6b651d8a with result: Succeeded
github-runner-1 | [RUNNER 2025-03-13 05:53:04Z INFO Terminal] WRITE LINE: 2025-03-13 05:53:04Z: Job test completed with result: Succeeded
github-runner-1 | 2025-03-13 05:53:04Z: Job test completed with result: Succeeded
github-runner-1 | [RUNNER 2025-03-13 05:53:04Z INFO JobDispatcher] Stop renew job request for job 222fe9af-eb80-5509-9f7a-53ef6b651d8a.
github-runner-1 | [RUNNER 2025-03-13 05:53:04Z INFO JobDispatcher] job renew has been cancelled, stop renew job 222fe9af-eb80-5509-9f7a-53ef6b651d8a.
github-runner-1 | [RUNNER 2025-03-13 05:53:04Z INFO JobNotification] Entering JobCompleted Notification
github-runner-1 | [RUNNER 2025-03-13 05:53:04Z INFO JobNotification] Entering EndMonitor
github-runner-1 | [RUNNER 2025-03-13 05:53:04Z INFO MessageListener] Received job status event. JobState: Online
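With that compose setup, the restart workaround mentioned above is roughly the following; the service name github-runner is assumed from the log prefix above:
docker compose restart github-runner     # restart while a job is already queued
docker compose logs -f github-runner     # watch it reconnect and pick up the queued job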
I don't have this issue with the runner installed on the host as per instructions in the Actions console:
√ Connected to GitHub
2025-03-13 10:05:15Z: Running job: test
2025-03-13 10:06:04Z: Job test completed with result: Canceled
2025-03-13 10:09:23Z: Running job: test
2025-03-13 10:10:34Z: Job test completed with result: Succeeded
2025-03-13 10:11:32Z: Running job: build-test
2025-03-13 10:12:25Z: Job build-test completed with result: Succeeded
2025-03-13 10:12:32Z: Running job: build-push
2025-03-13 10:19:27Z: Job build-push completed with result: Succeeded
I have the same issue. I uninstalled the services, reinstalled them, registered the runner again, and also changed the labels, but no luck. Every time, I need to restart for it to pick up the job. This has been happening for the last two weeks; otherwise, it has worked well for three years. Do we have a solution? I’m not keeping track of who made changes, and I have to restart the service. Thank you.
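If it helps anyone automating the workaround, the service wrapper that ships with the runner can do the restart (run from the runner install directory; on Linux it wraps systemd):
sudo ./svc.sh status      # check what the service thinks it is doing
sudo ./svc.sh stop
sudo ./svc.sh start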
Same error for me. After 1-2 jobs it doesn't pick up anything again. Self-hosted runner on Hetzner ARM64.
Any update on this? I have to press Enter in the cmd window all the time to make it run new jobs. If I restart run.cmd, I still have the same problem after it picks up 1 to 2 jobs.
+1
Also experiencing this issue at the moment. It seems to be related to the incident from March 8th, which is seemingly resolved.
Similar issue with similar recent complaints: https://github.com/orgs/community/discussions/120813#discussioncomment-12435979
Does anyone from GH monitor these issues?
GH has apparently fixed this issue (can confirm on our side), here's their comment: https://github.com/actions/runner/issues/3609#issuecomment-2722340062
I haven't had any issues after the fix. Seems to be fixed ^^
The same issue started for us yesterday. No errors in ARC controller or listener logs. Everything appears fine except for the fact that the queued job doesn't start.
k get ephemeralrunner
NAME GITHUB CONFIG URL RUNNERID STATUS JOBREPOSITORY JOBWORKFLOWREF AGE
nonprod-hpzjx-runner-6fqkj https://github.com/enterprises/<name> 333337 Running org/repo org/repo/.github/workflows/workflow_name.yml@refs/pull/9048/merge 77s
nonprod-hpzjx-runner-mh5km https://github.com/enterprises/<name> 332555 Running org/repo org/repo/.github/workflows/Pull_Request.yml@refs/pull/15174/merge 45h
nonprod-hpzjx-runner-qrg7c https://github.com/enterprises/<name> 351555 Running org/repo org/repo/.github/workflows/workflow_name.yml@refs/pull/9070/merge 22m
nonprod-hpzjx-runner-tdkr4 https://github.com/enterprises/<name> 355555 Running org/repo org/repo/.github/workflows/workflow_name.yml@refs/pull/9064/merge 45h
Looking at the ephemeral runners above, AGE suggests some of these jobs should have been served 45 hours ago, but on the GitHub Actions side the workflow keeps loading forever / stays stuck after the jobs are picked up from the queue.
No errors in the ARC controller or listener logs, as stated in the comment above.
Using the latest runner v2.325.0; ARC runner scale set version 0.11.0.
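For completeness, this is roughly how we pull the controller/listener logs and inspect the stuck ephemeral runner; namespaces and the controller deployment name depend on how ARC was installed, so treat them as placeholders:
kubectl get pods -n arc-systems                                               # controller and listener pods
kubectl logs -n arc-systems deploy/<release>-gha-rs-controller --since=2h
kubectl logs -n arc-systems <scale-set-listener-pod> --since=2h
kubectl describe ephemeralrunner nonprod-hpzjx-runner-mh5km -n arc-runners    # the 45h-old runner from the list above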
Same issue on our end. The self-hosted runners show status Idle, but the actions are all Queued.