Runner stuck at `BrokerMigration message received. Polling Broker for messages...`
Describe the bug As of ~last night (4th of July CEST) self hosted runners do not pick up any new jobs. Runners do emit a message ~every minute:
Runner stuck at `BrokerMigration message received. Polling Broker for messages...`
This is similar to what's reported here: https://github.com/ChristopherHX/github-act-runner/issues/186#issuecomment-2208392039.
Expected behavior Runners should pick up jobs
Runner Version and Platform
Runner
v2.317.0
OS:
Distributor ID: Ubuntu
Description: Ubuntu 22.04.4 LTS
Release: 22.04
Codename: jammy
Runner and Worker's Diagnostic Logs
[2024-07-04 11:43:31Z INFO GitHubActionsService] AAD Correlation ID for this token request: Unknown
[2024-07-04 11:43:31Z INFO MessageListener] Session created.
[2024-07-04 11:43:31Z INFO Terminal] WRITE LINE: Current runner version: '2.317.0'
[2024-07-04 11:43:31Z INFO Terminal] WRITE LINE: 2024-07-04 11:43:31Z: Listening for Jobs
[2024-07-04 11:43:31Z INFO JobDispatcher] Set runner/worker IPC timeout to 30 seconds.
[2024-07-04 11:43:31Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-04 11:43:31Z INFO RSAFileKeyManager] Loading RSA key parameters from file /home/bittide/Downloads/actions-runner-1/.credentials_rsaparams
[2024-07-04 11:43:31Z INFO GitHubActionsService] AAD Correlation ID for this token request: Unknown
[2024-07-04 11:44:22Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-04 11:45:12Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-04 11:46:02Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-04 11:46:53Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-04 11:47:43Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-04 11:48:33Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-04 11:49:23Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-04 11:50:13Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-04 11:51:04Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
Question
Is there anything we can do to get more debug information out of the runners?
We're seeing the same thing on our MacOS self-hosted runner, also as of that point in time.
EDIT: Also on v2.317.0
It looks like the BrokerMigration message has stopped.
But our runners still aren't picking up jobs.
Now logs keep repeating:
[2024-07-05 09:26:16Z INFO MessageListener] No message retrieved from session '33f227ee-3ebb-4112-b506-16f7433b4226' within last 30 minutes.
[2024-07-05 09:56:20Z INFO MessageListener] No message retrieved from session '33f227ee-3ebb-4112-b506-16f7433b4226' within last 30 minutes.
[2024-07-05 10:16:23Z INFO RSAFileKeyManager] Loading RSA key parameters from file /home/bittide/Downloads/actions-runner-1/.credentials_rsaparams
[2024-07-05 10:16:23Z INFO GitHubActionsService] AAD Correlation ID for this token request: Unknown
[2024-07-05 10:26:25Z INFO MessageListener] No message retrieved from session '33f227ee-3ebb-4112-b506-16f7433b4226' within last 30 minutes.
[2024-07-05 10:56:29Z INFO MessageListener] No message retrieved from session '33f227ee-3ebb-4112-b506-16f7433b4226' within last 30 minutes.
[2024-07-05 11:11:32Z INFO RSAFileKeyManager] Loading RSA key parameters from file /home/bittide/Downloads/actions-runner-1/.credentials_rsaparams
[2024-07-05 11:11:32Z INFO GitHubActionsService] AAD Correlation ID for this token request: Unknown
This is using v2.317.0
@leonschoorl Do you have an example run that's not being picked up?
https://github.com/bittide/bittide-hardware/actions/runs/9798992881
This is the runner page:
(I'm a colleague of @leonschoorl)
Hey @martijnbastiaan I noticed your runners no longer have the compute label, so those jobs won't match. Can you try adding those runners with the compute label or changing the workflow to only require self-hosted?
@benjyw You mentioned having the same issue, do you have any more logs or run links you can share?
Oh wow, yeah I totally forgot to apply the labels when reinstalling the runners to see if that would fix this issue. Re-adding the labels works.
For the earlier issue, @martijnbastiaan, you weren't getting jobs assigned when you were receiving the BrokerMigration message. Is your runner the standard actions/runner (this repo, not a fork)? Do your runners have any kind if firewall settings that might be preventing it from hitting the Broker? Trying to make sure I'm capturing the original issue too.
Standard runner, not a fork. No firewall except NAT.
As of yesterday my self hosted runner is also not picking up any new jobs, latest _diag message: https://gist.github.com/matbee-eth/b3c2bdca9f560b9b9788d1b1fbedef02
<trunc>
[2024-07-05 16:12:02Z INFO Terminal] WRITE LINE: Current runner version: '2.317.0'
[2024-07-05 16:12:02Z INFO Terminal] WRITE LINE: 2024-07-05 16:12:02Z: Listening for Jobs
[2024-07-05 16:12:02Z INFO JobDispatcher] Set runner/worker IPC timeout to 30 seconds.
[2024-07-05 16:12:02Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-05 16:12:02Z INFO RSAFileKeyManager] Loading RSA key parameters from file /home/fatimaapi/backend/.credentials_rsaparams
[2024-07-05 16:12:02Z INFO GitHubActionsService] AAD Correlation ID for this token request: Unknown
[2024-07-05 16:12:53Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-05 16:13:43Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-05 16:14:33Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-05 16:15:23Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-05 16:16:13Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-05 16:17:03Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
@matbee-eth do you have a session ID for your runner? Or an example link to a run that isn't being picked up?
It's a private repo, not sure if GH staff can access anything, but https://github.com/FytechCorp/AI-Backend/actions/runs/9799514721
Unsure of how to get a Runner Session ID specifically though
All good @matbee-eth. That's helpful. Thanks!
Hi @luketomlinson, thanks for looking into this. Things seem to be working now, with no changes on our end. So presumably some server-side issue that was resolved?
Yep @benjyw, I'm looking into some server side changes.
Hi @luketomlinson,
Any updates regarding this issue? We faced it as well. Original runner with no special firewall rules. If you need any additional information, I will be glad to provide it.
We have seen this issue as well. Same as for the other people here: Started about 2 weeks ago around the 4th of July. Then it was working again (albeit a bit spotty), but since yesterday complete outage of all our self-hosted runners on both Windows as well as Linux. This is really making me regret using Github Actions as a CI/CD pipeline right now. Identical log as for the previous people as well. For example:
[2024-07-17 13:07:35Z INFO Terminal] WRITE LINE: 2024-07-17 13:07:35Z: Listening for Jobs
[2024-07-17 13:07:35Z INFO JobDispatcher] Set runner/worker IPC timeout to 30 seconds.
[2024-07-17 13:07:35Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:07:35Z INFO RSAEncryptedFileKeyManager] Loading RSA key parameters from file C:\Buildserver\actions-runner\.credentials_rsaparams
[2024-07-17 13:07:35Z INFO GitHubActionsService] AAD Correlation ID for this token request: Unknown
[2024-07-17 13:08:26Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:09:16Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:10:06Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:10:57Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:11:47Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:12:38Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:13:28Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:14:18Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:15:08Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:15:59Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:16:49Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:17:39Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:18:30Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:19:20Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:20:10Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:21:01Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:21:51Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:22:41Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:23:32Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:24:22Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:25:12Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:26:02Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:26:53Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:27:43Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:28:34Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:29:24Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:30:15Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:31:05Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:31:55Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:32:45Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:33:36Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:34:26Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:35:16Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
@fwosar can you share the full runner diag log with me? [email protected] 🙇
Sorry for the trouble, I can help you take a look when i got the diag log. 😄
Actually, I think this particular problem might be related to #3393. I went through all the logs again, and we hit the same error as in #3393 on our runners:
[2024-07-17 12:59:03Z INFO GitHubActionsService] Starting operation Location.GetConnectionData
[2024-07-17 12:59:03Z INFO GitHubActionsService] Finished operation Location.GetConnectionData
[2024-07-17 12:59:03Z INFO GitHubActionsService] Starting operation Location.GetConnectionData
[2024-07-17 12:59:03Z INFO GitHubActionsService] Finished operation Location.GetConnectionData
[2024-07-17 12:59:03Z INFO RSAEncryptedFileKeyManager] Loading RSA key parameters from file C:\Buildserver\actions-runner\.credentials_rsaparams
[2024-07-17 12:59:03Z INFO RSAEncryptedFileKeyManager] Loading RSA key parameters from file C:\Buildserver\actions-runner\.credentials_rsaparams
[2024-07-17 12:59:03Z INFO GitHubActionsService] AAD Correlation ID for this token request: Unknown
[2024-07-17 12:59:04Z INFO JobDispatcher] Shutting down JobDispatcher. Make sure all WorkerDispatcher has finished.
[2024-07-17 12:59:04Z INFO Runner] Deleting Runner Session...
[2024-07-17 12:59:05Z ERR Terminal] WRITE ERROR: An error occurred: Object reference not set to an instance of an object.
[2024-07-17 12:59:05Z ERR Listener] System.NullReferenceException: Object reference not set to an instance of an object.
at GitHub.Runner.Listener.JobDispatcher.Run(AgentJobRequestMessage jobRequestMessage, Boolean runOnce)
at GitHub.Runner.Listener.Runner.RunAsync(RunnerSettings settings, Boolean runOnce)
at GitHub.Runner.Listener.Runner.RunAsync(RunnerSettings settings, Boolean runOnce)
at GitHub.Runner.Listener.Runner.RunAsync(RunnerSettings settings, Boolean runOnce)
at GitHub.Runner.Listener.Runner.ExecuteCommand(CommandSettings command)
at GitHub.Runner.Listener.Program.MainAsync(IHostContext context, String[] args)
After hitting these errors, the runner was effectively dead. Removing and redeploying the runner seems to have resolved the problem.
We are experiencing this behavior now:
[RUNNER 2024-07-25 15:15:45Z INFO JobDispatcher] Set runner/worker IPC timeout to 30 seconds.
[RUNNER 2024-07-25 15:15:45Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[RUNNER 2024-07-25 15:15:45Z INFO RSAFileKeyManager] Loading RSA key parameters from file /home/runner/.credentials_rsaparams
[RUNNER 2024-07-25 15:15:45Z INFO GitHubActionsService] AAD Correlation ID for this token request: Unknown
[RUNNER 2024-07-25 15:16:36Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[RUNNER 2024-07-25 15:17:26Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[RUNNER 2024-07-25 15:18:16Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[RUNNER 2024-07-25 15:19:07Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[RUNNER 2024-07-25 15:19:57Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[RUNNER 2024-07-25 15:20:47Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[RUNNER 2024-07-25 15:21:37Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[RUNNER 2024-07-25 15:22:28Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[RUNNER 2024-07-25 15:23:18Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[RUNNER 2024-07-25 15:24:08Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[RUNNER 2024-07-25 15:24:59Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[RUNNER 2024-07-25 15:26:03Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[RUNNER 2024-07-25 15:26:54Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[RUNNER 2024-07-25 15:27:44Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[RUNNER 2024-07-25 15:28:34Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
...
[RUNNER 2024-07-25 15:30:22Z INFO RSAFileKeyManager] Loading RSA key parameters from file /home/runner/.credentials_rsaparams
[RUNNER 2024-07-25 15:30:22Z INFO RSAFileKeyManager] Loading RSA key parameters from file /home/runner/.credentials_rsaparams
[RUNNER 2024-07-25 15:30:22Z ERR GitHubActionsService] POST request to https://pipelinesghubeus21.actions.githubusercontent.com/izjmlKLllI8mQvN3RNUEys667oyyTtmTTtNj33gFgdUfkp3yzp/_apis/oauth2/token failed. HTTP Status: BadRequest
[RUNNER 2024-07-25 15:30:22Z INFO GitHubActionsService] AAD Correlation ID for this token request: Unknown
[RUNNER 2024-07-25 15:30:22Z ERR MessageListener] Catch exception during create session.
[RUNNER 2024-07-25 15:30:22Z ERR MessageListener] GitHub.Services.OAuth.VssOAuthTokenRequestException: Registration c8051515-ad19-4910-a34e-36559f1f57d8 was not found.
[RUNNER 2024-07-25 15:30:22Z ERR MessageListener] at GitHub.Services.OAuth.VssOAuthTokenProvider.OnGetTokenAsync(IssuedToken failedToken, CancellationToken cancellationToken)
[RUNNER 2024-07-25 15:30:22Z ERR MessageListener] at GitHub.Services.Common.IssuedTokenProvider.GetTokenOperation.GetTokenAsync(VssTraceActivity traceActivity)
[RUNNER 2024-07-25 15:30:22Z ERR MessageListener] at GitHub.Services.Common.IssuedTokenProvider.GetTokenAsync(IssuedToken failedToken, CancellationToken cancellationToken)
[RUNNER 2024-07-25 15:30:22Z ERR MessageListener] at GitHub.Services.Common.VssHttpMessageHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
[RUNNER 2024-07-25 15:30:22Z ERR MessageListener] at GitHub.Services.Common.VssHttpRetryMessageHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
[RUNNER 2024-07-25 15:30:22Z ERR MessageListener] at System.Net.Http.HttpClient.<SendAsync>g__Core|83_0(HttpRequestMessage request, HttpCompletionOption completionOption, CancellationTokenSource cts, Boolean disposeCts, CancellationTokenSource pendingRequestsCts, CancellationToken originalCancellationToken)
[RUNNER 2024-07-25 15:30:22Z ERR MessageListener] at GitHub.Services.WebApi.VssHttpClientBase.SendAsync(HttpRequestMessage message, HttpCompletionOption completionOption, Object userState, CancellationToken cancellationToken)
[RUNNER 2024-07-25 15:30:22Z ERR MessageListener] at GitHub.Services.WebApi.VssHttpClientBase.SendAsync[T](HttpRequestMessage message, Object userState, CancellationToken cancellationToken)
[RUNNER 2024-07-25 15:30:22Z ERR MessageListener] at GitHub.Services.WebApi.VssHttpClientBase.SendAsync[T](HttpMethod method, IEnumerable`1 additionalHeaders, Guid locationId, Object routeValues, ApiResourceVersion version, HttpContent content, IEnumerable`1 queryParameters, Object userState, CancellationToken cancellationToken)
[RUNNER 2024-07-25 15:30:22Z ERR MessageListener] at GitHub.Runner.Listener.MessageListener.CreateSessionAsync(CancellationToken token)
[RUNNER 2024-07-25 15:30:22Z ERR Terminal] WRITE ERROR: Failed to create a session. The runner registration has been deleted from the server, please re-configure. Runner registrations are automatically deleted for runners that have not connected to the service recently.
Failed to create a session. The runner registration has been deleted from the server, please re-configure. Runner registrations are automatically deleted for runners that have not connected to the service recently.
[RUNNER 2024-07-25 15:30:22Z INFO Listener] Runner execution has finished with return code 1
Runner listener exit with terminated error, stop the service, no retry needed.
Exiting runner...
In my case, after adding a self-hosted runner from the GitHub Enterprise Settings, I encountered the same issue as described. However, when I created the self-hosted runner from the GitHub Organization Settings, it worked correctly.
@fwosar can you share the full runner diag log with me? [email protected] 🙇 Sorry for the trouble, I can help you take a look when i got the diag log. 😄
Hi @TingluoHuang we're also having the issue, can we send the log your way too?
I have also faced this issue. It occurs sometimes (doesn't happen everytime) when I create a new arc set.
The way I could do a workaround for this (I don't know if this is a valid solution) is to:
- create arc set with min replicas = 1
- Monitor this replicas log to see the above log.
- Manually start a job to see if it gets picked up.
- If it doesn't get picked up in 5 mins, restart the replica.
- Cancel the job and queue another job.
- That job and subsequent jobs get picked up.
I don't know if this is a initial setup problem. But doing this doesn't cause issues for queueing up the jobs. I have tested by keeping this setup running over night and submitting jobs every 3-4 hours.