runner icon indicating copy to clipboard operation
runner copied to clipboard

Runner stuck at `BrokerMigration message received. Polling Broker for messages...`

Open martijnbastiaan opened this issue 1 year ago • 21 comments

Describe the bug As of ~last night (4th of July CEST) self hosted runners do not pick up any new jobs. Runners do emit a message ~every minute:

Runner stuck at `BrokerMigration message received. Polling Broker for messages...`

This is similar to what's reported here: https://github.com/ChristopherHX/github-act-runner/issues/186#issuecomment-2208392039.

Expected behavior Runners should pick up jobs

Runner Version and Platform

Runner

v2.317.0

OS:

Distributor ID:	Ubuntu
Description:	Ubuntu 22.04.4 LTS
Release:	22.04
Codename:	jammy

Runner and Worker's Diagnostic Logs

[2024-07-04 11:43:31Z INFO GitHubActionsService] AAD Correlation ID for this token request: Unknown
[2024-07-04 11:43:31Z INFO MessageListener] Session created.
[2024-07-04 11:43:31Z INFO Terminal] WRITE LINE: Current runner version: '2.317.0'
[2024-07-04 11:43:31Z INFO Terminal] WRITE LINE: 2024-07-04 11:43:31Z: Listening for Jobs
[2024-07-04 11:43:31Z INFO JobDispatcher] Set runner/worker IPC timeout to 30 seconds.
[2024-07-04 11:43:31Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-04 11:43:31Z INFO RSAFileKeyManager] Loading RSA key parameters from file /home/bittide/Downloads/actions-runner-1/.credentials_rsaparams
[2024-07-04 11:43:31Z INFO GitHubActionsService] AAD Correlation ID for this token request: Unknown
[2024-07-04 11:44:22Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-04 11:45:12Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-04 11:46:02Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-04 11:46:53Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-04 11:47:43Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-04 11:48:33Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-04 11:49:23Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-04 11:50:13Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-04 11:51:04Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...

Question

Is there anything we can do to get more debug information out of the runners?

martijnbastiaan avatar Jul 04 '24 11:07 martijnbastiaan

We're seeing the same thing on our MacOS self-hosted runner, also as of that point in time.

EDIT: Also on v2.317.0

benjyw avatar Jul 05 '24 06:07 benjyw

It looks like the BrokerMigration message has stopped. But our runners still aren't picking up jobs.

Now logs keep repeating:

[2024-07-05 09:26:16Z INFO MessageListener] No message retrieved from session '33f227ee-3ebb-4112-b506-16f7433b4226' within last 30 minutes.
[2024-07-05 09:56:20Z INFO MessageListener] No message retrieved from session '33f227ee-3ebb-4112-b506-16f7433b4226' within last 30 minutes.
[2024-07-05 10:16:23Z INFO RSAFileKeyManager] Loading RSA key parameters from file /home/bittide/Downloads/actions-runner-1/.credentials_rsaparams
[2024-07-05 10:16:23Z INFO GitHubActionsService] AAD Correlation ID for this token request: Unknown
[2024-07-05 10:26:25Z INFO MessageListener] No message retrieved from session '33f227ee-3ebb-4112-b506-16f7433b4226' within last 30 minutes.
[2024-07-05 10:56:29Z INFO MessageListener] No message retrieved from session '33f227ee-3ebb-4112-b506-16f7433b4226' within last 30 minutes.
[2024-07-05 11:11:32Z INFO RSAFileKeyManager] Loading RSA key parameters from file /home/bittide/Downloads/actions-runner-1/.credentials_rsaparams
[2024-07-05 11:11:32Z INFO GitHubActionsService] AAD Correlation ID for this token request: Unknown

This is using v2.317.0

leonschoorl avatar Jul 05 '24 12:07 leonschoorl

@leonschoorl Do you have an example run that's not being picked up?

luketomlinson avatar Jul 05 '24 14:07 luketomlinson

https://github.com/bittide/bittide-hardware/actions/runs/9798992881

This is the runner page:

Screenshot 2024-07-05 a

(I'm a colleague of @leonschoorl)

martijnbastiaan avatar Jul 05 '24 14:07 martijnbastiaan

Hey @martijnbastiaan I noticed your runners no longer have the compute label, so those jobs won't match. Can you try adding those runners with the compute label or changing the workflow to only require self-hosted?

luketomlinson avatar Jul 05 '24 15:07 luketomlinson

@benjyw You mentioned having the same issue, do you have any more logs or run links you can share?

luketomlinson avatar Jul 05 '24 15:07 luketomlinson

Oh wow, yeah I totally forgot to apply the labels when reinstalling the runners to see if that would fix this issue. Re-adding the labels works.

martijnbastiaan avatar Jul 05 '24 15:07 martijnbastiaan

For the earlier issue, @martijnbastiaan, you weren't getting jobs assigned when you were receiving the BrokerMigration message. Is your runner the standard actions/runner (this repo, not a fork)? Do your runners have any kind if firewall settings that might be preventing it from hitting the Broker? Trying to make sure I'm capturing the original issue too.

luketomlinson avatar Jul 05 '24 15:07 luketomlinson

Standard runner, not a fork. No firewall except NAT.

martijnbastiaan avatar Jul 05 '24 16:07 martijnbastiaan

As of yesterday my self hosted runner is also not picking up any new jobs, latest _diag message: https://gist.github.com/matbee-eth/b3c2bdca9f560b9b9788d1b1fbedef02

<trunc>
[2024-07-05 16:12:02Z INFO Terminal] WRITE LINE: Current runner version: '2.317.0'
[2024-07-05 16:12:02Z INFO Terminal] WRITE LINE: 2024-07-05 16:12:02Z: Listening for Jobs
[2024-07-05 16:12:02Z INFO JobDispatcher] Set runner/worker IPC timeout to 30 seconds.
[2024-07-05 16:12:02Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-05 16:12:02Z INFO RSAFileKeyManager] Loading RSA key parameters from file /home/fatimaapi/backend/.credentials_rsaparams
[2024-07-05 16:12:02Z INFO GitHubActionsService] AAD Correlation ID for this token request: Unknown
[2024-07-05 16:12:53Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-05 16:13:43Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-05 16:14:33Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-05 16:15:23Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-05 16:16:13Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-05 16:17:03Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...

matbeedotcom avatar Jul 05 '24 16:07 matbeedotcom

@matbee-eth do you have a session ID for your runner? Or an example link to a run that isn't being picked up?

luketomlinson avatar Jul 05 '24 16:07 luketomlinson

It's a private repo, not sure if GH staff can access anything, but https://github.com/FytechCorp/AI-Backend/actions/runs/9799514721

Unsure of how to get a Runner Session ID specifically though

matbeedotcom avatar Jul 05 '24 16:07 matbeedotcom

All good @matbee-eth. That's helpful. Thanks!

luketomlinson avatar Jul 05 '24 16:07 luketomlinson

Hi @luketomlinson, thanks for looking into this. Things seem to be working now, with no changes on our end. So presumably some server-side issue that was resolved?

benjyw avatar Jul 05 '24 21:07 benjyw

Yep @benjyw, I'm looking into some server side changes.

luketomlinson avatar Jul 05 '24 21:07 luketomlinson

Hi @luketomlinson,

Any updates regarding this issue? We faced it as well. Original runner with no special firewall rules. If you need any additional information, I will be glad to provide it.

ajax-ryzhyi-r avatar Jul 16 '24 09:07 ajax-ryzhyi-r

We have seen this issue as well. Same as for the other people here: Started about 2 weeks ago around the 4th of July. Then it was working again (albeit a bit spotty), but since yesterday complete outage of all our self-hosted runners on both Windows as well as Linux. This is really making me regret using Github Actions as a CI/CD pipeline right now. Identical log as for the previous people as well. For example:

[2024-07-17 13:07:35Z INFO Terminal] WRITE LINE: 2024-07-17 13:07:35Z: Listening for Jobs
[2024-07-17 13:07:35Z INFO JobDispatcher] Set runner/worker IPC timeout to 30 seconds.
[2024-07-17 13:07:35Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:07:35Z INFO RSAEncryptedFileKeyManager] Loading RSA key parameters from file C:\Buildserver\actions-runner\.credentials_rsaparams
[2024-07-17 13:07:35Z INFO GitHubActionsService] AAD Correlation ID for this token request: Unknown
[2024-07-17 13:08:26Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:09:16Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:10:06Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:10:57Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:11:47Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:12:38Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:13:28Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:14:18Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:15:08Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:15:59Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:16:49Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:17:39Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:18:30Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:19:20Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:20:10Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:21:01Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:21:51Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:22:41Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:23:32Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:24:22Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:25:12Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:26:02Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:26:53Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:27:43Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:28:34Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:29:24Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:30:15Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:31:05Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:31:55Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:32:45Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:33:36Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:34:26Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-07-17 13:35:16Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...

fwosar avatar Jul 17 '24 19:07 fwosar

@fwosar can you share the full runner diag log with me? [email protected] 🙇 Sorry for the trouble, I can help you take a look when i got the diag log. 😄

TingluoHuang avatar Jul 17 '24 21:07 TingluoHuang

Actually, I think this particular problem might be related to #3393. I went through all the logs again, and we hit the same error as in #3393 on our runners:

[2024-07-17 12:59:03Z INFO GitHubActionsService] Starting operation Location.GetConnectionData
[2024-07-17 12:59:03Z INFO GitHubActionsService] Finished operation Location.GetConnectionData
[2024-07-17 12:59:03Z INFO GitHubActionsService] Starting operation Location.GetConnectionData
[2024-07-17 12:59:03Z INFO GitHubActionsService] Finished operation Location.GetConnectionData
[2024-07-17 12:59:03Z INFO RSAEncryptedFileKeyManager] Loading RSA key parameters from file C:\Buildserver\actions-runner\.credentials_rsaparams
[2024-07-17 12:59:03Z INFO RSAEncryptedFileKeyManager] Loading RSA key parameters from file C:\Buildserver\actions-runner\.credentials_rsaparams
[2024-07-17 12:59:03Z INFO GitHubActionsService] AAD Correlation ID for this token request: Unknown
[2024-07-17 12:59:04Z INFO JobDispatcher] Shutting down JobDispatcher. Make sure all WorkerDispatcher has finished.
[2024-07-17 12:59:04Z INFO Runner] Deleting Runner Session...
[2024-07-17 12:59:05Z ERR  Terminal] WRITE ERROR: An error occurred: Object reference not set to an instance of an object.
[2024-07-17 12:59:05Z ERR  Listener] System.NullReferenceException: Object reference not set to an instance of an object.
   at GitHub.Runner.Listener.JobDispatcher.Run(AgentJobRequestMessage jobRequestMessage, Boolean runOnce)
   at GitHub.Runner.Listener.Runner.RunAsync(RunnerSettings settings, Boolean runOnce)
   at GitHub.Runner.Listener.Runner.RunAsync(RunnerSettings settings, Boolean runOnce)
   at GitHub.Runner.Listener.Runner.RunAsync(RunnerSettings settings, Boolean runOnce)
   at GitHub.Runner.Listener.Runner.ExecuteCommand(CommandSettings command)
   at GitHub.Runner.Listener.Program.MainAsync(IHostContext context, String[] args)

After hitting these errors, the runner was effectively dead. Removing and redeploying the runner seems to have resolved the problem.

fwosar avatar Jul 18 '24 08:07 fwosar

We are experiencing this behavior now:

[RUNNER 2024-07-25 15:15:45Z INFO JobDispatcher] Set runner/worker IPC timeout to 30 seconds.
[RUNNER 2024-07-25 15:15:45Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[RUNNER 2024-07-25 15:15:45Z INFO RSAFileKeyManager] Loading RSA key parameters from file /home/runner/.credentials_rsaparams
[RUNNER 2024-07-25 15:15:45Z INFO GitHubActionsService] AAD Correlation ID for this token request: Unknown
[RUNNER 2024-07-25 15:16:36Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[RUNNER 2024-07-25 15:17:26Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[RUNNER 2024-07-25 15:18:16Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[RUNNER 2024-07-25 15:19:07Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[RUNNER 2024-07-25 15:19:57Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[RUNNER 2024-07-25 15:20:47Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[RUNNER 2024-07-25 15:21:37Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[RUNNER 2024-07-25 15:22:28Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[RUNNER 2024-07-25 15:23:18Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[RUNNER 2024-07-25 15:24:08Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[RUNNER 2024-07-25 15:24:59Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[RUNNER 2024-07-25 15:26:03Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[RUNNER 2024-07-25 15:26:54Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[RUNNER 2024-07-25 15:27:44Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[RUNNER 2024-07-25 15:28:34Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
...
[RUNNER 2024-07-25 15:30:22Z INFO RSAFileKeyManager] Loading RSA key parameters from file /home/runner/.credentials_rsaparams
[RUNNER 2024-07-25 15:30:22Z INFO RSAFileKeyManager] Loading RSA key parameters from file /home/runner/.credentials_rsaparams
[RUNNER 2024-07-25 15:30:22Z ERR  GitHubActionsService] POST request to https://pipelinesghubeus21.actions.githubusercontent.com/izjmlKLllI8mQvN3RNUEys667oyyTtmTTtNj33gFgdUfkp3yzp/_apis/oauth2/token failed. HTTP Status: BadRequest
[RUNNER 2024-07-25 15:30:22Z INFO GitHubActionsService] AAD Correlation ID for this token request: Unknown
[RUNNER 2024-07-25 15:30:22Z ERR  MessageListener] Catch exception during create session.
[RUNNER 2024-07-25 15:30:22Z ERR  MessageListener] GitHub.Services.OAuth.VssOAuthTokenRequestException: Registration c8051515-ad19-4910-a34e-36559f1f57d8 was not found.
[RUNNER 2024-07-25 15:30:22Z ERR  MessageListener]    at GitHub.Services.OAuth.VssOAuthTokenProvider.OnGetTokenAsync(IssuedToken failedToken, CancellationToken cancellationToken)
[RUNNER 2024-07-25 15:30:22Z ERR  MessageListener]    at GitHub.Services.Common.IssuedTokenProvider.GetTokenOperation.GetTokenAsync(VssTraceActivity traceActivity)
[RUNNER 2024-07-25 15:30:22Z ERR  MessageListener]    at GitHub.Services.Common.IssuedTokenProvider.GetTokenAsync(IssuedToken failedToken, CancellationToken cancellationToken)
[RUNNER 2024-07-25 15:30:22Z ERR  MessageListener]    at GitHub.Services.Common.VssHttpMessageHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
[RUNNER 2024-07-25 15:30:22Z ERR  MessageListener]    at GitHub.Services.Common.VssHttpRetryMessageHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
[RUNNER 2024-07-25 15:30:22Z ERR  MessageListener]    at System.Net.Http.HttpClient.<SendAsync>g__Core|83_0(HttpRequestMessage request, HttpCompletionOption completionOption, CancellationTokenSource cts, Boolean disposeCts, CancellationTokenSource pendingRequestsCts, CancellationToken originalCancellationToken)
[RUNNER 2024-07-25 15:30:22Z ERR  MessageListener]    at GitHub.Services.WebApi.VssHttpClientBase.SendAsync(HttpRequestMessage message, HttpCompletionOption completionOption, Object userState, CancellationToken cancellationToken)
[RUNNER 2024-07-25 15:30:22Z ERR  MessageListener]    at GitHub.Services.WebApi.VssHttpClientBase.SendAsync[T](HttpRequestMessage message, Object userState, CancellationToken cancellationToken)
[RUNNER 2024-07-25 15:30:22Z ERR  MessageListener]    at GitHub.Services.WebApi.VssHttpClientBase.SendAsync[T](HttpMethod method, IEnumerable`1 additionalHeaders, Guid locationId, Object routeValues, ApiResourceVersion version, HttpContent content, IEnumerable`1 queryParameters, Object userState, CancellationToken cancellationToken)
[RUNNER 2024-07-25 15:30:22Z ERR  MessageListener]    at GitHub.Runner.Listener.MessageListener.CreateSessionAsync(CancellationToken token)
[RUNNER 2024-07-25 15:30:22Z ERR  Terminal] WRITE ERROR: Failed to create a session. The runner registration has been deleted from the server, please re-configure. Runner registrations are automatically deleted for runners that have not connected to the service recently.
Failed to create a session. The runner registration has been deleted from the server, please re-configure. Runner registrations are automatically deleted for runners that have not connected to the service recently.
[RUNNER 2024-07-25 15:30:22Z INFO Listener] Runner execution has finished with return code 1
Runner listener exit with terminated error, stop the service, no retry needed.
Exiting runner...

marko-k0 avatar Jul 25 '24 15:07 marko-k0

In my case, after adding a self-hosted runner from the GitHub Enterprise Settings, I encountered the same issue as described. However, when I created the self-hosted runner from the GitHub Organization Settings, it worked correctly.

sby-yu-katou avatar Aug 01 '24 05:08 sby-yu-katou

@fwosar can you share the full runner diag log with me? [email protected] 🙇 Sorry for the trouble, I can help you take a look when i got the diag log. 😄

Hi @TingluoHuang we're also having the issue, can we send the log your way too?

knocte avatar Oct 30 '24 09:10 knocte

I have also faced this issue. It occurs sometimes (doesn't happen everytime) when I create a new arc set.

The way I could do a workaround for this (I don't know if this is a valid solution) is to:

  1. create arc set with min replicas = 1
  2. Monitor this replicas log to see the above log.
  3. Manually start a job to see if it gets picked up.
  4. If it doesn't get picked up in 5 mins, restart the replica.
  5. Cancel the job and queue another job.
  6. That job and subsequent jobs get picked up.

I don't know if this is a initial setup problem. But doing this doesn't cause issues for queueing up the jobs. I have tested by keeping this setup running over night and submitting jobs every 3-4 hours.

saurabh-hirani avatar Nov 14 '24 10:11 saurabh-hirani