Incorrect self-hosted runners disappeared message

Open samdeane opened this issue 2 years ago • 6 comments

Describe the bug

I have a runner running in a virtualised Mac, which is generally working very well.

Occasionally the VM instance is going to sleep (or the host that it's running on is going to sleep), and the run.sh script is losing its connection or timing out and attempting to retry.

It doesn't manage to retry when the session is restored. If I ctrl-c the script and launch it again, I get the error about the self-hosted runner having disappeared.

This error is spurious in this case. The runner is still registered, and if I restart the VM and re-run the script it works again.

I think the real reason is some sort of token / auth timeout, which is incorrectly being reported.

To Reproduce

Install the runner in a VM (e.g. VirtualBuddy using macOS 13.6). Run the runner and leave it for a while. When it times out, try to restart the run.sh script.

Expected behavior

Ideally the run.sh script should successfully reconnect without needing to be restarted. In the absence of that, quitting and re-running the script should successfully connect. In the absence of that (!), a more informative message should be output, explaining the real problem.

Runner Version and Platform

runner: 2.311.0
virtualized host: macos 13.6
host: macos 13.4 (also tested on 13.6)

What's not working?

Runner is failing to reconnect, and is reporting that it's been deleted when it hasn't.

Job Log Output

runner@Runners-Virtual-Machine actions-runner % ./run.sh

√ Connected to GitHub

Current runner version: '2.311.0'
2023-11-02 07:18:17Z: Listening for Jobs
2023-11-02 08:17:44Z: Running job: Deploy Client
2023-11-02 08:18:38Z: Job Deploy Client completed with result: Failed
2023-11-02 09:59:39Z: Running job: Deploy Client
2023-11-02 10:04:05Z: Job Deploy Client completed with result: Failed
2023-11-02 10:06:03Z: Running job: Deploy Client
2023-11-02 10:07:54Z: Runner connect error: The HTTP request timed out after 00:01:00.. Retrying until reconnected.
2023-11-02 10:08:35Z: Job Deploy Client completed with result: Abandoned
^CExiting...
An error occurred: The token expired on 11/02/2023 10:42:58. Current server time is 11/02/2023 15:00:30.
Runner listener exit with retryable error, re-launch runner in 5 seconds.
Restarting runner...

√ Connected to GitHub

Failed to create a session. The runner registration has been deleted from the server, please re-configure. Runner registrations are automatically deleted for runners that have not connected to the service recently.
Runner listener exit with terminated error, stop the service, no retry needed.
Exiting runner...
runner@Runners-Virtual-Machine actions-runner % ./run.sh

√ Connected to GitHub

Failed to create a session. The runner registration has been deleted from the server, please re-configure. Runner registrations are automatically deleted for runners that have not connected to the service recently.
Runner listener exit with terminated error, stop the service, no retry needed.
Exiting runner...
runner@Runners-Virtual-Machine actions-runner % 
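
To check whether a "registration has been deleted" message was preceded by a token expiry, as in the output above, a small script along these lines may help. This is only a sketch: the `diagnose` function and its return values are hypothetical, and the timestamp format (MM/DD/YYYY HH:MM:SS) is assumed from the single sample line in this log, not taken from the runner's source.

```python
import re
from datetime import datetime

# Pattern copied from the message in the output above. The timestamp layout
# is an assumption based on that one sample.
TOKEN_EXPIRED = re.compile(
    r"The token expired on (\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})\. "
    r"Current server time is (\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})\."
)
REGISTRATION_DELETED = "The runner registration has been deleted from the server"
STAMP = "%m/%d/%Y %H:%M:%S"


def diagnose(log_text):
    """Hypothetical helper: guess which failure the console output points at."""
    match = TOKEN_EXPIRED.search(log_text)
    deleted_at = log_text.find(REGISTRATION_DELETED)
    if match and deleted_at > match.start():
        # The "deleted" message appears after a token expiry, so the expiry
        # is the more likely culprit.
        expired = datetime.strptime(match.group(1), STAMP)
        server_now = datetime.strptime(match.group(2), STAMP)
        return {"suspect": "token-expiry", "stale_for": server_now - expired}
    if deleted_at != -1:
        return {"suspect": "registration-deleted", "stale_for": None}
    return {"suspect": "none", "stale_for": None}
```

Feeding it the console output captured above would report how long the token had been expired before the restart was attempted.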

Runner and Worker's Diagnostic Logs

Runner_20231102-071814-utc.log Worker_20231102-081745-utc.log

Nov 02 '23 07:11 samdeane

Running into the same issue with ARC-managed runners, version v2.315.

Apr 18 '24 23:04 cheskayang

I'm deploying the runners to Kubernetes (EKS) through Helm, using the gha-runner-scale-set-controller chart, version 0.9.2. We have a workflow that uses a matrix to create around 60 runners to build different images.

The issue occurs randomly for some of the runners, even though they register successfully with GHA:

√ Connected to GitHub

The runner then fails seconds later, while executing the job, with the following errors:

[RUNNER 2024-06-26 14:58:17Z ERR  GitHubActionsService] POST request to https://pipelinesghubeus3.actions.githubusercontent.com/EMdYJ0e8OZvrY3kdHoarqQa5vRx5ItkhbERYlQ35U6MxWjS0e4/_apis/oauth2/token failed. HTTP Status: BadRequest
[RUNNER 2024-06-26 14:58:17Z ERR  Terminal] WRITE ERROR: Failed to create a session. The runner registration has been deleted from the server, please re-configure. Runner registrations are automatically deleted for runners that have not connected to the service recently.
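
For anyone collecting similar evidence, the ERR entries can be pulled out of a runner log with a short script. This is a rough sketch: the `error_entries` helper is hypothetical, and the line layout it expects (`[SOURCE timestamp LEVEL  Component] message`) is inferred only from the two lines quoted above, so other log lines may not match.

```python
import re

# Layout inferred from the two sample lines above (an assumption):
# [SOURCE YYYY-MM-DD HH:MM:SSZ LEVEL  Component] message
LINE = re.compile(
    r"^\[(?P<source>\w+) (?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}Z) "
    r"(?P<level>\w+)\s+(?P<component>\w+)\] (?P<message>.*)$"
)


def error_entries(lines):
    """Hypothetical helper: yield (time, component, message) for ERR lines."""
    for line in lines:
        m = LINE.match(line.strip())
        if m and m.group("level") == "ERR":
            yield m.group("time"), m.group("component"), m.group("message")
```

Passing it an open log file (for example `error_entries(open(path))`, where `path` is whichever log you saved) yields one tuple per error line, which makes it easier to see what failed just before the "registration has been deleted" message.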

GHA is somehow deleting the runner registration, making the job fail with Error: The operation was canceled.

I have already read a number of issues opened by people experiencing similar problems, but GH is not putting much effort into resolving them. I'm starting to think that they just want people to use the GH-managed runners, but depending on the infrastructure that is not possible for security reasons, and we are already paying for the service.

Jun 26 '24 17:06 luismiguelsaez-steercrm

Facing something similar. Any news on this?

Oct 01 '24 07:10 celiogafesi

@celiogafesi Earlier issues should be resolved now. Can you share the runner logs of the runner that got deleted?

Oct 01 '24 13:10 lokesh755