
30% of runner instantiations fail due to timeout

Open jappyjan opened this issue 3 years ago • 7 comments

```
Run machulav/ec2-github-runner@v2
GitHub Registration Token is received
AWS EC2 instance i-0eeae9ef28dcd04e9 is started
AWS EC2 instance i-0eeae9ef28dcd04e9 is up and running
Waiting 30s for the AWS EC2 instance to be registered in GitHub as a new self-hosted runner
Checking every 10s if the GitHub self-hosted runner is registered
Checking...
.
.
.
Checking...
Error: GitHub self-hosted runner registration error
Checking...
Error: A timeout of 5 minutes is exceeded. Your AWS EC2 instance was not able to register itself in GitHub as a new self-hosted runner.
```

This is the error I receive for roughly 30% of my runners. What could cause this, and how can I increase the percentage of successful instantiations?

jappyjan avatar Dec 17 '21 21:12 jappyjan

Would also like to know if there's something one can do to limit these situations...

Preen avatar May 19 '22 10:05 Preen

I ran into this issue when previous runners didn't clean themselves up in the GitHub API. Looking at the cloud-init log of the configure command, it was blocked on a prompt asking whether I wanted to replace the previous runner:

https://github.com/actions/runner/blob/main/src/Runner.Listener/CommandSettings.cs#L193

Edit: I followed up here, and it seems one can pass `--replace` to the config.sh script. I could fork and cut a PR for this, but I was wondering if it should be made flaggable, since this shouldn't normally happen on a clean stop of the instance (which sometimes isn't guaranteed).
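For reference, a minimal sketch of what such a user-data script could look like. The `--replace` and `--unattended` flags are real config.sh options; the paths, owner/repo URL, and `RUNNER_TOKEN` variable are illustrative placeholders, not taken from this action's actual template:

```shell
#!/bin/bash
# Sketch of a runner user-data script (paths and token are placeholders).
cd /home/ubuntu/actions-runner

# --replace tells config.sh to take over an existing runner with the same
# name instead of blocking on an interactive prompt; --unattended
# suppresses all other prompts so cloud-init never hangs waiting for input.
./config.sh \
  --url "https://github.com/OWNER/REPO" \
  --token "${RUNNER_TOKEN}" \
  --replace \
  --unattended

./run.sh
```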

```
Jun 24 20:41:24 ip-10-x-0-61 cloud-init[2458]: This runner will have the following labels: 'self-hosted', 'Linux', 'X64'
Jun 24 20:41:24 ip-10-x-0-61 cloud-init[2458]: Enter any additional labels (ex. label-1,label-2): [press Enter to skip]
```

farvour avatar Jun 24 '22 20:06 farvour

Hello!

Any update on this? We are also seeing a lot of runners that fail to start...

@farvour do you plan to do a PR or is there a way to apply your solution on our side?

Thank you!

pwo3 avatar Sep 26 '22 15:09 pwo3

We're also experiencing this, requiring periodic manual re-runs of our CI jobs.

davegravy avatar Oct 18 '22 15:10 davegravy

> A timeout of 5 minutes is exceeded

This error usually means that your new EC2 runner cannot communicate with GitHub to register itself as a new runner. Based on the tests, 5 minutes is more than enough for the EC2 runner to register itself. So I recommend double-checking that outbound traffic on port 443 is always open for your EC2 runner.

If that does not help, please provide more information about your action configuration and your AWS infrastructure setup so I can help you with the issue.
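The outbound-443 check above can be run from the instance itself. A minimal diagnostic sketch, assuming `curl` is installed; the endpoint list is illustrative (the runner also talks to other GitHub Actions service hosts):

```shell
#!/bin/bash
# Diagnostic sketch: can this instance reach GitHub over port 443?
# Endpoints are illustrative, not an exhaustive list of runner hosts.
for host in github.com api.github.com; do
  if curl -sS --max-time 10 -o /dev/null "https://${host}"; then
    echo "${host}: reachable on 443"
  else
    echo "${host}: NOT reachable on 443"
  fi
done
```

If any endpoint is unreachable, inspect the security group egress rules and any network ACLs or proxies on the subnet's route to the internet.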

machulav avatar Nov 09 '22 13:11 machulav

> A timeout of 5 minutes is exceeded
>
> This error usually means that your new EC2 runner cannot communicate with GitHub to register itself as a new runner. Based on the tests, 5 minutes is more than enough for the EC2 runner to register itself. So I recommend double-checking that outbound traffic on port 443 is always open for your EC2 runner.
>
> If that does not help, please provide more information about your action configuration and your AWS infrastructure setup so I can help you with the issue.

I'll have to do some probing to see what's going on with 443 traffic however some observations which may be relevant are:

  • For successful registrations (under 5 minutes), the distribution of registration times is quite wide. I'm not sure why, but a decent number of registrations land around the 4-5 minute mark, while others occur almost immediately. This seems independent of how long the instance takes to reach an "OK" status as reported by `aws ec2 describe-instance-status`.

  • None of my other EC2 instances in this subnet have any obvious networking issues. All my security policies are fully open in the outbound direction.

I'm using a t3.mini instance with an AMI based on bare Ubuntu 20.04, prepared with the apt equivalent of the README instructions and nothing else.
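One way to quantify the gap between "instance OK" and "runner registered" is to time the status-check wait directly. A sketch, assuming the AWS CLI is configured; the instance id is a placeholder:

```shell
#!/bin/bash
# Sketch: measure how long an instance takes to pass its status checks,
# to compare against when the runner actually appears in GitHub.
# Instance id is a placeholder.
INSTANCE_ID="i-0123456789abcdef0"

# Blocks until both system and instance status checks pass.
time aws ec2 wait instance-status-ok --instance-ids "${INSTANCE_ID}"

# Confirm the final status.
aws ec2 describe-instance-status \
  --instance-ids "${INSTANCE_ID}" \
  --query 'InstanceStatuses[0].InstanceStatus.Status' \
  --output text
```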

davegravy avatar Nov 09 '22 13:11 davegravy

Hi, I think this might be the same issue where the hostname is used as the runner name and hostnames are reused between instances: https://github.com/machulav/ec2-github-runner/issues/128. Maybe using the --replace option is a good idea, though!
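If hostname collisions are the cause, another option is to give each runner a unique name via config.sh's `--name` flag. A sketch; the hostname and instance id below are placeholders (on a real instance they would come from `hostname` and the EC2 metadata endpoint `http://169.254.169.254/latest/meta-data/instance-id`), and the config.sh invocation is illustrative:

```shell
#!/bin/bash
# Build a unique runner name from hostname + instance id so a recycled
# hostname never collides with a stale registration.
# Placeholders stand in for `hostname` and the EC2 metadata service.
HOST="ip-10-0-0-61"
INSTANCE_ID="i-0123456789abcdef0"
RUNNER_NAME="${HOST}-${INSTANCE_ID}"
echo "${RUNNER_NAME}"

# The name is then passed to config.sh (--name and --unattended are real
# config.sh flags; URL and token are placeholders):
#   ./config.sh --url "https://github.com/OWNER/REPO" \
#               --token "${RUNNER_TOKEN}" \
#               --name "${RUNNER_NAME}" \
#               --unattended
```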

jeverling avatar Dec 08 '22 14:12 jeverling