ec2-github-runner
30% of runner instantiations fail due to timeout
Run machulav/ec2-github-runner@v2
GitHub Registration Token is received
AWS EC2 instance i-0eeae9ef28dcd04e9 is started
AWS EC2 instance i-0eeae9ef28dcd04e9 is up and running
Waiting 30s for the AWS EC2 instance to be registered in GitHub as a new self-hosted runner
Checking every 10s if the GitHub self-hosted runner is registered
Checking...
.
.
.
Checking...
Error: GitHub self-hosted runner registration error
Checking...
Error: A timeout of 5 minutes is exceeded. Your AWS EC2 instance was not able to register itself in GitHub as a new self-hosted runner.
This is the error I receive for roughly 30% of my runners. What could cause this, and how can I increase the percentage of successful instantiations?
Would also like to know if there's something one can do to limit these situations...
I ran into this issue when previous runners didn't clean themselves up in the GitHub API. When I looked at the cloud-init log of the configure command, it was asking if I wanted to replace the previous runner:
https://github.com/actions/runner/blob/main/src/Runner.Listener/CommandSettings.cs#L193
Edit: I followed up here and it seems one can pass in --replace to the config.sh script. I could fork and cut a PR for this, but was wondering if it should be flaggable since it shouldn't normally happen on a clean stop of the instance (which sometimes isn't guaranteed).
```
Jun 24 20:41:24 ip-10-x-0-61 cloud-init[2458]: This runner will have the following labels: 'self-hosted', 'Linux', 'X64'
Jun 24 20:41:24 ip-10-x-0-61 cloud-init[2458]: Enter any additional labels (ex. label-1,label-2): [press Enter to skip]
```
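For reference, a minimal sketch of what the registration step in the instance user data could look like with `--replace` (plus `--unattended`, so config.sh never blocks on a prompt). The paths and variables below are placeholders, not the exact script this action generates:

```bash
#!/bin/bash
# Hypothetical user-data registration step. GITHUB_OWNER, GITHUB_REPO,
# RUNNER_TOKEN and RUNNER_LABEL are placeholders for values the action
# would normally substitute.
cd /home/ubuntu/actions-runner

# --unattended suppresses interactive prompts; --replace takes over an
# existing registration that has the same runner name.
./config.sh \
  --url "https://github.com/${GITHUB_OWNER}/${GITHUB_REPO}" \
  --token "${RUNNER_TOKEN}" \
  --labels "${RUNNER_LABEL}" \
  --unattended \
  --replace

./run.sh
```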
Hello!
Any update on this? We are also facing a lot of unstarted runners...
@farvour do you plan to open a PR, or is there a way to apply your solution on our side?
Thank you!
We're also experiencing this, requiring periodic manual re-runs of our CI jobs.
> A timeout of 5 minutes is exceeded

This error usually means that your new EC2 runner cannot communicate with GitHub and register itself as a new runner. Based on the tests, 5 minutes is more than enough for the EC2 runner to register itself, so I recommend double-checking that outbound traffic on port 443 is always open for your EC2 runner.
If that does not help, please provide more information about your action configuration and AWS infrastructure setup so I can help you with the issue.
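As a quick sanity check, something along these lines can confirm outbound HTTPS connectivity and the egress rules of the runner's security group (the group ID below is a placeholder):

```bash
# From the runner instance: confirm the GitHub API is reachable over 443.
curl -sS -o /dev/null -w 'github api: HTTP %{http_code}\n' https://api.github.com

# From anywhere with AWS credentials: inspect the egress rules of the
# security group attached to the runner (sg-0123456789abcdef0 is a placeholder).
aws ec2 describe-security-groups \
  --group-ids sg-0123456789abcdef0 \
  --query 'SecurityGroups[0].IpPermissionsEgress'
```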
> A timeout of 5 minutes is exceeded
>
> ... I recommend double-checking that outbound traffic on port 443 is always open for your EC2 runner.
I'll have to do some probing to see what's going on with 443 traffic; however, some observations which may be relevant:

- For successful registrations (under 5 minutes), the distribution of time it takes is pretty wide. Not sure why, but I get a decent number of registrations that occur around the 4-5 minute mark, while others occur almost immediately. This seems independent of how long it takes for the instance to reach an "OK" status as reported via `aws ec2 describe-instance-status` (example invocation below).
- None of my other EC2 instances in this subnet have any obvious networking issues, and all my security groups are fully open in the outbound direction.

I'm using a t3.mini instance with an AMI based on bare Ubuntu 20.04, prepared with the apt equivalent of the README instructions, and nothing else done to it.
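For context, the status check mentioned above looks roughly like this (the instance ID is a placeholder):

```bash
# Report the instance and system status checks for one runner instance.
aws ec2 describe-instance-status \
  --instance-ids i-0123456789abcdef0 \
  --query 'InstanceStatuses[0].[InstanceStatus.Status, SystemStatus.Status]' \
  --output text
```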
Hi, I think this might be the same issue where the hostname is used as runner name, and hostnames are reused between instances: https://github.com/machulav/ec2-github-runner/issues/128
Maybe using the --replace option is a good idea though!
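If hostname reuse from the linked issue is indeed the cause, one possible workaround (a sketch, assuming you control your own user-data registration script; the variables are placeholders) is to register with a name derived from the instance ID instead of the hostname, in addition to `--replace`:

```bash
# Hypothetical tweak to the registration step: use the instance ID from the
# EC2 instance metadata service as the runner name, so reused hostnames cannot
# collide with stale registrations.
# (On IMDSv2-only instances you would first need to fetch a metadata token.)
INSTANCE_ID="$(curl -s http://169.254.169.254/latest/meta-data/instance-id)"

./config.sh \
  --url "https://github.com/${GITHUB_OWNER}/${GITHUB_REPO}" \
  --token "${RUNNER_TOKEN}" \
  --name "runner-${INSTANCE_ID}" \
  --unattended \
  --replace
```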