terraform-aws-github-runner
Runners are not getting created.
Discussed in https://github.com/philips-labs/terraform-aws-github-runner/discussions/3460
Originally posted by pankajpandav September 1, 2023
Hello,
I am trying to use multi-runner mode.
Whenever a GitHub Actions workflow runs, it is able to spin up a new EC2 instance, but even after a long time the runner doesn't get created.
The runner in this case is Ubuntu.
When I checked user-data.log, it keeps waiting for the GH token from Parameter Store.
Here is the log snippet.
An error occurred (ParameterNotFound) when calling the GetParameter operation: Waiting for GH Runner config to become available in AWS SSM
This is coming from start-runner.sh in the runners/templates folder. I am not sure what is wrong with my setup. I don't see the Parameter Store entry it is trying to fetch, which is the registration token. I have tried changing the runner mode from persistent to ephemeral and setting enable_jit_config to true. I see that reflected in Parameter Store, but the runners don't use it. What could be wrong here?
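The log line above suggests that the instance polls Parameter Store until the scale-up lambda has written the runner config. A minimal Python sketch of that kind of wait loop follows; it is not the code from start-runner.sh, and `wait_for_config`, `fetch`, `attempts` and `delay` are hypothetical names used only for illustration:

```python
import time


def wait_for_config(fetch, attempts=5, delay=5.0, sleep=time.sleep):
    """Call fetch() until it returns a value or the attempts run out.

    fetch is any callable that returns the config, or None while the
    parameter does not exist yet (the ParameterNotFound case).
    """
    for _ in range(attempts):
        value = fetch()
        if value is not None:
            return value
        sleep(delay)  # wait before asking again
    return None  # the parameter never appeared
```

If nothing ever writes the parameter, a loop like this can only time out or keep waiting, so the cause is more likely on the side that should write the parameter than on the instance.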
Actually, I have the same problem. Does anyone know how to solve it?
same here
I tried disabling ephemeral_runners, and it works.
Doesn't work for me.
Having the same issue :( https://github.com/philips-labs/terraform-aws-github-runner/issues/3527
I found my scale up lambda was outputting the same thing as https://github.com/philips-labs/terraform-aws-github-runner/issues/3332 "Resource not accessible by integration".
Adding enable_organization_runners: true under runner_config: did the trick. Seems like it might not be possible to have repo level runners with this?
We are using org-level, ephemeral runners with jit_config enabled. While upgrading from v3.3.0 to v5.0.0 of this module, we needed to destroy everything and re-apply the Terraform.
We suspect it is related to SSM; it has its own opinions about outputting this error message, "An error occurred (ParameterNotFound) when calling the GetParameter", even if you can see the parameters in the SSM Parameter Store. If you can't/won't destroy everything, manually removing the SSM parameters and reapplying should do the trick as well.
Hope it helps, happy CI runs :)
In my case, /github-action-runners/action-runners/builder/runners/config/runner-group/Default was not created. In the scale-up lambda logs I found this:
{
"level": "WARN",
"message": "SSM Parameter \"/github-action-runners/action-runners/builder/runners/config/runner-group/Default\"\n for Runner group Default does not exist",
"service": "runners-scale-up",
"timestamp": "2023-11-27T15:51:52.432Z",
"xray_trace_id": "1-6564bb13-90b3a326e96463b0bb8e885a",
"module": "scale-up",
"region": "eu-west-1",
"environment": "action-runners-builder",
"aws-request-id": "ff47aa00-ac7a-55c4-8bbf-f7de113d0d45",
"function-name": "action-runners-builder-scale-up",
"runner": {
"type": "Org",
"owner": "nahsilabs"
},
"github": {
"event": "workflow_job",
"workflow_job_id": "19061575727"
}
}
{
"level": "WARN",
"message": "Ignoring error: Request failed with status code 404",
"service": "runners-scale-up",
"timestamp": "2023-11-27T15:51:52.652Z",
"xray_trace_id": "1-6564bb13-90b3a326e96463b0bb8e885a",
"region": "eu-west-1",
"environment": "action-runners-builder",
"aws-request-id": "ff47aa00-ac7a-55c4-8bbf-f7de113d0d45",
"function-name": "action-runners-builder-scale-up",
"module": "lambda.ts"
}
I created it myself with value 1 and it started to work.
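Records like the two WARN entries above are easier to spot when the scale-up lambda's output is filtered by level. A minimal Python sketch, assuming each log record is available as one JSON object per string (for example, exported from CloudWatch); `warn_messages` is a hypothetical helper, not part of the module:

```python
import json


def warn_messages(lines):
    """Return (module, message) pairs for every WARN or ERROR record."""
    found = []
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines such as "END RequestId: ..."
        if not isinstance(record, dict):
            continue  # skip JSON values that are not log records
        if record.get("level") in ("WARN", "ERROR"):
            found.append((record.get("module"), record.get("message")))
    return found
```

Note that the 404 is logged as "Ignoring error" at WARN level, so filtering on ERROR alone would not surface it.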
I had the same issue as @nahsi did with the missing SSM parameter. Note that the second component in that path is your environment name, so if you create it manually it may not be action-runners.
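To avoid typos when creating the parameter by hand, the path can be built from its parts. A minimal Python sketch, assuming the layout `<prefix>/<name>/runners/config/runner-group/<group>` inferred from the path quoted above; the prefix and name depend on your own deployment, and `runner_group_parameter` is a hypothetical helper:

```python
def runner_group_parameter(name, group="Default", prefix="/github-action-runners"):
    """Build the SSM parameter path for a runner group.

    name is the deployment-specific part of the path; it may itself
    contain a slash, as in "action-runners/builder" above.
    The layout is an assumption based on the path seen in this thread.
    """
    return f"{prefix}/{name}/runners/config/runner-group/{group}"
```

For the deployment above, `runner_group_parameter("action-runners/builder")` yields the path that was reported missing.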
I think I got closer to the root cause. The SSM parameter /github-action-runners/{name}/runners/config/runner-group/Default is indeed missing. If that's the case, it should be created by the scale-up lambda by querying a GitHub API endpoint for the runner group ID:
https://github.com/philips-labs/terraform-aws-github-runner/blob/6fa667fae7e4302cf643bcdb4ff3c91b1e4ed8d1/lambdas/functions/control-plane/src/scale-runners/scale-up.ts#L156-L175
That doesn't work, as this X-Ray trace shows:
https://github.com/philips-labs/terraform-aws-github-runner/blob/6fa667fae7e4302cf643bcdb4ff3c91b1e4ed8d1/lambdas/functions/control-plane/src/scale-runners/scale-up.ts#L187-L199
There are two problems with GetRunnerGroupByName():
- The GitHub API endpoint changed: it's now called /orgs/{org}/actions/runners (docs) (instead of /orgs/{org}/actions/runner-groups, which is still used by the GHES API, it seems)
- The currently used API endpoint /orgs/{org}/actions/runner-groups required org:admin permissions, which have not been required for a long time now (https://github.com/philips-labs/terraform-aws-github-runner/commit/7572405f65ed1e7016f708eb7e6f323ec5270b5a).
I assume changing the API endpoint in scale-up.ts would solve the problem at least for users of github.com, but maybe not for GHES users? And how did that ever work?
This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed if no further activity occurs. Thank you for your contributions.
@npalm since you seem to have last modified the code in question: can you help me out with the question above: "I assume changing the API endpoint in scale-up.ts would solve the problem at least for users of github.com, but maybe not for GHES users? And how did that ever work?" I'm happy to do a PR, but I would need some additional guidance first.
Hi everyone!
I'm not sure I'm facing the same problem, but my runners are also not getting created. Here are some logs from the "scale-up" lambda:
{
"level": "INFO",
"message": "Received event",
"service": "runners-scale-up",
"timestamp": "2024-02-22T12:23:38.140Z",
"xray_trace_id": "1-65d73cc8-39e86de7046858a2b028c56d",
"module": "scale-up",
"region": "us-east-1",
"environment": "orx-web",
"aws-request-id": "a003240c-9ea6-58aa-a57f-7a5da1a5bb40",
"function-name": "orx-web-scale-up",
"runner": {
"type": "Repo",
"owner": "OR-Trax/orx-dbp-web",
"namePrefix": ""
},
"github": {
"event": "workflow_job",
"workflow_job_id": "21861689216"
}
}
{
"level": "DEBUG",
"message": "GHES API URL: ",
"service": "runners-scale-up",
"timestamp": "2024-02-22T12:23:38.505Z",
"xray_trace_id": "1-65d73cc8-39e86de7046858a2b028c56d",
"module": "gh-auth",
"region": "us-east-1",
"environment": "orx-web",
"aws-request-id": "a003240c-9ea6-58aa-a57f-7a5da1a5bb40",
"function-name": "orx-web-scale-up",
"runner": {
"type": "Repo",
"owner": "OR-Trax/orx-dbp-web",
"namePrefix": ""
},
"github": {
"event": "workflow_job",
"workflow_job_id": "21861689216"
}
}
{
"level": "WARN",
"message": "Ignoring error: Request failed with status code 404",
"service": "runners-scale-up",
"timestamp": "2024-02-22T12:23:38.702Z",
"xray_trace_id": "1-65d73cc8-39e86de7046858a2b028c56d",
"region": "us-east-1",
"environment": "orx-web",
"aws-request-id": "a003240c-9ea6-58aa-a57f-7a5da1a5bb40",
"function-name": "orx-web-scale-up",
"module": "lambda.ts"
}
END RequestId: a003240c-9ea6-58aa-a57f-7a5da1a5bb40
I don't see any messages about a missing SSM parameter. I tried changing the log level to the most verbose and still nothing.
@npalm since you seem to have last modified the code in question: can you help me out with the question above: "I assume changing the API endpoint in scale-up.ts would solve the problem at least for users of github.com, but maybe not for GHES users? And how did that ever work?" I'm happy to do a PR, but I would need some additional guidance first.
Still having problems? I missed this issue.
Facing the same issue with org-level ephemeral runners in GitHub Enterprise.
In my case, I created a runner group on GitHub whose name matched the "runner_group_name" input, and the runners were created in that group, which solved the problem.
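Whether a group with the configured name exists can be checked against the organization's runner group list. A minimal Python sketch, assuming the response of `GET /orgs/{org}/actions/runner-groups` has already been fetched and parsed, and that it has the shape `{"total_count": ..., "runner_groups": [{"id": ..., "name": ...}]}`; `find_runner_group_id` is a hypothetical helper, not the module's own lookup:

```python
def find_runner_group_id(response, group_name):
    """Return the id of the runner group called group_name, or None.

    response is the parsed JSON body of the runner group listing;
    its shape is an assumption stated in the text above.
    """
    for group in response.get("runner_groups", []):
        if group.get("name") == group_name:
            return group.get("id")
    return None  # no group with that name exists in the organization
```

A `None` result for the value of "runner_group_name" would match the situation described here: the group has to be created on GitHub before runners can be placed in it.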
@here I'm still getting this error. Any suggestions?