terraform-aws-github-runner

Runners are not getting created.

Open pankajpandav opened this issue 1 year ago • 12 comments

Discussed in https://github.com/philips-labs/terraform-aws-github-runner/discussions/3460

Originally posted by pankajpandav September 1, 2023

Hello, I am trying to use multi-runner mode. Whenever a GitHub Actions workflow runs, it is able to spin up a new EC2 instance, but even after a long time the runner doesn't get created. The runner in this case is Ubuntu. When I checked user-data.log, it keeps waiting for the GH token from Parameter Store. Here is the log snippet:

An error occurred (ParameterNotFound) when calling the GetParameter operation: Waiting for GH Runner config to become available in AWS SSM

This is coming from start-runner.sh in the runners/templates folder. I am not sure what is wrong with my setup. I don't see the Parameter Store entry it's trying to fetch, which is the token for registration. I have tried changing the runner mode from persistent to ephemeral and setting enable_jit_config to true. I see that reflected in Parameter Store, but I don't see runners using it. What could be wrong here?
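
For readers unfamiliar with the symptom, here is a minimal Python sketch of the wait behaviour the log above suggests: the instance keeps polling Parameter Store until something else writes the runner config. This is not the code from start-runner.sh. `fetch` is a hypothetical stand-in for the GetParameter call, and the timeout is an addition for illustration only.

```python
import time
from typing import Callable, Optional


def wait_for_parameter(
    fetch: Callable[[], Optional[str]],
    timeout_s: float = 300.0,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll `fetch` until it returns a value, or raise once the timeout elapses.

    `fetch` is a hypothetical callable that returns the parameter value,
    or None while the parameter does not exist yet (ParameterNotFound).
    """
    waited = 0.0
    while True:
        value = fetch()
        if value is not None:
            return value
        if waited >= timeout_s:
            # Assumption: the timeout is added here for illustration only.
            raise TimeoutError("runner config never appeared in Parameter Store")
        print("Waiting for GH Runner config to become available in AWS SSM")
        sleep(interval_s)
        waited += interval_s
```

If nothing ever writes the parameter, a loop like this can only wait, which would explain an instance that is running but never registers as a runner. That points at whatever is supposed to write the parameter, not at the instance itself.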

pankajpandav avatar Sep 01 '23 01:09 pankajpandav

Actually, I have the same problem. Does anyone know how to solve it?

carloscao0928 avatar Sep 12 '23 08:09 carloscao0928

same here

userkkw avatar Sep 12 '23 08:09 userkkw

I tried disabling ephemeral_runners, and it works.

carloscao0928 avatar Sep 13 '23 05:09 carloscao0928

Doesn't work for me.

pankajpandav avatar Sep 13 '23 11:09 pankajpandav

Having the same issue :( https://github.com/philips-labs/terraform-aws-github-runner/issues/3527

npwolf avatar Oct 06 '23 18:10 npwolf

I found my scale up lambda was outputting the same thing as https://github.com/philips-labs/terraform-aws-github-runner/issues/3332 "Resource not accessible by integration".

Adding enable_organization_runners: true under runner_config: did the trick. Seems like it might not be possible to have repo level runners with this?

npwolf avatar Oct 06 '23 23:10 npwolf

We are using org level, ephemeral and jit_config enabled runners. While upgrading from v3.3.0 to v5.0.0 of this module, we needed to destroy everything and re-apply the terraform.

We suspect it is related to SSM; it seems to have its own opinions about when to output this error message, An error occurred (ParameterNotFound) when calling the GetParameter, even if you can see the parameters in the SSM Parameter Store. If you can't or won't destroy everything, manually removing the SSM parameters and reapplying should do the trick as well.

Hope it helps, happy CI runs :)

esaday avatar Oct 31 '23 11:10 esaday

In my case

/github-action-runners/action-runners/builder/runners/config/runner-group/Default

was not created. In lambda scale-up logs I found this

{
    "level": "WARN",
    "message": "SSM Parameter \"/github-action-runners/action-runners/builder/runners/config/runner-group/Default\"\n         for Runner group Default does not exist",
    "service": "runners-scale-up",
    "timestamp": "2023-11-27T15:51:52.432Z",
    "xray_trace_id": "1-6564bb13-90b3a326e96463b0bb8e885a",
    "module": "scale-up",
    "region": "eu-west-1",
    "environment": "action-runners-builder",
    "aws-request-id": "ff47aa00-ac7a-55c4-8bbf-f7de113d0d45",
    "function-name": "action-runners-builder-scale-up",
    "runner": {
        "type": "Org",
        "owner": "nahsilabs"
    },
    "github": {
        "event": "workflow_job",
        "workflow_job_id": "19061575727"
    }
}
{
    "level": "WARN",
    "message": "Ignoring error: Request failed with status code 404",
    "service": "runners-scale-up",
    "timestamp": "2023-11-27T15:51:52.652Z",
    "xray_trace_id": "1-6564bb13-90b3a326e96463b0bb8e885a",
    "region": "eu-west-1",
    "environment": "action-runners-builder",
    "aws-request-id": "ff47aa00-ac7a-55c4-8bbf-f7de113d0d45",
    "function-name": "action-runners-builder-scale-up",
    "module": "lambda.ts"
}

I created it myself with value 1 and it started to work.
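
For anyone triaging similar scale-up output, a small Python sketch that pulls only the WARN and ERROR entries out of a pasted log dump. It assumes the log is a run of JSON objects with `level`, `timestamp`, `module`, and `message` fields as shown above, possibly mixed with plain lines such as `END RequestId: ...`. The function names are hypothetical.

```python
import json
from typing import Iterator, List, Tuple


def iter_json_objects(text: str) -> Iterator[dict]:
    """Yield each top-level JSON object found in concatenated log output."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it manually.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        try:
            obj, pos = decoder.raw_decode(text, pos)
        except json.JSONDecodeError:
            # Not JSON (e.g. "END RequestId: ..."): skip to the next line.
            newline = text.find("\n", pos)
            if newline == -1:
                break
            pos = newline + 1
            continue
        if isinstance(obj, dict):
            yield obj


def warnings_and_errors(text: str) -> List[Tuple[str, str, str]]:
    """Return (timestamp, module, message) for every WARN or ERROR entry."""
    return [
        (entry.get("timestamp", ""), entry.get("module", ""), entry.get("message", ""))
        for entry in iter_json_objects(text)
        if entry.get("level") in {"WARN", "ERROR"}
    ]
```

Run over the two entries above, this would surface both the missing runner-group parameter warning and the 404 that follows it.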

nahsi avatar Nov 27 '23 16:11 nahsi

I had the same issue as @nahsi did with the missing SSM parameter. Note that the second component in that path is your environment name so if you create it manually it may not be action-runners.
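
To make the path layout concrete, a small Python sketch that builds the parameter name from its parts. The helper name and its arguments are hypothetical. The default prefix `github-action-runners` is taken from the path quoted above and may differ in your deployment.

```python
def runner_group_parameter_name(
    name: str,
    runner_group: str = "Default",
    ssm_prefix: str = "github-action-runners",
) -> str:
    """Build the runner-group SSM parameter name for a deployment.

    `name` is the deployment-specific part of the path; in the multi-runner
    example quoted above it was "action-runners/builder".
    """
    parts = [
        ssm_prefix.strip("/"),
        name.strip("/"),
        "runners",
        "config",
        "runner-group",
        runner_group,
    ]
    return "/" + "/".join(parts)
```

For example, `runner_group_parameter_name("action-runners/builder")` reproduces the path from @nahsi's post. Check the existing parameters in your own Parameter Store before creating anything by hand, since the layout here is inferred from this thread only.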

bromanko avatar Dec 20 '23 23:12 bromanko

I think I got closer to the root cause. The SSM parameter /github-action-runners/{name}/runners/config/runner-group/Default is missing indeed. If that's the case, it should be created by the scale-up lambda by querying a GitHub API endpoint for the runner ID:

https://github.com/philips-labs/terraform-aws-github-runner/blob/6fa667fae7e4302cf643bcdb4ff3c91b1e4ed8d1/lambdas/functions/control-plane/src/scale-runners/scale-up.ts#L156-L175

That doesn't work, as this X-Ray trace shows:

[image: X-Ray trace screenshot]

https://github.com/philips-labs/terraform-aws-github-runner/blob/6fa667fae7e4302cf643bcdb4ff3c91b1e4ed8d1/lambdas/functions/control-plane/src/scale-runners/scale-up.ts#L187-L199

There are two problems with GetRunnerGroupByName():

  1. The GitHub API endpoint changed: it's now called /orgs/{org}/actions/runners (docs), instead of /orgs/{org}/actions/runner-groups, which it seems is still used by the GHES API.
  2. The currently used API endpoint /orgs/{org}/actions/runner-groups requires org:admin permissions, which have not been required for a long time now (https://github.com/philips-labs/terraform-aws-github-runner/commit/7572405f65ed1e7016f708eb7e6f323ec5270b5a).

I assume changing the API endpoint in scale-up.ts would solve the problem at least for users of github.org, but maybe not for GHES users? And how did that ever work?
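
To check the lookup outside the lambda, here is a rough Python sketch of a runner-group-by-name query against /orgs/{org}/actions/runner-groups. It is not the module's GetRunnerGroupByName() code. It assumes the response carries a `runner_groups` array whose entries have `id` and `name` fields, and that `token` is a credential allowed to read runner groups. Both function names are hypothetical.

```python
import json
import urllib.request
from typing import Optional


def find_runner_group_id(payload: dict, name: str) -> Optional[int]:
    """Return the id of the runner group called `name`, or None if absent."""
    for group in payload.get("runner_groups", []):
        if group.get("name") == name:
            return group.get("id")
    return None


def fetch_runner_groups(org: str, token: str, api_url: str = "https://api.github.com") -> dict:
    """Fetch the organization's runner groups (makes a real network request)."""
    request = urllib.request.Request(
        f"{api_url}/orgs/{org}/actions/runner-groups",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses such as 404.
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Usage would be along the lines of `find_runner_group_id(fetch_runner_groups("my-org", token), "Default")`. An HTTPError with status 404 from the fetch would be consistent with the "Request failed with status code 404" warning in the logs above, though it does not by itself tell you whether the endpoint or the permissions are at fault.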

imphil avatar Jan 04 '24 20:01 imphil

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot] avatar Feb 04 '24 01:02 github-actions[bot]

@npalm since you seem to have last modified the code in question: can you help me out with the question above: "I assume changing the API endpoint in scale-up.ts would solve the problem at least for users of github.org, but maybe not for GHES users? And how did that ever work?" I'm happy to do a PR, but I would need some additional guidance first.

imphil avatar Feb 07 '24 06:02 imphil

Hi everyone!

I'm not sure whether I'm facing the same problem, but my runners are also not getting created. Here are some logs from the "scale-up" lambda:

{
    "level": "INFO",
    "message": "Received event",
    "service": "runners-scale-up",
    "timestamp": "2024-02-22T12:23:38.140Z",
    "xray_trace_id": "1-65d73cc8-39e86de7046858a2b028c56d",
    "module": "scale-up",
    "region": "us-east-1",
    "environment": "orx-web",
    "aws-request-id": "a003240c-9ea6-58aa-a57f-7a5da1a5bb40",
    "function-name": "orx-web-scale-up",
    "runner": {
        "type": "Repo",
        "owner": "OR-Trax/orx-dbp-web",
        "namePrefix": ""
    },
    "github": {
        "event": "workflow_job",
        "workflow_job_id": "21861689216"
    }
}


{
    "level": "DEBUG",
    "message": "GHES API URL: ",
    "service": "runners-scale-up",
    "timestamp": "2024-02-22T12:23:38.505Z",
    "xray_trace_id": "1-65d73cc8-39e86de7046858a2b028c56d",
    "module": "gh-auth",
    "region": "us-east-1",
    "environment": "orx-web",
    "aws-request-id": "a003240c-9ea6-58aa-a57f-7a5da1a5bb40",
    "function-name": "orx-web-scale-up",
    "runner": {
        "type": "Repo",
        "owner": "OR-Trax/orx-dbp-web",
        "namePrefix": ""
    },
    "github": {
        "event": "workflow_job",
        "workflow_job_id": "21861689216"
    }
}


{
    "level": "WARN",
    "message": "Ignoring error: Request failed with status code 404",
    "service": "runners-scale-up",
    "timestamp": "2024-02-22T12:23:38.702Z",
    "xray_trace_id": "1-65d73cc8-39e86de7046858a2b028c56d",
    "region": "us-east-1",
    "environment": "orx-web",
    "aws-request-id": "a003240c-9ea6-58aa-a57f-7a5da1a5bb40",
    "function-name": "orx-web-scale-up",
    "module": "lambda.ts"
}


END RequestId: a003240c-9ea6-58aa-a57f-7a5da1a5bb40

I don't see any messages about a missing SSM parameter. I tried changing the log level to the most verbose and still nothing.

alex-astafyev avatar Feb 22 '24 12:02 alex-astafyev

@npalm since you seem to have last modified the code in question: can you help me out with the question above: "I assume changing the API endpoint in scale-up.ts would solve the problem at least for users of github.org, but maybe not for GHES users? And how did that ever work?" I'm happy to do a PR, but I would need some additional guidance first.

still having problems, missed this issue.

npalm avatar Feb 22 '24 22:02 npalm

Facing the same issue with org-level ephemeral runners in GitHub Enterprise.

shubhamsinha-sf avatar Feb 27 '24 13:02 shubhamsinha-sf

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot] avatar Mar 29 '24 01:03 github-actions[bot]

In my case, I created a runner group on github that corresponded to the "runner_group_name" input, and the runners were created in that group, which solved the problem.

Seochokid avatar Apr 17 '24 09:04 Seochokid

@here I'm still getting this error. Any suggestions?

supriyaaaic avatar Jun 15 '24 16:06 supriyaaaic