terraform-aws-gitlab-runner
ERROR: Preparation failed: exit status 2
The following error occurs rather regularly during large scale-up events. When something like 30-80 new jobs appear beyond the current scale of runners, many of the jobs fail with this error. Retrying the jobs succeeds, but it seems like something is not waiting long enough for the new machines to start up.
I am unclear on whether this is the fault of the primary runner, docker-machine, or EC2. I have looked around in the GitLab Runner code for a timeout to increase, but haven't come across a useful one.
If this is out of the project's scope that's fine, but if you have any insight into how to avoid this, it would be appreciated.
Running with gitlab-runner 14.0.1 (c1edb478)
on docker-default sgLT1ihz
section_start:1645715193:resolve_secrets
Resolving secrets
section_end:1645715193:resolve_secrets
section_start:1645715193:prepare_executor
Preparing the "docker+machine" executor
ERROR: Preparation failed: exit status 2
Will be retried in 3s ...
ERROR: Preparation failed: can't connect
Will be retried in 3s ...
ERROR: Preparation failed: exit status 2
Will be retried in 3s ...
section_end:1645715246:prepare_executor
ERROR: Job failed (system failure): exit status 2
Tracked this down to the primary instance (a t3.micro) exhausting memory when scaling up rapidly. Upgrading to a t3.small resolved it. This might be worth documenting, along the lines of: depending on your job load, a larger primary instance may be needed if you see "Preparation failed ...".
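A minimal sketch of that change, assuming the variable names used elsewhere in this thread (instance_type sizes the primary instance, docker_machine_instance_type sizes the worker machines); the source and other required module inputs are omitted, and the worker type is just an illustrative value:
module "gitlab_runner" {
  # ... source and other required inputs omitted ...

  # Primary instance running gitlab-runner and docker-machine; a t3.micro
  # exhausted memory during rapid scale-up, a t3.small did not.
  instance_type = "t3.small"

  # Instance type for the worker machines that run the jobs (illustrative value).
  docker_machine_instance_type = "m5.large"
}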
We are seeing a similar intermittent error both before and after migrating from a t3.micro to a t3.small.
Running with gitlab-runner 14.10.0 (c6bb62f6)
on default-auto DyJk7Eeo
Resolving secrets 00:00
Preparing the "docker+machine" executor 00:49
ERROR: Preparation failed: exit status 1
Will be retried in 3s ...
ERROR: Preparation failed: exit status 1
Will be retried in 3s ...
ERROR: Preparation failed: exit status 1
Will be retried in 3s ...
ERROR: Job failed (system failure): exit status 1
To make sure I'm clear, I am talking about the primary instance's instance_type, not docker_machine_instance_type. Also, that is exit status 1, which if I recall is a slightly different issue.
Could you pull up the resource status (like memory) on the primary instance, along with its logs?
Seems to have plenty of memory

I work with @tourdownunder, and the problem ended up being due to AWS Spot Instance limits.
@jamesmstone can we close this issue?
@npalm Nope, I'm facing the same issue. My runners are working well with the m4.large instance type, but when I increase the AWS EC2 instance type to c7g.2xlarge, for example, I face the same issue:
ERROR: Preparation failed: exit status 1
Will be retried in 3s ...
ERROR: Preparation failed: exit status 1
Will be retried in 3s ...
ERROR: Preparation failed: exit status 1
Will be retried in 3s ...
ERROR: Job failed (system failure): exit status 1
And I'm not using AWS Spot instances
But what size is the primary instance (not the workers)? That is where the issue originates.
docker_machine_instance_type = "c7g.2xlarge"
instance_type                = "t3.medium"
This is the config I'm using for the runner itself and for the on-demand workers.
Is there anything that I'm missing?
I missed that this is exit status 1, not exit status 2, which if I recall have different causes.
I work with @tourdownunder, and the problem ended up being due to AWS Spot Instance limits.
Can you remember what the issue was and how you solved it? I'm having the same problem.
We ended up reducing the rate at which we start new instances to align with the AWS default limits.
@jamesmstone This is very helpful, a couple questions:
- Did this error start for you recently? It seems like there's more activity on this issue at a time when we are also experiencing this error out of nowhere. Coincidence?
  a. If "yes" to the above, do you have a sense of what underlying dependency changed to cause this problem?
- How did you limit the rate at which new instances are started? Can you give an example of your settings?
Thanks in advance!
To clarify, I was also experiencing the exit status 1 error, not exit status 2.
For anyone who finds this, in my case it was invalid characters in the overrides block that caused it, after upgrading the module to 5.5.0. I had this block:
overrides = {
name_docker_machine_runners = "gitlab_runner_spot_instance" # Underscores here caused the issue
}
And I found this in the Cloud Watch Log Group, after attempting to run a pipeline on the runner:
Dec 15 00:45:04 ip-172-31-5-167 gitlab-runner:
{
"driver": "amazonec2",
"level": "error",
"msg": "Error creating machine: Invalid hostname specified. Allowed hostname chars are: 0-9a-zA-Z . -",
"name": "runner-u-ca1k6x-gitlab_runner_spot_instance-1671065104-4e0d8727",
"operation": "create",
"time": "2022-12-15T00:45:04Z"
}
Removing the invalid characters allowed the instances to be created without errors.
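For reference, a corrected block; the machine name only allows 0-9, a-z, A-Z, "." and "-", so hyphens are a safe replacement (the name below is just an example):
overrides = {
  name_docker_machine_runners = "gitlab-runner-spot-instance" # hyphens instead of underscores
}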
We ended up reducing the rate at which we start new instances to align with the AWS default limits.
@tnightengale How did you do that?
@kayman-mk We reduced runners_concurrent (or was it runners_limit?) from 20 to 10. Though we could have (and may in the future) requested a Spot Instance limit increase, as per Spot Instance limits, so we can use more.
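For anyone looking for the concrete setting, a sketch assuming the variable names we were using at the time (runners_concurrent caps concurrent jobs, runners_limit caps the number of machines the runner will create):
module "gitlab_runner" {
  # ... other inputs unchanged ...

  # Halving these kept the rate of new Spot requests within the AWS default limits.
  runners_concurrent = 10 # previously 20
  runners_limit      = 10
}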
This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 15 days.
This issue was closed because it has been stalled for 15 days with no activity.