terraform-aws-gitlab-runner
capacity-not-available when using spot instances
Hi everyone,
I'm not sure whether this is an AWS issue or a configuration issue on our side, but right now we are having huge trouble getting runners up for our jobs. It worked fine for a couple of months, but for the past few days we have had hours of downtime because the spot instances requested by the runner fail with capacity-not-available errors.
Is this something we can address with AWS? Is there anything we can do to prevent it?
If not, is there a way to run a mixed configuration with at least one on-demand instance and up to X spot instances, so that there is always at least one instance available to handle jobs?
Regards, Alex
I deployed the module twice: one deployment uses spot instances, the other runs on on-demand instances (for Terraform jobs). Just copy your module definition and change the two or three relevant variables.
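A minimal sketch of that setup, using only variables that appear elsewhere in this thread (runners_request_spot_instance, runners_idle_count); the remaining shared settings are elided:

module "runner_spot" {
  source  = "npalm/gitlab-runner/aws"
  version = "4.41.1"

  # spot instances are the default; keep one idle machine warm
  runners_idle_count = 1
  # ... shared settings ...
}

module "runner_on_demand" {
  source  = "npalm/gitlab-runner/aws"
  version = "4.41.1"

  # on-demand pool for jobs that must always run (e.g. Terraform)
  runners_request_spot_instance = false
  runners_idle_count            = 0
  # ... shared settings ...
}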
There's #249 on this topic.
@kayman-mk I'm about to do the same and I'm wondering about the behavior: if I have the spot deployment with an autoscaling idle count > 1 and the on-demand deployment with an idle count of 0, will jobs favor the spot runners until no spot capacity is available, and only then spin up on-demand ones?
I don't think so. I guess whichever runner contacts your GitLab instance next receives the job, and you can't control that.
It would be a nice feature to use spot instances first and fall back to on-demand instances when no spot capacity is available, but that is not what happens here.
Hi @kayman-mk, could you please guide me on how to disable spot instances? I have unset the spot bid price variable and also set amazonec2-request-spot-instance=false in docker_machine_options, but the module keeps spawning spot instances:
module "gitlab-runner" {
source = "npalm/gitlab-runner/aws"
version = "4.41.1"
asg_terminate_lifecycle_hook_create = false
aws_region = "us-east-1"
docker_machine_options = ["amazonec2-instance-type=${var.runner_instance_type}", "amazonec2-request-spot-instance=false", "amazonec2-use-ebs-optimized-instance=true", "amazonec2-root-size=${var.runner_root_disk_size}", "amazonec2-ami=ami-0c4f7023847b90238"]
docker_machine_version = "0.16.2-gitlab.12"
enable_asg_recreation = true
enable_cloudwatch_logging = false
enable_docker_machine_ssm_access = true
enable_eip = true
enable_manage_gitlab_token = true
enable_runner_ssm_access = true
environment = "gitlab-runner-manager"
gitlab_runner_version = "14.8.2"
instance_type = var.instance_type
runners_add_dind_volumes = false
runners_additional_volumes = ["/builds:/builds:rw", "/certs/client", "/var/run/docker.sock:/var/run/docker.sock"]
runners_gitlab_url = "https://gitlab.com/"
runners_idle_count = 5
runners_request_concurrency = 5
runners_idle_time = 3600
runners_concurrent = 50
runners_name = "docker-default"
runners_privileged = true
runners_pull_policy = "if-not-present"
runners_token = "xxxxxx"
runners_use_private_address = false
subnet_id = var.subnet_id
vpc_id = var.vpc_id
Have you tried to set runners_request_spot_instance = false?
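For reference, that would look like this in the module block above (a sketch; all other settings stay unchanged):

module "gitlab-runner" {
  # ... settings as above ...
  runners_request_spot_instance = false
  # amazonec2-request-spot-instance=false can then be dropped from docker_machine_options
}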
In terms of an on-demand fallback when there is no spot capacity: apparently docker-machine doesn't support this, and since it is archived and deprecated, it never will.
https://github.com/docker/machine/issues/4588
By default, the module uses the GitLab-maintained fork of docker-machine.
I am also encountering this issue. Running a few on-demand instances doesn't solve the problem, it only mitigates it: even at a medium-sized company there are sometimes more than a dozen pipelines running simultaneously. So I am going to introduce a different spot instance type under the same GitLab tag as a backup.
Still, it would be much better to be able to request on-demand instances with a limited lifespan in such cases.
Indeed, would be nice, but out of scope here, right?
I am using t3.micro, m5.large, c5.large and c5.xlarge spot instances in eu-central-1. These instance types are always available.
Using m5.large and c5.large without any issue. By default AWS sets your spot limit low; you can request an increase via the Support Center.
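If you manage your AWS account with Terraform anyway, the quota increase can also be requested in code. A sketch, assuming the quota code for standard spot instance requests; verify both codes in your account before applying:

resource "aws_servicequotas_service_quota" "spot_instance_requests" {
  service_code = "ec2"
  quota_code   = "L-34B43A08" # "All Standard Spot Instance Requests" -- assumed, verify
  value        = 256          # desired limit, measured in vCPUs
}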
This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 15 days.
There can't be a solution within this module. Consider:
- using other EC2 instance types
- increasing the AWS limit on the number of spot instances