terraform-aws-gitlab-runner

capacity-not-available when using spot instances

Open schealex opened this issue 2 years ago • 10 comments

Hi everyone,

I'm not sure whether this is an AWS issue or a configuration issue on our side, but right now we are having serious trouble getting runners up for our jobs. It worked fine for a couple of months, but for the past few days we have had hours of downtime because the runner does not get the spot instances it requests:

[screenshot from the original issue, not preserved]

Is this something we can address on AWS? Is there something we can do to prevent this?

If not, is there a way to run a mixed configuration with at least one on-demand instance and up to X spot instances, so that there is always at least one instance available to handle jobs?

Regards, Alex

schealex avatar Apr 28 '22 13:04 schealex

I deployed the module twice: one deployment uses spot instances and the other runs on on-demand instances (for Terraform jobs). Just copy your module definition and change two or three variables.
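A minimal sketch of what such a pair of deployments could look like. The argument names are taken from the configuration and the runners_request_spot_instance suggestion quoted later in this thread; the module version, the environment and runner names, and the two token variables are assumptions for illustration, not the setup actually used here.

```hcl
# Sketch only: two copies of the module, one on spot and one on on-demand.
# var.vpc_id and var.subnet_id are assumed to be declared elsewhere.
# var.runner_token_spot and var.runner_token_on_demand are hypothetical.

module "runner_spot" {
  source  = "npalm/gitlab-runner/aws"
  version = "4.41.1" # assumed; the version quoted later in this thread

  aws_region  = "us-east-1"
  environment = "gitlab-runner-spot"
  vpc_id      = var.vpc_id
  subnet_id   = var.subnet_id

  runners_name       = "docker-spot"
  runners_gitlab_url = "https://gitlab.com/"
  runners_token      = var.runner_token_spot

  # Workers for this runner are requested as spot instances.
  runners_request_spot_instance = true
}

module "runner_on_demand" {
  source  = "npalm/gitlab-runner/aws"
  version = "4.41.1"

  aws_region  = "us-east-1"
  environment = "gitlab-runner-on-demand"
  vpc_id      = var.vpc_id
  subnet_id   = var.subnet_id

  runners_name       = "docker-on-demand"
  runners_gitlab_url = "https://gitlab.com/"
  runners_token      = var.runner_token_on_demand

  # Workers for this runner are regular on-demand instances.
  runners_request_spot_instance = false
}
```

Each module block creates its own runner, so the two register with GitLab independently.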

kayman-mk avatar Apr 28 '22 20:04 kayman-mk

There's #249 on this topic.

@kayman-mk I'm about to do the same and I'm wondering about the behavior: if the spot deployment has an autoscaling idle count > 1 and the on-demand one has an idle count of 0, will jobs go to spot instances first, with on-demand instances spun up only once no spot capacity is left?

rgarrigue avatar May 05 '22 09:05 rgarrigue

I don't think so. I guess the job goes to whichever runner contacts your GitLab instance next, and you can't control that.

Would be a nice feature to use spot instances first and on-demand instances if no spot instances are available. But this didn't happen here.

kayman-mk avatar May 06 '22 21:05 kayman-mk

Hi @kayman-mk, could you please guide me on how to disable spot instances?

I've unset the spot bid price variable and also set docker_machine_options not to request spot instances, but it doesn't work and the module keeps spawning spot instances.

module "gitlab-runner" {
  source  = "npalm/gitlab-runner/aws"
  version = "4.41.1"

  asg_terminate_lifecycle_hook_create = false
  aws_region                          = "us-east-1"
  docker_machine_options              = [
    "amazonec2-instance-type=${var.runner_instance_type}",
    "amazonec2-request-spot-instance=false",
    "amazonec2-use-ebs-optimized-instance=true",
    "amazonec2-root-size=${var.runner_root_disk_size}",
    "amazonec2-ami=ami-0c4f7023847b90238",
  ]
  docker_machine_version              = "0.16.2-gitlab.12"
  enable_asg_recreation               = true
  enable_cloudwatch_logging           = false
  enable_docker_machine_ssm_access    = true
  enable_eip                          = true
  enable_manage_gitlab_token          = true
  enable_runner_ssm_access            = true
  environment                         = "gitlab-runner-manager"
  gitlab_runner_version               = "14.8.2"
  instance_type                       = var.instance_type
  runners_add_dind_volumes            = false
  runners_additional_volumes          = ["/builds:/builds:rw", "/certs/client", "/var/run/docker.sock:/var/run/docker.sock"]
  runners_gitlab_url                  = "https://gitlab.com/"
  runners_idle_count                  = 5
  runners_request_concurrency         = 5
  runners_idle_time                   = 3600
  runners_concurrent                  = 50
  runners_name                        = "docker-default"
  runners_privileged                  = true
  runners_pull_policy                 = "if-not-present"
  runners_token                       = "xxxxxx"
  runners_use_private_address         = false
  subnet_id                           = var.subnet_id
  vpc_id                              = var.vpc_id
}

erickfaustino avatar May 09 '22 17:05 erickfaustino

Have you tried to set runners_request_spot_instance = false?
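A rough sketch of that change applied to the block above, trimmed to the relevant arguments. The assumption here is that the module sets the spot request option itself from runners_request_spot_instance, which would explain why the extra entry in docker_machine_options had no effect; that has not been verified in this thread.

```hcl
# Sketch only: the same module block, trimmed, with the spot request
# controlled through the module variable instead of docker_machine_options.
# All var.* values are assumed to be declared elsewhere.
module "gitlab-runner" {
  source  = "npalm/gitlab-runner/aws"
  version = "4.41.1"

  aws_region  = "us-east-1"
  environment = "gitlab-runner-manager"
  vpc_id      = var.vpc_id
  subnet_id   = var.subnet_id

  runners_name       = "docker-default"
  runners_gitlab_url = "https://gitlab.com/"
  runners_token      = "xxxxxx"

  # The suggested setting: request on-demand workers, not spot.
  runners_request_spot_instance = false

  # The "amazonec2-request-spot-instance=false" entry is dropped here on
  # the assumption that the variable above already covers it.
  docker_machine_options = [
    "amazonec2-instance-type=${var.runner_instance_type}",
    "amazonec2-use-ebs-optimized-instance=true",
    "amazonec2-root-size=${var.runner_root_disk_size}",
    "amazonec2-ami=ami-0c4f7023847b90238",
  ]
}
```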

kayman-mk avatar May 10 '22 06:05 kayman-mk

Regarding on-demand fallback when no spot instance is available: apparently docker-machine doesn't support this, and since it's archived and deprecated, it never will.

https://github.com/docker/machine/issues/4588

AlexEndris avatar May 16 '22 14:05 AlexEndris

By default, the module uses the GitLab-maintained fork of docker-machine.

npalm avatar May 20 '22 15:05 npalm

I am also encountering this issue. Running a few on-demand instances doesn't solve the problem, it only mitigates it. Even at a medium-sized company, more than a dozen pipelines sometimes run simultaneously. So I am going to introduce a different spot instance type under the same GitLab tag as a backup.

Still, it would be much better to be able to request on-demand instances with a limited lifespan in such cases.

DmRomantsov avatar Jun 03 '22 12:06 DmRomantsov

Indeed, would be nice, but out of scope here, right?

I am using t3.micro, m5.large, c5.large and c5.xlarge spot instances in eu-central-1. These instances are always available.

kayman-mk avatar Jun 19 '22 11:06 kayman-mk

Using m5.large and c5.large without any issue. By default AWS sets your spot limit low. You can request an increase via the support center.

npalm avatar Jul 20 '22 21:07 npalm

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 15 days.

github-actions[bot] avatar Jan 01 '23 03:01 github-actions[bot]

There can't be a solution within this module. Consider:

  • using other EC2 instance types
  • increasing the AWS limits for the number of spot instances.
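For the first option, a hedged sketch of a second spot runner on a different instance type, in the docker_machine_options style used earlier in this thread. The instance types are the ones reported as available above; the environment name, runner name, and token variable are made up for illustration, and how the module merges this option with its own instance type setting is not confirmed here.

```hcl
# Sketch only: a backup spot runner on another instance type, so that a
# capacity shortage for one type does not block all jobs.
# var.vpc_id and var.subnet_id are assumed to be declared elsewhere.
# var.runner_token_backup is a hypothetical variable.
module "runner_spot_backup" {
  source  = "npalm/gitlab-runner/aws"
  version = "4.41.1" # assumed; the version quoted earlier in this thread

  aws_region  = "us-east-1"
  environment = "gitlab-runner-spot-backup"
  vpc_id      = var.vpc_id
  subnet_id   = var.subnet_id

  runners_name       = "docker-spot-backup"
  runners_gitlab_url = "https://gitlab.com/"
  runners_token      = var.runner_token_backup

  runners_request_spot_instance = true

  # Example: c5.large here if the primary spot runner uses m5.large.
  docker_machine_options = [
    "amazonec2-instance-type=c5.large",
  ]
}
```

Giving both runners the same GitLab tag, as suggested earlier, is left out because the argument for that is not shown in this thread.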

kayman-mk avatar Jan 01 '23 15:01 kayman-mk