terraform-aws-gitlab-runner
capacity-not-available when using spot instances
Hi everyone,
I'm not sure whether this is an AWS issue or a configuration issue on our side, but right now we are having huge trouble getting runners up for our jobs. It worked fine for a couple of months, but for the past few days we have had hours of downtime because the spot instances requested by the runner fail with capacity-not-available errors.
Is this something we can address with AWS? Is there anything we can do to prevent it?
If not, is there a way to run a mixed configuration with at least one on-demand instance and up to X spot instances, so that there is always at least one instance available to handle jobs?
Regards, Alex
I deployed the module twice: one deployment uses spot instances, the other runs on on-demand instances (for Terraform jobs). Just copy your module definition and change the two or three relevant variables.
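A minimal sketch of that setup, using only variables that appear elsewhere in this thread (runners_request_spot_instance, runners_idle_count); the remaining shared settings are elided:

module "runner_spot" {
  source  = "npalm/gitlab-runner/aws"
  version = "4.41.1"

  # spot instances are the default; keep one idle machine warm
  runners_idle_count = 1
  # ... shared settings ...
}

module "runner_on_demand" {
  source  = "npalm/gitlab-runner/aws"
  version = "4.41.1"

  # on-demand pool for jobs that must always run (e.g. Terraform)
  runners_request_spot_instance = false
  runners_idle_count            = 0
  # ... shared settings ...
}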
There's #249 on this topic.
@kayman-mk I'm about to do the same and I'm wondering about the behavior: if I have the spot deployment with an autoscaling idle count > 1 and the on-demand deployment with an idle count of 0, will jobs favor the spot runners until no spot capacity is available, and only then spin up on-demand ones?
I don't think so. I guess whichever runner contacts your GitLab instance next receives the job, and you can't control that.
It would be a nice feature to use spot instances first and fall back to on-demand instances when no spot capacity is available, but that is not what happens here.
Hi @kayman-mk, could you please guide me on how to disable spot instances? I have unset the spot bid price variable and also set amazonec2-request-spot-instance=false in docker_machine_options, but the module keeps spawning spot instances:
module "gitlab-runner" {
source = "npalm/gitlab-runner/aws"
version = "4.41.1"
asg_terminate_lifecycle_hook_create = false
aws_region = "us-east-1"
docker_machine_options = ["amazonec2-instance-type=${var.runner_instance_type}", "amazonec2-request-spot-instance=false", "amazonec2-use-ebs-optimized-instance=true", "amazonec2-root-size=${var.runner_root_disk_size}", "amazonec2-ami=ami-0c4f7023847b90238"]
docker_machine_version = "0.16.2-gitlab.12"
enable_asg_recreation = true
enable_cloudwatch_logging = false
enable_docker_machine_ssm_access = true
enable_eip = true
enable_manage_gitlab_token = true
enable_runner_ssm_access = true
environment = "gitlab-runner-manager"
gitlab_runner_version = "14.8.2"
instance_type = var.instance_type
runners_add_dind_volumes = false
runners_additional_volumes = ["/builds:/builds:rw", "/certs/client", "/var/run/docker.sock:/var/run/docker.sock"]
runners_gitlab_url = "https://gitlab.com/"
runners_idle_count = 5
runners_request_concurrency = 5
runners_idle_time = 3600
runners_concurrent = 50
runners_name = "docker-default"
runners_privileged = true
runners_pull_policy = "if-not-present"
runners_token = "xxxxxx"
runners_use_private_address = false
subnet_id = var.subnet_id
vpc_id = var.vpc_id
Have you tried to set runners_request_spot_instance = false?
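For reference, that would look like this in the module block above (a sketch; all other settings stay unchanged):

module "gitlab-runner" {
  # ... settings as above ...
  runners_request_spot_instance = false
  # amazonec2-request-spot-instance=false can then be dropped from docker_machine_options
}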
In terms of an on-demand fallback when there is no spot capacity: apparently docker-machine doesn't support this, and since it is archived and deprecated, it never will.
https://github.com/docker/machine/issues/4588
By default, the module uses the GitLab-maintained fork of docker-machine.
I am also encountering this issue. Running a few on-demand instances doesn't solve the problem, it only mitigates it: even at a medium-sized company there are sometimes more than a dozen pipelines running simultaneously. So I am going to introduce a different spot instance type under the same GitLab tag as a backup.
Still, it would be much better to be able to request on-demand instances with a limited lifespan in such cases.
Indeed, would be nice, but out of scope here, right?
I am using t3.micro, m5.large, c5.large and c5.xlarge spot instances in eu-central-1. These instance types are always available.
Using m5.large and c5.large without any issue. By default AWS sets your spot limit low; you can request an increase via the Support Center.
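If you manage your AWS account with Terraform anyway, the quota increase can also be requested in code. A sketch, assuming the quota code for standard spot instance requests; verify both codes in your account before applying:

resource "aws_servicequotas_service_quota" "spot_instance_requests" {
  service_code = "ec2"
  quota_code   = "L-34B43A08" # "All Standard Spot Instance Requests" -- assumed, verify
  value        = 256          # desired limit, measured in vCPUs
}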
This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 15 days.
There can't be a solution within this module. Consider:
- using other EC2 instance types
- increasing the AWS limit on the number of spot instances