terraform-aws-components

Spacelift Worker Pool ASG may fail to scale due to AMI/instance type mismatch

Open arcaven opened this issue 2 years ago • 0 comments

Describe the Bug

With the recent addition of arm64 support for Spacelift worker pools, the data source filters that look up the AMI sometimes return the arm64 image rather than the x86_64 image. Whenever the arm64 AMI is returned first, new instances fail to start in the autoscaling group, and the autoscaling group generates errors.
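One possible way to make the lookup deterministic is to add an `architecture` filter to the AMI data source. The sketch below is not the component's actual code: the data source name and both variables are hypothetical, and the owner account ID is deliberately left as an input rather than guessed.

```hcl
# Hypothetical inputs; neither variable name comes from the component.
variable "spacelift_ami_owner_id" {
  type        = string
  description = "AWS account ID that publishes the Spacelift worker AMIs"
}

variable "ami_architecture" {
  type        = string
  default     = "x86_64"
  description = "CPU architecture the worker AMI must have"
}

data "aws_ami" "spacelift_worker" {
  most_recent = true
  owners      = [var.spacelift_ami_owner_id]

  # Without an architecture filter, most_recent can select the newest image
  # of any architecture the owner publishes.
  filter {
    name   = "architecture"
    values = [var.ami_architecture]
  }
}
```

Defaulting the hypothetical `ami_architecture` variable to `x86_64` would preserve the behavior from before the arm64 images existed, while still letting a stack opt in to `arm64` explicitly.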

Expected Behavior

Prior to Spacelift's release of the arm64 AMIs, all Spacelift worker pool instances launched were x86_64. One expects the same behavior before and after the release of the arm64 images. In the future, when arm64 support in geodesic and Terraform module releases is more widespread, some may choose to switch to arm64, but one never wants to flip-flop randomly, because the instance type and the AMI must always match. At present the instance type is set statically in YAML, so the AMI architecture can be forced to x86_64.
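The requirement that the instance type and the AMI always match could also be enforced at plan time. The following is a minimal sketch, assuming Terraform 1.2 or later (for `postcondition`) and hypothetical variable and data source names; it compares the AMI's architecture against the architectures the instance type supports.

```hcl
# Hypothetical inputs; the names are illustrative only.
variable "instance_type" {
  type        = string
  description = "EC2 instance type used by the worker pool launch template"
}

variable "ami_owner_id" {
  type        = string
  description = "AWS account ID that publishes the worker AMIs"
}

# Looks up the CPU architectures the chosen instance type can run.
data "aws_ec2_instance_type" "worker" {
  instance_type = var.instance_type
}

data "aws_ami" "worker" {
  most_recent = true
  owners      = [var.ami_owner_id]

  lifecycle {
    # Fail the plan instead of letting the ASG fail at launch time.
    postcondition {
      condition     = contains(data.aws_ec2_instance_type.worker.supported_architectures, self.architecture)
      error_message = "The selected AMI architecture is not supported by the configured instance type."
    }
  }
}
```

A check like this does not choose the right AMI by itself; it only surfaces a mismatch during `terraform plan` rather than as a failed scaling activity.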

Steps to Reproduce

Steps to reproduce the behavior:

  1. As of this writing, all ASGs that scale out in AWS us-east-1 are probably failing because the arm64 AMI is returned first by the filter.
  2. Look at the CloudTrail logs when triggering auto-scaling.
  3. Watch the worker pool. You may see the active and busy counts held down at the minimum level while the pending count remains high for an hour or more.

Screenshots

Screen Shot 2023-02-25 at 9 37 26 PM

Environment (please complete the following information):

  • Spacelift on Feb 25th 2023
  • Terraform 1.3.8
  • us-east-1 AMIs

Additional Context

  • https://spacelift.io/changelog/en/arm-private-worker-pools-are-here-2HC4a1tls

arcaven • Feb 26 '23 05:02