terraform-aws-eks

Add support for `ignore_failed_scaling_activities`

Open ivankatliarchuk opened this issue 1 year ago • 0 comments

Is your request related to a new offering from AWS?

Is this functionality available in the AWS provider for Terraform? See CHANGELOG.md, too.

  • Yes ✅: please list the AWS provider version which introduced this functionality

5.12.0

https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/autoscaling_group#ignore_failed_scaling_activities

╷
│ Error: waiting for Auto Scaling Group (eks-apm-enabled-spot-worker-ns-group1-1.27-2adfasdfasdfasdfa) capacity satisfied: timeout while waiting for state to become 'ok' (last state: 'want exactly 44 healthy instance(s) in Auto Scaling Group, have 45', timeout: 10m0s)
│ 
│   with module.eks_self_managed_node_group["apm-enabled-spot-worker-ns-group1-1.27"].aws_autoscaling_group.this[0],
│   on .terraform/modules/eks_self_managed_node_group/modules/self-managed-node-group/main.tf line 491, in resource "aws_autoscaling_group" "this":
│  491: resource "aws_autoscaling_group" "this" {
│ 
╵

Is your request related to a problem? Please describe.

We manage our infrastructure with Terraform and have multiple clusters, each containing a number of Auto Scaling Groups (ASGs) with roughly 200 nodes each. Our workflow is a two-step process: plan, followed by apply. However, when we upgrade a cluster and modify the ASGs within this workflow, the desired size of an ASG frequently changes outside of Terraform's control between the two steps. This leads to apply failures such as the capacity timeout shown in this issue.

We follow a blue/green upgrade model in which we migrate pods from the blue ASG to the green ASG. This requires both the blue and green ASGs to exist at the same time.

This solution is not sufficient:

  lifecycle {
    create_before_destroy = true
    ignore_changes = [
      desired_capacity
    ]
  }

This is a common error:

│ Error: waiting for Auto Scaling Group (eks-apm-enabled-spot-worker-ns-group1-1.27-asdfasdfasdfasdfasf) capacity satisfied: timeout while waiting for state to become 'ok' (last state: 'want exactly 44 healthy instance(s) in Auto Scaling Group, have 45', timeout: 10m0s)
│   with module.eks_self_managed_node_group["apm-enabled-spot-worker-ns-group1-1.27"].aws_autoscaling_group.this[0],
│   on .terraform/modules/eks_self_managed_node_group/modules/self-managed-node-group/main.tf line 491, in resource "aws_autoscaling_group" "this":
│  491: resource "aws_autoscaling_group" "this" {

Describe the solution you'd like.

Add support for `ignore_failed_scaling_activities`. It was added to the AWS provider (5.12.0) over a year ago.
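As a rough illustration of the request, the argument could be passed through to the ASG resource along these lines. This is only a sketch, not the module's actual code: the variable names `ignore_failed_scaling_activities`, `subnet_ids`, and `launch_template_id` are hypothetical inputs invented for the example, and the sizes are placeholders. The only parts taken from this issue are the `aws_autoscaling_group` argument itself, the 5.12.0 provider version, and the `lifecycle` block quoted above.

```hcl
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 5.12.0" # version that introduced the argument, per this issue
    }
  }
}

# Hypothetical module input (assumption: not an existing module variable).
variable "ignore_failed_scaling_activities" {
  description = "Whether to ignore failed Auto Scaling scaling activities while waiting for capacity"
  type        = bool
  default     = null # null leaves the provider default in effect
}

# Hypothetical inputs, present only to keep the sketch self-contained.
variable "subnet_ids" {
  type = list(string)
}

variable "launch_template_id" {
  type = string
}

resource "aws_autoscaling_group" "this" {
  name_prefix         = "example-" # placeholder name
  min_size            = 1          # placeholder sizes
  max_size            = 3
  vpc_zone_identifier = var.subnet_ids

  launch_template {
    id      = var.launch_template_id
    version = "$Latest"
  }

  # The argument this issue asks the module to expose.
  ignore_failed_scaling_activities = var.ignore_failed_scaling_activities

  lifecycle {
    create_before_destroy = true
    ignore_changes        = [desired_capacity]
  }
}
```

A caller would then set `ignore_failed_scaling_activities = true` on the node group. Whether that is enough to avoid the capacity timeout shown above is not confirmed in this issue.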

Describe alternatives you've considered.

Changes to our process:

  1. Skip the separate plan and apply stages and run apply only. This still fails.
  2. Run the cluster upgrade as the first step, then create/update the ASGs as a second step.
  3. Disable autoscaling for both the blue and green ASG node groups while the cluster upgrade is in progress.

Additional context

ivankatliarchuk · Jul 15 '24 15:07