
Unable to update Node Groups in place with cluster placement group strategy without EFA

Open Josephuss opened this issue 1 year ago • 3 comments

Description

When using a node group without EFA enabled and a placement group with cluster strategy, updates of the node group fail because the auto scaling group does not restrict the list of availability zones.

The node group with cluster placement is deployed successfully into a single AZ out of the 3 configured subnets, with no errors. However, replacing or upgrading the node group fails as shown below because the availability zone is not filtered, and the update does not go through unless the subnet ID is overridden in the configuration.

  • [x] ✋ I have searched the open/closed issues and my issue is not listed.

Versions

  • Module version [Required]: 20.8.5

  • Terraform version: 1.5.7

  • Provider version(s): 5.40.0

Reproduction Code

This is a copy of the managed node group example with a placement group created. The default placement group strategy is cluster.

module "eks_managed_cluster_node_group" {
  source = "../../modules/eks-managed-node-group"

  name                 = "managed-cluster-node-group"
  cluster_name         = module.eks.cluster_name
  cluster_ip_family    = module.eks.cluster_ip_family
  cluster_service_cidr = module.eks.cluster_service_cidr

  subnet_ids                        = module.vpc.private_subnets
  cluster_primary_security_group_id = module.eks.cluster_primary_security_group_id
  vpc_security_group_ids = [
    module.eks.node_security_group_id,
  ]
  
  ami_type = "BOTTLEROCKET_x86_64"
  platform = "bottlerocket"

  create_placement_group = true

  # this will get added to what AWS provides
  bootstrap_extra_args = <<-EOT
    # extra args added
    [settings.kernel]
    lockdown = "integrity"

    [settings.kubernetes.node-labels]
    "label1" = "foo"
    "label2" = "bar"
  EOT

  tags = merge(local.tags, { Separate = "eks-managed-node-group" })
}

Steps to reproduce the behavior:

  1. Create a managed node group of size 1 and create a placement group using cluster strategy (default).
  2. Increase the size to 2 (see the sketch below).
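For step 2, the change maps onto the submodule's standard min_size / max_size / desired_size inputs, which are not shown in the reproduction block above (a minimal sketch; the values are illustrative):

  module "eks_managed_cluster_node_group" {
    # ... same source and arguments as in the reproduction code above

    min_size     = 1
    max_size     = 3
    desired_size = 1 # step 2: change this to 2 and run terraform apply again
  }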

Expected behavior

Node group is increased as requested.

Actual behavior

The node group update fails with:

Error: updating EKS Node Group version: operation error EKS: UpdateNodegroupVersion, https response error StatusCode: 400, RequestID: 58562857----********, InvalidRequestException: Instances in the Placement Group must be launched in the eu-west-1c Availability Zone. Specify the eu-west-1c Availability Zone and try again.

Additional context

  • EFA is not enabled
  • Placement group is created
  • Placement group strategy is cluster by default

Josephuss commented May 22 '24 10:05

cc @james-masson ref #2959

bryantbiggs commented May 22 '24 20:05

@bryantbiggs

  1. This is a general problem we've seen with using cluster placement groups (EFA potentially included)
  2. This could actually be seen as a more general enhancement where you want a different AZ layout on a single node group.

Consider an EKS cluster deployed across 3 availability zones.

A nodegroup that makes use of a "cluster" placement group will only be able to deploy into a single one of these AZs. That's the point of the "cluster" placement group, to put the instances into the same physical rack.
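For context: with create_placement_group = true the module creates a placement group roughly equivalent to the following (a sketch, not the module's exact code; the resource label and name here are assumptions):

  resource "aws_placement_group" "this" {
    name     = "managed-cluster-node-group"
    strategy = "cluster" # packs instances onto closely located hardware, which confines the group to a single AZ
  }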

It turns out that on initial deployment, this configuration is not a problem. The nodegroup with cluster placement successfully gets deployed into a single AZ of the 3 subnets configured with no errors. Arguably this itself is a bug.

However, when it comes time to replace or upgrade the nodegroup, you get the error listed by my colleague @Josephuss. For some reason, the AZ/subnet deployment problem only happens on replacement.

This PR tries to fix the issue by adding the concept of an AZ filter to the nodegroups, allowing a nodegroup to be deployed into a subset of the subnets that the rest of the cluster is configured with.

We're aware that this can currently be worked around by overriding subnet IDs:

https://github.com/terraform-aws-modules/terraform-aws-eks/blob/master/node_groups.tf#L308

e.g.


  eks_managed_node_groups = {
    my_custom_nodegroup = {
      name       = "customer1"
      subnet_ids = ["subnet-abc12345"]
    }
  }

But this is quite fragile, because:

  1. We frequently do full create/destroy cycles, so these subnet references change.
  2. We expose nodegroup config to our internal customers as a variable, so they can declare their own requirements. This precludes direct Terraform references like module.vpc.my_subnets_in_az_x.

Hence we see general value in an interface like this - it's more practical, understandable, and portable, and it has value outside of placement groups too.


  eks_managed_node_groups = {
    my_custom_nodegroup = {
      name             = "customer1"
      subnet_az_filter = "eu-west-1a"
    }
  }
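For comparison, the single-AZ subnet list could also be derived with the aws_subnets data source instead of hardcoded IDs (a sketch; the VPC reference and AZ value are illustrative assumptions), though that still requires a direct Terraform reference and so doesn't fit our variable-driven setup either:

  # look up all subnets of the VPC that sit in one AZ
  data "aws_subnets" "single_az" {
    filter {
      name   = "vpc-id"
      values = [module.vpc.vpc_id]
    }
    filter {
      name   = "availability-zone"
      values = ["eu-west-1a"]
    }
  }

  # then: subnet_ids = data.aws_subnets.single_az.ids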

james-masson commented May 23 '24 09:05

This issue has been automatically marked as stale because it has been open 30 days with no activity. Remove stale label or comment or this issue will be closed in 10 days

github-actions[bot] commented Jun 23 '24 00:06

This issue was automatically closed because it remained stale for 10 days

github-actions[bot] commented Jul 04 '24 00:07

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions[bot] commented Aug 03 '24 02:08