terraform-aws-eks

feat: Enable update in place for node groups with cluster placement group strategy

Open Josephuss opened this issue 1 year ago • 3 comments

Description

When using a node group without EFA enabled and a placement group with cluster strategy, sometimes updates of the node group fail because the auto scaling group does not restrict the list of availability zones.

The following changes fix this behaviour so that:

  • placement groups automatically restrict the subnets to a single A/Z
  • the A/Z is specifically defined by the user.

Motivation and Context

This fixes the issue mentioned in https://github.com/terraform-aws-modules/terraform-aws-eks/issues/3044

Deployments of node groups with placement groups work, but the node groups are subsequently difficult to update.

This is because the placement group restricts the available A/Z to a single one, while the full multi-A/Z list of subnets is still passed to the autoscaling group. The two conflict, producing errors like:

│ Error: updating EKS Node Group version: operation error EKS: UpdateNodegroupVersion, https response error StatusCode: 400, RequestID: 58562857----********, InvalidRequestException: Instances in the Placement Group must be launched in the eu-west-1a Availability Zone. Specify the eu-west-1a Availability Zone and try again. │

Once implemented, there’s no need to put a specific subnet override in our config like this:

    create_placement_group: true
    subnet_ids:
     - subnet-0fa7d************

Instead, the config can select the A/Z with a filter like this:

    create_placement_group: true
    placement_group_az_filter: "eu-west-1a"
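
For illustration, the effect of such a filter can be sketched in Terraform roughly as below. This is a minimal sketch under stated assumptions, not the module's actual implementation: the variable names `subnet_ids` and `placement_group_az_filter` simply mirror the settings shown above, and the data source and output names are hypothetical.

```hcl
# Hypothetical inputs, named after the settings shown in this description.
variable "subnet_ids" {
  type        = list(string)
  description = "All candidate subnets, possibly spanning several A/Zs"
}

variable "placement_group_az_filter" {
  type        = string
  description = "The single A/Z used by the cluster placement group, e.g. eu-west-1a"
}

# Keep only the candidate subnets that sit in the chosen A/Z.
data "aws_subnets" "placement_group" {
  filter {
    name   = "subnet-id"
    values = var.subnet_ids
  }

  filter {
    name   = "availability-zone"
    values = [var.placement_group_az_filter]
  }
}

# The narrowed list is what would be handed to the node group,
# so the autoscaling group only ever sees subnets in one A/Z.
output "placement_group_subnet_ids" {
  value = data.aws_subnets.placement_group.ids
}
```

Passing only the narrowed list to the node group means the autoscaling group and the placement group agree on the A/Z, which is the conflict described above.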

Breaking Changes

None

How Has This Been Tested?

  • [x] I have updated at least one of the examples/* to demonstrate and validate my change(s)
  • [x] I have tested and validated these changes using one or more of the provided examples/* projects
  • [x] I have executed pre-commit run -a on my pull request

Josephuss avatar May 23 '24 03:05 Josephuss

This PR has been automatically marked as stale because it has been open 30 days with no activity. Remove stale label or comment or this PR will be closed in 10 days

github-actions[bot] avatar Jun 23 '24 00:06 github-actions[bot]

Not stale - fixes an ASG + Cluster Placement Group interaction problem.

james-masson avatar Jun 25 '24 09:06 james-masson

Superseded by #3090

bdellegrazie avatar Jul 05 '24 15:07 bdellegrazie

This PR has been automatically marked as stale because it has been open 30 days with no activity. Remove stale label or comment or this PR will be closed in 10 days

github-actions[bot] avatar Aug 05 '24 00:08 github-actions[bot]

This PR is included in version 20.22.0 :tada:

antonbabenko avatar Aug 05 '24 15:08 antonbabenko

@bryantbiggs thank you!

bdellegrazie avatar Aug 05 '24 15:08 bdellegrazie

I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions[bot] avatar Sep 07 '24 02:09 github-actions[bot]