terraform-aws-eks

Windows Managed Node Group support

Open • bryantbiggs opened this issue 2 years ago • 17 comments

Is your request related to a new offering from AWS?

  • Yes
    • https://aws.amazon.com/about-aws/whats-new/2022/12/amazon-eks-automated-provisioning-lifecycle-management-windows-containers/
    • https://github.com/aws/containers-roadmap/issues/584#issuecomment-1353809239

Is your request related to a problem? Please describe.

Describe the solution you'd like.

  • Ability to create EKS managed node groups with Windows based nodes

Describe alternatives you've considered.

Additional context

  • Tangentially to this change, we should evaluate if we can better accommodate cloud-config in user-data since we will be making changes to support Windows based instances now
    • https://github.com/terraform-aws-modules/terraform-aws-eks/pull/2335

bryantbiggs • Dec 17 '22 21:12

Requires the Terraform aws-sdk version to be updated https://github.com/hashicorp/terraform-provider-aws/issues/28438

bryantbiggs • Dec 17 '22 23:12

Any update on this?

chandrasekharkolla • Jan 10 '23 20:01

There aren't any code changes required so you can in theory use it today, but we will be adding an example and checking to see how it aligns with the rest of the Linux AL2 and Bottlerocket OS usage

bryantbiggs • Jan 10 '23 20:01

If you want to create a Windows managed node group using this module, I can confirm that on version 18.31.2 you can use the configuration below for a Windows EKS managed node group, as long as the following requirements are fulfilled.

Requirements

  • Your AWS Terraform provider is at least v4.48.0, so that you can pass in the correct ami_type for Windows EKS managed node group instances.
  • You already have a Linux EKS node group with nodes in your cluster. I confirmed with AWS Support that you cannot run a Windows-only EKS cluster, so a Linux node must already be in place before you launch any Windows nodes via the managed node group option.
  • Your EKS node role has the AmazonEKSVPCResourceController policy, which it should if you use this module, since the module already includes it.
  • You have enabled Windows support by adding this ConfigMap:
apiVersion: v1
data:
  enable-windows-ipam: "true"
immutable: false
kind: ConfigMap
metadata:
  name: amazon-vpc-cni
  namespace: kube-system
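
The provider requirement in the first bullet can be pinned in the root module. A minimal sketch, assuming the standard hashicorp/aws source address; the version floor is the one quoted above:

```hcl
terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
      # 4.48.0 is the minimum version quoted in this thread for the Windows ami_type values
      version = ">= 4.48.0"
    }
  }
}
```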

Example

eks_managed_node_groups = {
  windows = {
    min_size          = 1
    desired_size      = 1
    max_size          = 5
    platform          = "windows"
    ami_type          = "WINDOWS_CORE_2019_x86_64"
    capacity_type     = "SPOT"
    enable_monitoring = true
    disk_size         = 100 # GiB; written as a number rather than a string
    use_name_prefix   = true
    cluster_version   = var.aws_eks_cluster_version
    instance_types    = ["m5d.xlarge", "m5ad.xlarge"]

    # Keep pods that do not tolerate this taint off the Windows nodes
    taints = [
      {
        key    = "os"
        value  = "windows"
        effect = "NO_SCHEDULE"
      }
    ]
  }
}

sebas-w • Feb 06 '23 21:02

thank you for sharing @sebas-w !

bryantbiggs • Feb 06 '23 22:02

@sebas-w Thank you for sharing an example! I was able to create a Windows managed node pool as you described above and run a test pod on it. However, I'm unable to connect to any pod via the cluster's internal network. Access to other resources in the VPC or on the internet works without issue (except for the obvious DNS resolution problems). Did you have such problems?

enver • Feb 10 '23 10:02

@sebas-w This does indeed work unless you set var.manage_aws_auth_configmap = true. If that var is enabled then the module overwrites aws-auth configmap values set by EKS and in the process removes the eks:kube-proxy-windows line from the Windows node group in the aws-auth configmap.

local.node_iam_role_arns_windows currently does not look at module.eks_managed_node_groups to determine if platform == "windows". So the module assumes MNGs are Linux or Bottlerocket and that line in the config is removed.

When var.manage_aws_auth_configmap = false:

mapRoles: |
  - "groups":
    - "eks:kube-proxy-windows"
    - "system:bootstrappers"
    - "system:nodes"
    "rolearn": "<windows_mng_role_arn>"
    "username": "system:node:{{EC2PrivateDNSName}}"

When var.manage_aws_auth_configmap = true:

mapRoles: |
  - "groups":
    - "system:bootstrappers"
    - "system:nodes"
    "rolearn": "<windows_mng_role_arn>"
    "username": "system:node:{{EC2PrivateDNSName}}"

aamoctz • Feb 13 '23 20:02

Has any work started related to this issue? I have some changes I can contribute to at least resolve the issue with manage_aws_auth_configmap removing eks:kube-proxy-windows, but if there's already work in progress I would rather not step on anyone's toes on this.

aamoctz • Feb 14 '23 19:02

https://github.com/terraform-aws-modules/terraform-aws-eks/pull/2477

see this PR if someone can help push it pls

noamgreen • Feb 28 '23 13:02

I'm following @sebas-w's example above, but the vpc-admission controller is not created. I see the AmazonEKSVPCResourceController role on the clusterrole that was created.

Am I missing something else?

trippinnik • Mar 24 '23 15:03

Hi, I want to thank @sebas-w and @aamoctz; I was facing the same problems.

I started from version 18.31.2, already having Linux managed node groups, on EKS 1.22, platform version eks.10. Then I set the AWS Terraform provider to version 4.48 and created the amazon-vpc-cni ConfigMap.

resource "kubernetes_config_map" "amazon_vpc_cni" {
  metadata {
    name      = "amazon-vpc-cni"
    namespace = "kube-system"
  }

  data = {
    enable-windows-ipam = "true"
  }
}

In the definition of the node group I just specified the platform and the AMI type:

myManagedNodeGroup = {
  name     = "my-managed-node-group"
  platform = "windows"
  ami_type = "WINDOWS_CORE_2019_x86_64"
  ...
}

The node group was created. I then changed the module that builds EKS so that it updates the aws-auth ConfigMap correctly. I later saw that @aamoctz had already proposed the same changes here: https://github.com/terraform-aws-modules/terraform-aws-eks/pull/2477

In main.tf

 ...
 node_iam_role_arns_non_windows = distinct(
    compact(
      concat(
        [for group in module.eks_managed_node_group : group.iam_role_arn if group.platform != "windows"],
        [for group in module.self_managed_node_group : group.iam_role_arn if group.platform != "windows"],
        var.aws_auth_node_iam_role_arns_non_windows,
      )
    )
  )

  node_iam_role_arns_windows = distinct(
    compact(
      concat(
        [for group in module.eks_managed_node_group : group.iam_role_arn if group.platform == "windows"],
        [for group in module.self_managed_node_group : group.iam_role_arn if group.platform == "windows"],
        var.aws_auth_node_iam_role_arns_windows,
      )
    )
  )
  ...

In modules/eks-managed-node-group/outputs.tf

output "platform" {
  description = "Identifies if the OS platform is `bottlerocket`, `linux`, or `windows` based"
  value       = var.platform
}
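
To show why the split into two lists matters, here is a sketch of how such lists could be rendered into mapRoles entries. It is an illustration only, not the module's actual source: the ARNs and the example_* local names are hypothetical placeholders. Only the Windows roles receive the eks:kube-proxy-windows group, matching the two mapRoles outputs @aamoctz posted.

```hcl
locals {
  # Hypothetical role ARNs, for illustration only
  example_linux_role_arns   = ["arn:aws:iam::111122223333:role/example-linux-mng"]
  example_windows_role_arns = ["arn:aws:iam::111122223333:role/example-windows-mng"]

  example_map_roles = yamlencode(concat(
    [for arn in local.example_linux_role_arns : {
      rolearn  = arn
      username = "system:node:{{EC2PrivateDNSName}}"
      groups   = ["system:bootstrappers", "system:nodes"]
    }],
    [for arn in local.example_windows_role_arns : {
      rolearn  = arn
      username = "system:node:{{EC2PrivateDNSName}}"
      # Windows node roles additionally carry the kube-proxy-windows group
      groups = ["eks:kube-proxy-windows", "system:bootstrappers", "system:nodes"]
    }]
  ))
}

# Expose the rendered YAML so it can be inspected with `terraform output`
output "example_map_roles" {
  value = local.example_map_roles
}
```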

In case it is useful: to avoid the "failed to parse Kubernetes args: pod does not have label vpc.amazonaws.com/PrivateIPv4Address" error when scheduling a pod, it is also important to set the appropriate nodeSelector:

nodeSelector:
  kubernetes.io/os: windows
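
For anyone managing workloads from Terraform as well, the same selector could be expressed with the hashicorp/kubernetes provider already used for the ConfigMap above. A hedged sketch: the resource name, labels, image, and command are illustrative choices, and the toleration assumes the os=windows taint from @sebas-w's node group example. Note that the Kubernetes-level effect is spelled NoSchedule, whereas the EKS node group API uses NO_SCHEDULE.

```hcl
resource "kubernetes_deployment" "windows_test" {
  metadata {
    name = "windows-test" # hypothetical name
  }

  spec {
    replicas = 1

    selector {
      match_labels = { app = "windows-test" }
    }

    template {
      metadata {
        labels = { app = "windows-test" }
      }

      spec {
        # Schedule only onto Windows nodes
        node_selector = {
          "kubernetes.io/os" = "windows"
        }

        # Tolerate the taint set on the Windows node group (assumed: os=windows)
        toleration {
          key      = "os"
          operator = "Equal"
          value    = "windows"
          effect   = "NoSchedule"
        }

        container {
          name  = "test"
          image = "mcr.microsoft.com/windows/servercore:ltsc2019"
          # Keep the container alive so the pod stays Running
          command = ["powershell.exe", "-Command", "while ($true) { Start-Sleep -Seconds 3600 }"]
        }
      }
    }
  }
}
```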

I can confirm that in this way I was able to correctly create a Windows node group, apply a test deployment, and automatically scale the replicas and therefore the number of nodes.

It will be very useful once the module supports the modifications mentioned above.

robertobandini • Apr 01 '23 13:04

This issue has been automatically marked as stale because it has been open 30 days with no activity. Remove stale label or comment or this issue will be closed in 10 days

github-actions[bot] • May 02 '23 00:05

Is there anything that can be done to help get the associated PR reviewed and merged? It looks like it should solve this issue, which is a reasonably big impediment to working with Windows nodes in EKS.

davidedmondsMPG • May 26 '23 11:05

Bump for updates... Can we get this PR merged?

mlschindler • Jun 05 '23 14:06

https://github.com/terraform-aws-modules/terraform-aws-eks/pull/2477#issuecomment-1570706923

bryantbiggs • Jun 05 '23 14:06

With the merge of #2477 does this make it possible to have the module provision EKS managed windows nodes?

mlschindler • Aug 08 '23 16:08

You can deploy Windows nodes with this module, but you will need to either use the default launch template provided by EKS, or provide your own launch template or user data when using a custom launch template. As I stated here, #2477 only addresses one small part of this, which is maintaining the IAM role mapping in the aws-auth configmap.

The Windows node support currently does not match that of AL2 and Bottlerocket in terms of native custom launch template and user data support
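
A sketch of the first option, letting EKS supply its default launch template. It assumes a module version that exposes use_custom_launch_template on managed node groups (it appears in v19 and later of the module); the local name, instance type, and sizes are illustrative only.

```hcl
locals {
  # Hypothetical definition, meant to be passed as `eks_managed_node_groups` to the module
  example_eks_managed_node_groups = {
    windows = {
      platform = "windows"
      ami_type = "WINDOWS_CORE_2019_x86_64"

      # Assumption: skip the module's custom launch template so EKS uses its own default
      use_custom_launch_template = false

      instance_types = ["m5d.xlarge"]
      min_size       = 1
      desired_size   = 1
      max_size       = 3
    }
  }
}
```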

bryantbiggs • Aug 08 '23 19:08

This issue has been resolved in version 20.0.0 :tada:

antonbabenko • Feb 02 '24 14:02

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions[bot] • Mar 04 '24 02:03