pulumi-eks
Support for Windows AMI family
Hi, do you plan to support Node Groups using a "windows server" AMI family?
It seems that there's nothing specifically preventing the Cluster type from having windows nodes - it's just that the NodeGroup type has linux stuff hard coded, most notably the user data for the launch config that is used to join the cluster (it's a hard coded bash script). It would be really useful if it were possible to override the launch config's user data and I think it would then be possible (not saying trivial though) to add a windows node group.
To support Windows node groups we'll have to refactor how the cluster and nodegroup classes function.
Specifically, we'll have to key off a cluster-level toggle indicating that Windows is in use and prep the cluster and nodegroups accordingly, since an EKS cluster with Windows support must have both Linux and Windows node groups to function.
Per AWS:
- The cluster class will need to install the following at provisioning time, along with having `openssl` and `jq` available on the client machine to run the scripts:
  - [ ] AWS VPC resource controller (manifest image varies by region)
  - [ ] AWS VPC admission controller webhooks (manifest image varies by region)
  - [ ] Use cert signing shell script to create and submit a CSR to the cluster, approve it through `kubectl`, and create a Secret from its contents
  - [ ] Use cert bundle patching shell script to patch webhook manifest with the cluster's CA bundle.
  - [ ] Add a new cluster role binding for `kube-proxy`
  - [ ] Deploy the patched admission webhook
  - [ ] Requires a Windows & Linux hybrid cluster. Specifically, one or more Linux nodes need to be available to run system Pods like `coredns` and the VPC resource controller.
- The nodegroup class will need to adjust and account for:
  - [ ] Distinct default root disk volume sizes
  - [ ] Distinct default latest AMIs
  - [ ] UserData based on PowerShell to configure the Windows nodes
- Related, but not required:
  - Not all instance types work for both Linux and Windows, though we do not enforce this today for Linux.
- For more details, see the Linux and Windows CloudFormation Stacks for comparison
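To make the user data item above concrete, here is a minimal sketch of how a nodegroup could pick a bootstrap script per OS. `renderUserData` and `NodeOs` are hypothetical names, not part of pulumi-eks. The sketch assumes the EKS optimized AMIs, which ship `/etc/eks/bootstrap.sh` on Linux and `Start-EKSBootstrap.ps1` on Windows, and it leaves out any extra bootstrap arguments a real node group would likely need.

```typescript
// Hypothetical helper (not part of pulumi-eks): renders node user data per OS.
type NodeOs = "linux" | "windows";

function renderUserData(os: NodeOs, clusterName: string): string {
    if (os === "linux") {
        // Assumes the EKS optimized Linux AMI, which ships /etc/eks/bootstrap.sh.
        return [
            "#!/bin/bash",
            "set -o errexit",
            `/etc/eks/bootstrap.sh "${clusterName}"`,
        ].join("\n");
    }
    // Assumes the EKS optimized Windows AMI, which ships Start-EKSBootstrap.ps1.
    // EC2 runs the content between the <powershell> tags at first boot.
    return [
        "<powershell>",
        '[string]$EKSBootstrapScriptFile = "$env:ProgramFiles\\Amazon\\EKS\\Start-EKSBootstrap.ps1"',
        `& $EKSBootstrapScriptFile -EKSClusterName "${clusterName}"`,
        "</powershell>",
    ].join("\n");
}

// Usage: "my-cluster" is a placeholder cluster name.
const windowsUserData = renderUserData("windows", "my-cluster");
const linuxUserData = renderUserData("linux", "my-cluster");
```

Keeping the OS switch in a single function means the rest of the nodegroup code could stay shared between Linux and Windows, with only the defaults (AMI, root volume size) varying alongside it.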
This looks accurate. I have been able to do all this using pulumi. The only thing that I had to do manually was to create the CSR and approve it. I then supplied the resulting key pair as secret config to my pulumi program. The following is what I did using pulumi:
- created the required secret for the VPC admission webhook using the key pair config
- created the VPC resource controller and admission webhook by downloading the correct yaml files for my region from AWS, patching them and provisioning them using `ConfigFile` instances
- created the required cluster role binding
- created roles and instance profiles for both my linux and windows node groups
- created the `Cluster` instance with:
  - `skipDefaultNodeGroup: true`
  - `roleMappings: [ { roleArn: linuxInstanceRole.apply((r) => r.arn), username: "system:node:{{EC2PrivateDNSName}}", groups: ["system:bootstrappers", "system:nodes"], }, { roleArn: windowsInstanceRole.apply((r) => r.arn), username: "system:node:{{EC2PrivateDNSName}}", groups: ["system:bootstrappers", "system:nodes", "eks:kube-proxy-windows"], }, ]`
- created a node security group using `createNodeGroupSecurityGroup` from the eks package
- created a new `WindowsNodeGroup` class that's a copy of the pulumi supplied `NodeGroup` class except
  - it has the required powershell user data
  - it requires the AMI to be supplied via args (no lookup)
- created one `NodeGroup` instance for the linux nodes and one `WindowsNodeGroup` instance for the windows nodes. Each is injected with its own instance profile but the same security group

This is definitely not ideal but it was required because there was no way for me to override the hard coded bash script for user data. Had I been able to inject the user data, I would have been able to make this work with pulumi out of the box. It is not a particularly nice experience, but I think it would be relatively easy to supply a nice API on top of the existing APIs to compose the required things for a mixed os cluster (e.g. a `MixedOsCluster` class). The only "not-so-nice" requirement left would be having to create the CSR out of band and supply the key pair as config, since I don't think that pulumi can approve CSRs (correct me if I'm wrong please). On the other hand, that also removes the requirement for the client machine to have openssl.
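The role mappings above differ only in the extra `eks:kube-proxy-windows` group for Windows nodes, which a small helper can make explicit. This is a sketch rather than the code I actually ran: `nodeRoleMappings` and the local `RoleMapping` interface are hypothetical, the interface is simplified to plain strings where pulumi-eks accepts Pulumi inputs, and the ARNs in the usage lines are placeholders.

```typescript
// Simplified local stand-in for a role mapping entry; plain strings are used
// here so the sketch runs without any Pulumi packages installed.
interface RoleMapping {
    roleArn: string;
    username: string;
    groups: string[];
}

// Hypothetical helper: builds the aws-auth role mappings for a mixed OS cluster.
function nodeRoleMappings(linuxRoleArn: string, windowsRoleArn: string): RoleMapping[] {
    const username = "system:node:{{EC2PrivateDNSName}}";
    const baseGroups = ["system:bootstrappers", "system:nodes"];
    return [
        // Linux nodes only need the standard node groups.
        { roleArn: linuxRoleArn, username, groups: baseGroups.slice() },
        // Windows nodes additionally need the eks:kube-proxy-windows group.
        { roleArn: windowsRoleArn, username, groups: baseGroups.concat(["eks:kube-proxy-windows"]) },
    ];
}

// Usage with placeholder ARNs.
const mappings = nodeRoleMappings(
    "arn:aws:iam::123456789012:role/linux-nodes",
    "arn:aws:iam::123456789012:role/windows-nodes",
);
console.log(JSON.stringify(mappings, null, 2));
```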
Thanks for the validation and detailed walkthrough @gunniwho, this insight is very helpful!
We've opened up #428 to track overriding the nodegroup userdata script as a start to supporting Windows nodegroups.
@gunniwho Thanks for detailing these steps. I did much the same, but a little differently. I created a Linux cluster via pulumi and installed the prerequisites as per the AWS documentation (certificates, secrets, etc.). Then, in the next apply, I added a new Windows nodegroup with different user data based on a PowerShell script.
But my Windows nodes are not registering with the control plane. My code can be found here: https://github.com/bit-cloner/poke
Would it be possible to share your pulumi code?
Thank you