pulumi-eks
Pulumi force initiates RollingUpdate while updating EKS nodeGroup ASG's launch configurations
Problem description
Whenever Pulumi updates an ASG's launch configuration as part of a nodes CFN stack update, the CFN template Pulumi generates includes a rolling-update configuration, without any explicit configuration from our end. See “AutoScalingRollingUpdate” in the CFN update policy documentation: AutoScalingRollingUpdate. Below is the CFN template Pulumi generates as part of such an update. As you can see, the template contains an “AutoScalingRollingUpdate” section that drives CFN to terminate nodes running the older AMI. We would like this to be configurable, since we don't want the old nodes to be terminated before the pods are drained from them.
Here is the hard-coded rolling update in the CFN template on Pulumi's end: CFN template
Errors & Logs
AWSTemplateFormatVersion: '2010-09-09'
Outputs:
  NodeGroup:
    Value: !Ref NodeGroup
Resources:
  NodeGroup:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      DesiredCapacity: 5
      LaunchConfigurationName: online-cd-main-default-worker-node-group-0-nodeLaunchConfiguration-5e7f9ff
      MinSize: 1
      MaxSize: 25
      VPCZoneIdentifier: ["subnet-02f0bf13256db0423"]
      Tags:
        - Key: Name
          Value: online-cd-main-eksCluster-68f6e20-worker
          PropagateAtLaunch: 'true'
        - Key: kubernetes.io/cluster/online-cd-main-eksCluster-68f6e20
          Value: owned
          PropagateAtLaunch: 'true'
        - Key: k8s.io/cluster-autoscaler/enabled
          Value: true
          PropagateAtLaunch: 'true'
        - Key: k8s.io/cluster-autoscaler/online-cd-main-eksCluster-68f6e20
          Value: true
          PropagateAtLaunch: 'true'
    UpdatePolicy:
      AutoScalingRollingUpdate:
        MinInstancesInService: '1'
        MaxBatchSize: '1'
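For context, a node group configured roughly as below produces a template like the one above. This is a minimal sketch only: the capacity values mirror the generated template, while the program name, subnets, and everything else are illustrative or elided.

import * as eks from "@pulumi/eks";

// Sketch only: the default node group of this cluster yields an ASG
// like the one in the template above (capacities mirror that template).
const cluster = new eks.Cluster("online-cd-main", {
    desiredCapacity: 5,
    minSize: 1,
    maxSize: 25,
});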
Affected product version(s)
We use v1.14.1, but I don't think this is isolated to that version.
Reproducing the issue
You can reproduce it by updating the AMI ID of any ASG's launch configuration, for example as in the sketch below.
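A sketch of one way to do this (the AMI IDs are placeholders): pin nodeAmiId on the cluster, deploy, then change the value and run pulumi up again. The launch configuration is replaced, and the resulting nodes CFN stack update initiates the AutoScalingRollingUpdate.

import * as eks from "@pulumi/eks";

// Sketch: both AMI IDs below are placeholders for EKS-optimized worker AMIs.
const cluster = new eks.Cluster("repro", {
    nodeAmiId: "ami-0000000000000000a",
    // Changing this to e.g. "ami-0000000000000000b" and re-running
    // `pulumi up` replaces the launch configuration, and the CFN stack
    // update then kicks off the rolling update described above.
});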
Suggestions for a fix
Add a flag to ClusterNodeGroupOptions to dictate whether updating the launch configurations of ASGs requires a rolling update.
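Something like the following, for instance. This is purely a hypothetical API sketch; disableRollingUpdate does not exist in @pulumi/eks today:

import * as eks from "@pulumi/eks";

const cluster = new eks.Cluster("main");

// Hypothetical option -- `disableRollingUpdate` is a proposal, not a real flag:
const nodeGroup = cluster.createNodeGroup("default-worker-node-group-0", {
    disableRollingUpdate: true, // would omit UpdatePolicy from the generated CFN template
});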
@abaskar-tableau, can you elaborate on how you would like to configure this? Do you want a boolean flag to omit the UpdatePolicy section from the template? Or do you want an argument to pass your own fully-formed UpdatePolicy section?
@clstokes We would like a boolean flag to omit the UpdatePolicy. That serves our purpose of not having the nodes recycled as part of the LaunchConfiguration upgrade. Thanks.
We'd really appreciate it if you could share an ETA for this, as that will help us set expectations for our Node Patching objectives. Thanks.
@clstokes Hi Cameron, we would really appreciate an ETA on this, to set expectations for a feature delivery on our end. Thanks.
@abaskar-tableau I will try to get an answer for you here.
Here's a workaround using transformations to modify the CloudFormation template from @pulumi/eks, to remove the UpdatePolicy section.
Code
npm i js-yaml
npm i js-yaml-cloudformation-schema
import * as pulumi from "@pulumi/pulumi";
import * as eks from "@pulumi/eks";
import * as yaml from "js-yaml";
import { CLOUDFORMATION_SCHEMA } from "js-yaml-cloudformation-schema";

const cluster = new eks.Cluster("main", {
    // ...
}, {
    transformations: [
        /**
         * This transformation will parse the CloudFormation Stack's `templateBody`,
         * remove the `UpdatePolicy` section, and return the modified template.
         */
        (args) => {
            if (args.type === "aws:cloudformation/stack:Stack" && args.name.endsWith("-nodes")) {
                pulumi.log.info(`Transformation: Removing 'UpdatePolicy' from [${args.type}] [${args.name}]`);
                args.props["templateBody"] = args.props["templateBody"].apply(it => {
                    // read in the CFN template yaml
                    const templateBody = yaml.safeLoad(it, { schema: CLOUDFORMATION_SCHEMA });
                    // remove the `UpdatePolicy`
                    delete templateBody["Resources"]["NodeGroup"]["UpdatePolicy"];
                    // return the modified CFN template yaml
                    return yaml.dump(templateBody, { schema: CLOUDFORMATION_SCHEMA });
                });
                return {
                    props: args.props,
                    opts: args.opts,
                };
            }
            // no modifications
            return undefined;
        },
    ],
});
Template after transformation
ℹ️ Note: Some of these differences are due to using js-yaml to load, modify, then dump the YAML.
AWSTemplateFormatVersion: '2010-09-09'
Outputs:
  NodeGroup:
    Value: !<!Ref> NodeGroup
Resources:
  NodeGroup:
    Type: 'AWS::AutoScaling::AutoScalingGroup'
    Properties:
      DesiredCapacity: 1
      LaunchConfigurationName: demo-k8s-ts-cluster-nodeLaunchConfiguration-f97610b
      MinSize: 1
      MaxSize: 1
      VPCZoneIdentifier:
        - subnet-05bde34df5b96a423
        - subnet-039d1922f39ef65bc
      Tags:
        - Key: Name
          Value: demo-k8s-ts-cluster-eksCluster-d200f9f-worker
          PropagateAtLaunch: 'true'
        - Key: kubernetes.io/cluster/demo-k8s-ts-cluster-eksCluster-d200f9f
          Value: owned
          PropagateAtLaunch: 'true'
Template without transformation
AWSTemplateFormatVersion: '2010-09-09'
Outputs:
  NodeGroup:
    Value: !Ref NodeGroup
Resources:
  NodeGroup:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      DesiredCapacity: 1
      LaunchConfigurationName: demo-k8s-ts-cluster-nodeLaunchConfiguration-f97610b
      MinSize: 1
      MaxSize: 1
      VPCZoneIdentifier: ["subnet-05bde34df5b96a423","subnet-039d1922f39ef65bc"]
      Tags:
        - Key: Name
          Value: demo-k8s-ts-cluster-eksCluster-d200f9f-worker
          PropagateAtLaunch: 'true'
        - Key: kubernetes.io/cluster/demo-k8s-ts-cluster-eksCluster-d200f9f
          Value: owned
          PropagateAtLaunch: 'true'
    UpdatePolicy:
      AutoScalingRollingUpdate:
        MinInstancesInService: '1'
        MaxBatchSize: '1'
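As a variant (a sketch, assuming the transformation body above is extracted into a named function, here called removeUpdatePolicy), you can register it stack-wide so it applies across the whole program instead of one cluster:

import * as pulumi from "@pulumi/pulumi";

// Assumes the transformation shown above has been extracted into a
// named function; the type/name guard inside it still limits its effect
// to the node group CFN stacks.
const removeUpdatePolicy: pulumi.ResourceTransformation = (args) => {
    // ...same body as the transformation above...
    return undefined;
};

// Register it for every resource in the stack.
pulumi.runtime.registerStackTransformation(removeUpdatePolicy);

Note that with UpdatePolicy removed, CloudFormation updates the launch configuration in place: existing instances keep running on the old configuration, and you drain and replace nodes on your own schedule.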
Thanks, I'll try this workaround and let you know.
Going to link this to #750 as part of that discussion / design