pulumi-eks

Pulumi forces a RollingUpdate when updating the launch configurations of EKS NodeGroup ASGs

Open abaskar-tableau opened this issue 4 years ago • 8 comments

Problem description

Whenever Pulumi updates an ASG's launch configuration, the resulting CloudFormation (CFN) node stack update uses a Pulumi-generated template that includes a rolling-update policy, even though we never configured one. (See “AutoScalingRollingUpdate” in the CFN documentation for the UpdatePolicy attribute.) The template Pulumi generates for these updates is shown below. Its “AutoScalingRollingUpdate” section makes CFN terminate the nodes running the older AMI. We would like this to be configurable, because we don't want the old nodes to be terminated before their pods have been drained.

Here is where the rolling-update policy is hard-coded in Pulumi's CFN template: CFN template

Errors & Logs

AWSTemplateFormatVersion: '2010-09-09'
Outputs:
  NodeGroup:
    Value: !Ref NodeGroup
Resources:
  NodeGroup:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      DesiredCapacity: 5
      LaunchConfigurationName: online-cd-main-default-worker-node-group-0-nodeLaunchConfiguration-5e7f9ff
      MinSize: 1
      MaxSize: 25
      VPCZoneIdentifier: ["subnet-02f0bf13256db0423"]
      Tags:
      - Key: Name
        Value: online-cd-main-eksCluster-68f6e20-worker
        PropagateAtLaunch: 'true'
      - Key: kubernetes.io/cluster/online-cd-main-eksCluster-68f6e20
        Value: owned
        PropagateAtLaunch: 'true'
      - Key: k8s.io/cluster-autoscaler/enabled
        Value: true
        PropagateAtLaunch: 'true'
      - Key: k8s.io/cluster-autoscaler/online-cd-main-eksCluster-68f6e20
        Value: true
        PropagateAtLaunch: 'true'
    UpdatePolicy:
      AutoScalingRollingUpdate:
        MinInstancesInService: '1'
        MaxBatchSize: '1'

Affected product version(s)

We use v1.14.1, but I don't think the issue is specific to this version.

Reproducing the issue

Update the AMI ID used by any ASG's launch configuration and run an update.

Suggestions for a fix

Add a flag to ClusterNodeGroupOptions that controls whether updating an ASG's launch configuration triggers a rolling update.

abaskar-tableau, Feb 08 '21 20:02

@abaskar-tableau, can you elaborate on how you would like to configure this? Do you want a boolean flag to omit the UpdatePolicy section from the template? Or do you want an argument to pass your own fully-formed UpdatePolicy section?

clstokes, Feb 08 '21 20:02

@clstokes We would like a boolean flag that omits the UpdatePolicy section. That serves our purpose of not having the nodes recycled as part of the LaunchConfiguration upgrade. Thanks.

abaskar-tableau, Feb 08 '21 20:02
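To make the two options discussed above concrete (a boolean that omits UpdatePolicy, or a caller-supplied policy), here is a minimal sketch of how a template builder could honor such an option. Everything here is hypothetical: `rollingUpdate`, `NodeGroupTemplateArgs`, and `buildNodeGroupResource` are illustrative names, not part of @pulumi/eks, and the real node group emits YAML text rather than an object. The default values are the ones visible in the generated template in this issue.

```typescript
// Mirrors the AutoScalingRollingUpdate block in the generated CFN template.
interface RollingUpdatePolicy {
    MinInstancesInService: string;
    MaxBatchSize: string;
}

// The values that appear in the generated template shown in this issue.
const DEFAULT_ROLLING_UPDATE: RollingUpdatePolicy = {
    MinInstancesInService: "1",
    MaxBatchSize: "1",
};

interface NodeGroupTemplateArgs {
    launchConfigurationName: string;
    desiredCapacity: number;
    minSize: number;
    maxSize: number;
    subnetIds: string[];
    // HYPOTHETICAL option (not part of @pulumi/eks):
    //   undefined -> keep today's default rolling-update policy
    //   false     -> omit the UpdatePolicy section entirely
    //   object    -> use the caller's own policy
    rollingUpdate?: RollingUpdatePolicy | false;
}

// Builds the NodeGroup ASG resource as a plain object (illustrative only).
function buildNodeGroupResource(args: NodeGroupTemplateArgs): Record<string, unknown> {
    const resource: Record<string, unknown> = {
        Type: "AWS::AutoScaling::AutoScalingGroup",
        Properties: {
            DesiredCapacity: args.desiredCapacity,
            LaunchConfigurationName: args.launchConfigurationName,
            MinSize: args.minSize,
            MaxSize: args.maxSize,
            VPCZoneIdentifier: args.subnetIds,
        },
    };
    if (args.rollingUpdate !== false) {
        // Narrowed to RollingUpdatePolicy | undefined here; fall back to the default.
        resource["UpdatePolicy"] = {
            AutoScalingRollingUpdate: args.rollingUpdate ?? DEFAULT_ROLLING_UPDATE,
        };
    }
    return resource;
}
```

With `rollingUpdate: false`, the resource has no UpdatePolicy key at all, which is the behavior requested above; leaving the option unset preserves the current output.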

We'd really appreciate it if you could share an ETA for this, as that will help us set expectations for our Node Patching objectives. Thanks.

abaskar-tableau, Feb 08 '21 21:02

@clstokes Hi Cameron, we would really appreciate an ETA on this, to set expectations for a feature delivery on our end. Thanks.

abaskar-tableau, Feb 09 '21 22:02

@abaskar-tableau I will try to get an answer for you here.

clstokes, Feb 09 '21 23:02

Here's a workaround that uses Pulumi transformations to modify the CloudFormation template generated by @pulumi/eks and remove the UpdatePolicy section.

Code

npm i js-yaml@3
npm i js-yaml-cloudformation-schema
import * as pulumi from "@pulumi/pulumi";
import * as eks from "@pulumi/eks";

// js-yaml v3 is required: `safeLoad` was removed in js-yaml v4.
import * as yaml from "js-yaml";
import { CLOUDFORMATION_SCHEMA } from "js-yaml-cloudformation-schema";

const cluster = new eks.Cluster("main", {
    // ...
}, {
    transformations: [
        /**
         * This transformation parses the node group CloudFormation Stack's `templateBody`,
         * removes the `UpdatePolicy` section, and returns the modified template.
         */
        (args: pulumi.ResourceTransformationArgs): pulumi.ResourceTransformationResult | undefined => {
            if (args.type === "aws:cloudformation/stack:Stack" && args.name.endsWith("-nodes")) {
                pulumi.log.info(`Transformation: Removing 'UpdatePolicy' from [${args.type}] [${args.name}]`);
                // `templateBody` may be a plain string or an Output<string>; pulumi.output() handles both.
                args.props["templateBody"] = pulumi.output(args.props["templateBody"]).apply((it: string) => {
                    // read in the CFN template yaml (the CloudFormation schema understands tags like !Ref)
                    const templateBody: any = yaml.safeLoad(it, { schema: CLOUDFORMATION_SCHEMA });
                    // remove the `UpdatePolicy` from the NodeGroup ASG resource
                    delete templateBody["Resources"]["NodeGroup"]["UpdatePolicy"];
                    // return the modified CFN template yaml
                    return yaml.dump(templateBody, { schema: CLOUDFORMATION_SCHEMA });
                });

                return {
                    props: args.props,
                    opts: args.opts,
                };
            }
            // no modifications
            return undefined;
        },
    ],
});
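The core of the workaround is a plain object edit on the parsed template, which can be tested without Pulumi or js-yaml. The sketch below assumes the template has already been parsed into an object (for example with `yaml.safeLoad` as in the transformation). Unlike the original, which deletes only `Resources.NodeGroup.UpdatePolicy`, it removes UpdatePolicy from every `AWS::AutoScaling::AutoScalingGroup` resource and leaves the input unmodified. The names `CfnTemplate`, `CfnResource`, and `stripAsgUpdatePolicies` are illustrative, not from any library.

```typescript
// Only the parts of a CloudFormation template this sketch touches are typed.
interface CfnResource {
    Type: string;
    Properties?: Record<string, unknown>;
    UpdatePolicy?: Record<string, unknown>;
    [key: string]: unknown;
}

interface CfnTemplate {
    Resources?: Record<string, CfnResource>;
    [key: string]: unknown;
}

// Returns a copy of `template` with UpdatePolicy removed from every
// AWS::AutoScaling::AutoScalingGroup resource. Copies are shallow: nested
// objects such as Properties are shared with the input, but are not modified.
function stripAsgUpdatePolicies(template: CfnTemplate): CfnTemplate {
    if (!template.Resources) {
        return { ...template };
    }
    const resources: Record<string, CfnResource> = {};
    for (const [logicalId, resource] of Object.entries(template.Resources)) {
        if (resource.Type === "AWS::AutoScaling::AutoScalingGroup") {
            const copy: CfnResource = { ...resource };
            delete copy.UpdatePolicy;
            resources[logicalId] = copy;
        } else {
            // Non-ASG resources are passed through unchanged.
            resources[logicalId] = resource;
        }
    }
    return { ...template, Resources: resources };
}
```

Inside the transformation, `stripAsgUpdatePolicies(templateBody)` could replace the `delete` line, with its result passed to `yaml.dump`. Not mutating the input makes the function easy to unit-test against a fixture such as the "Template without transformation" below.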

Template after transformation

ℹ️ Note: Some of these differences are due to using js-yaml to load, modify, and then dump the YAML.

AWSTemplateFormatVersion: '2010-09-09'
Outputs:
  NodeGroup:
    Value: !<!Ref> NodeGroup
Resources:
  NodeGroup:
    Type: 'AWS::AutoScaling::AutoScalingGroup'
    Properties:
      DesiredCapacity: 1
      LaunchConfigurationName: demo-k8s-ts-cluster-nodeLaunchConfiguration-f97610b
      MinSize: 1
      MaxSize: 1
      VPCZoneIdentifier:
        - subnet-05bde34df5b96a423
        - subnet-039d1922f39ef65bc
      Tags:
        - Key: Name
          Value: demo-k8s-ts-cluster-eksCluster-d200f9f-worker
          PropagateAtLaunch: 'true'
        - Key: kubernetes.io/cluster/demo-k8s-ts-cluster-eksCluster-d200f9f
          Value: owned
          PropagateAtLaunch: 'true'

Template without transformation

AWSTemplateFormatVersion: '2010-09-09'
Outputs:
  NodeGroup:
    Value: !Ref NodeGroup
Resources:
  NodeGroup:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      DesiredCapacity: 1
      LaunchConfigurationName: demo-k8s-ts-cluster-nodeLaunchConfiguration-f97610b
      MinSize: 1
      MaxSize: 1
      VPCZoneIdentifier: ["subnet-05bde34df5b96a423","subnet-039d1922f39ef65bc"]
      Tags:
      - Key: Name
        Value: demo-k8s-ts-cluster-eksCluster-d200f9f-worker
        PropagateAtLaunch: 'true'
      - Key: kubernetes.io/cluster/demo-k8s-ts-cluster-eksCluster-d200f9f
        Value: owned
        PropagateAtLaunch: 'true'
    UpdatePolicy:
      AutoScalingRollingUpdate:
        MinInstancesInService: '1'
        MaxBatchSize: '1'

clstokes, Feb 10 '21 06:02

Thanks, I'll try this workaround and let you know.

abaskar-tableau, Feb 10 '21 17:02

Going to link this to #750 as part of that discussion / design

stack72, Jul 29 '22 14:07