
Custom launch template is not used when creating a new Managed Node Group

Open aureq opened this issue 4 years ago • 20 comments

Hello!

  • Vote on this issue by adding a 👍 reaction
  • To contribute a fix for this issue, leave a comment (and link to your pull request, if you've opened one already)

Issue details

When creating a new Managed Node Group, I specified a custom (ec2) launch template via launchTemplate.

However, newly launched EC2 instances do not appear to be using this launch template, since the EC2 instance tag aws:ec2launchtemplate:id refers to the one created by this provider instead.

Steps to reproduce

  1. Use https://github.com/pulumi/pulumi-eks/tree/master/examples/managed-nodegroups as a starting point
  2. Create a new launch template as part of your code
import * as aws from "@pulumi/aws";

const launchTemplate = new aws.ec2.LaunchTemplate("my-launch-template", {
    tags: {testTag: "tag value"},
});
  3. Set the launch template for the managed node group like this:
	...
	launchTemplate: {
		id: launchTemplate.id,
		version: '$Latest'
	}
  4. Deploy your changes

Expected: The custom launch template is used to launch new EC2 instances.
Actual: The default launch template created by this provider is used.

aureq avatar Nov 10 '21 06:11 aureq

id is an Output, so you need to interpolate it:

id: pulumi.interpolate`${launchTemplate.id}`

More context:

https://www.pulumi.com/registry/packages/aws/api-docs/ec2/launchtemplate/#outputs
https://www.pulumi.com/docs/intro/concepts/inputs-outputs/#outputs-and-strings

It is not recommended to use '$Latest' for the launch template version, because the AWS API will return it as 1 and Pulumi will see it as drift every time, causing Pulumi to delete-and-replace it.
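
For example, a minimal sketch (reusing the launchTemplate name from the repro above) that pins the node group to the template's concrete latest version instead of '$Latest':

launchTemplate: {
    // Sketch only: a concrete version (here the template's latestVersion output)
    // avoids the '$Latest' drift described above.
    id: pulumi.interpolate`${launchTemplate.id}`,
    version: pulumi.interpolate`${launchTemplate.latestVersion}`,
},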

con5cience avatar Dec 01 '21 03:12 con5cience

const launchTemplate = new aws.ec2.LaunchTemplate("my-launch-template", {
    tags: {testTag: "tag value"},
});

This tags the launch template, but does not tag the instances created by the launch template. To tag the instances created by the launch template, you can do:

const launchTemplate = new aws.ec2.LaunchTemplate("my-launch-template", {
    tagSpecifications: [
        { resourceType: "instance", tags: { testTag: "tag value" } },
    ],
});

lukehoban avatar Dec 23 '21 19:12 lukehoban

Reopening this with a question from the community Slack:

Attaching a custom LaunchTemplate to an EKS ManagedNodeGroup doesn't seem to work? For example, following this: https://github.com/pulumi/pulumi-eks/tree/master/examples/managed-nodegroups. I create a new LaunchTemplate with some metadata options and a key pair, refer to it in the eks.createManagedNodeGroup() args:

launchTemplate: {
 id: pulumi.interpolate`${myLaunchTemplate.id}`,
 version: "1",
},

When the node group comes up, it says on the EKS page that it's using mine, but on the instances themselves in the ASG, it's using an auto-created one. Is this a bug? Or am I missing something fundamental?

nimbinatus avatar Mar 04 '22 22:03 nimbinatus

Same issue:

  const localCluster = new eks.Cluster(`localCluster`, {
    name: `localCluster`,
    version: "1.21",
    vpcId: vpc.id,
    publicSubnetIds: vpc.publicSubnetIds,
    privateSubnetIds: vpc.privateSubnetIds,
    nodeAssociatePublicIpAddress: false,
    endpointPrivateAccess: true,
    endpointPublicAccess: true,
    createOidcProvider: true,
    clusterSecurityGroup: apiSg,
    skipDefaultNodeGroup: true,
    providerCredentialOpts: {
      profileName: aws.config.profile,
    },
  },);

  const localEKSLaunchTemplate = new aws.ec2.LaunchTemplate(`localEKSLaunchTemplate`, {
    metadataOptions: {
      httpEndpoint: "enabled",
      httpTokens: "required",
      httpPutResponseHopLimit: 2,
    },
    keyName: keyName,
    defaultVersion: 1,
  })

  const localClusterMNG = new eks.ManagedNodeGroup(`localClusterMNG`, {
    version: "1.21",
    cluster: localCluster,
    nodeRole: localCluster.core.instanceRoles[0],
    subnetIds: vpc.privateSubnetIds,
    scalingConfig: {
      minSize: 1,
      desiredSize: 2,
      maxSize: 25,
    },
    launchTemplate: {
      id: localEKSLaunchTemplate.id,
      version: pulumi.interpolate`${localEKSLaunchTemplate.latestVersion}`,
    },
  }, {ignoreChanges: ["scalingConfig"]})

The launch template is created, and on the EKS dashboard it says it's being used for the node group; however, when looking at the actual EC2 instances / ASG that are part of the node group, they all show the default EKS launch template.

johnharris85 avatar Mar 24 '22 21:03 johnharris85

Stumbled upon this issue. I have been developing in Python and the following function works for me. Hopefully this helps people resolve their issues. The difference here is that I associate an EKS AMI and the SG created by the cluster.

def create_launch_template(stack, cluster, node_group, k8s_version):
    ami_id = fetch_latest_ami_id(k8s_version)

    launch_template_name = f"{stack}-{node_group.get('name')}-lt"
    eks_sg = cluster.core.cluster.vpc_config.cluster_security_group_id

    complete_user_data = (
        user_data.SCRIPT_FORMAT
        + node_group.get("bootstrap_commands")
        + user_data.SCRIPT_BOUNDARY_END
        + user_data.BASE_USER_DATA
    )

    launch_template_device_mapping_args = LaunchTemplateBlockDeviceMappingArgs(
        device_name="/dev/xvda",
        ebs=LaunchTemplateBlockDeviceMappingEbsArgs(
            volume_size=100,
        ),
    )

    tag_pairs = {
        "eks_cluster": cluster.eks_cluster.name,
        "launch_template_name": launch_template_name,
        "node_group": node_group.get("name"),
        "Stack": stack,
    }

    logger.info(f"iam#create_launch_template Creating Launch Template {launch_template_name}")
    launch_template = LaunchTemplate(
        launch_template_name,
        name=launch_template_name,
        block_device_mappings=[launch_template_device_mapping_args],
        user_data=format_user_data(cluster, complete_user_data),
        image_id=ami_id,
        vpc_security_group_ids=[eks_sg],
        tags=tag_pairs,
        tag_specifications=[
            LaunchTemplateTagSpecificationArgs(
                resource_type="instance",
                tags=tag_pairs,
            )
        ],
    )

    return launch_template

sushantkumar-amagi avatar Mar 25 '22 05:03 sushantkumar-amagi

Thanks @sushantkumar-amagi:

  • Is this a managed node group? (curious if that makes a difference)
  • Where are you actually specifying the node group should use the launch template? (I don't see that above)
  • The node group / EC2 instances that are created are definitely using this launch template? (mine shows that it's using it, but when looking at the instances it actually isn't)

I see that you're using tags, and @lukehoban talks about tags in his message above, but are tags a necessary piece of getting this to work? I can't think why that would be the case, and can't find anything in the AWS docs about that? (although certainly happy to be wrong :smile: )

johnharris85 avatar Mar 25 '22 23:03 johnharris85

Hi @johnharris85

  • This is a managed node group, created using the pulumi_eks package itself.
  • This is how I am creating the node groups:
def create_node_group(node_group, stack, cluster, ec2_role, k8s_version):
    launch_template = launch_templates.create_launch_template(
        stack, cluster, node_group, k8s_version
    )

    # If $Default or $Latest is used as a version, then every time the stack is updated
    # it shows a diff and deletes-replaces the nodegroup
    launch_template_args = NodeGroupLaunchTemplateArgs(
        version=launch_template.latest_version, id=launch_template.id
    )

    taints = []
    for taint in node_group.get("taints", []):
        taint = NodeGroupTaintArgs(
            effect=taint.get("effect"), key=taint.get("key"), value=taint.get("value")
        )
        taints.append(taint)

    nodegroup_scaling_args = NodeGroupScalingConfigArgs(
        desired_size=node_group.get("capacities").get("desired"),
        max_size=node_group.get("capacities").get("max"),
        min_size=node_group.get("capacities").get("min"),
    )
    nodegroup_azs = get_subnet_ids(node_group.get("az", []))

    tag_args = {"Name": node_group.get("name"), "Stack": stack}
    tag_args.update(node_group.get("tags"))

    logger.info(f"iam#create_node_group Creating EKS NodeGroup {node_group.get('name')}")
    eks_node_group = ManagedNodeGroup(
        node_group.get("name"),
        node_group_name=node_group.get("name"),
        subnet_ids=nodegroup_azs,
        cluster=cluster.core,
        capacity_type=node_group.get("capacities").get("type"),
        taints=taints,
        instance_types=node_group.get("instance_types"),
        node_role_arn=ec2_role.arn,
        scaling_config=nodegroup_scaling_args,
        launch_template=launch_template_args,
        tags=tag_args
    )

    return eks_node_group

The reason this function exists is that I am going to be looping over a number of node group configurations.

  • I was able to check whether the launch template was applied by logging into the instance and curling the user data, i.e. curl http://169.254.169.254/latest/user-data. I haven't handed it over to the Dev team yet, so it's not 100% confirmed, but the tags and user data script appear to be in line with what I am expecting.

Also, I don't think tags are absolutely essential for this to work; in fact, I had forgotten to tag them before coming across this issue.

sushantkumar-amagi avatar Mar 28 '22 07:03 sushantkumar-amagi

Thanks for the response @sushantkumar-amagi. OK so I've done some more testing with this, and it's actually pretty weird (or maybe I'm misunderstanding how EKS does Node Groups / custom launch templates?)

I create an EKS cluster with 2 MNGs. One specifies a launch template, the other doesn't. Both of the MNGs get created. In the EKS console I can see that MNG 1 is using my custom LT, MNG 2 has no LT. So far so good.

Now when I visit the Auto Scaling groups for the nodes of each MNG, the ASG for MNG 1 has a launch template that is created by Pulumi/EKS, not my custom one. However, the auto-created / attached LT does have the configuration from my custom LT (SSH key, other settings, etc.). Maybe it's copied over during creation? So the whole process is obviously aware of my LT. This is fine for a one-shot deployment, but if I ever want to go and update the LT in Pulumi and apply it, it will have no effect, as the ASGs are using the auto-created LT with the configuration from my original run of the custom LT creation.

I wonder if others are actually hitting this issue and just not noticing, because the config works as expected (copied over) and they never update their original LT, so they don't notice that changes aren't being propagated?

johnharris85 avatar Apr 01 '22 15:04 johnharris85

Hello,

EKS copies the launch template you give it (it seems to add some default settings): https://docs.aws.amazon.com/eks/latest/userguide/launch-templates.html

Managed node groups are always deployed with a launch template to be used with the Amazon EC2 Auto Scaling group. The Amazon EKS API creates this launch template either by copying one you provide or by creating one automatically with default values in your account.

yann-soubeyrand avatar Jul 13 '22 15:07 yann-soubeyrand

Thanks @yann-soubeyrand, the behavior I'm seeing makes sense then, although I'm wondering how Pulumi handles it when we update the template: do the changes and version numbers also get copied?

johnharris85 avatar Jul 18 '22 20:07 johnharris85

@johnharris85 when you specify a launch template for your managed node group, you indicate its version. When you update the version, EKS automatically updates its copy and does a rolling replace of the nodes.
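
In Pulumi terms, a rough TypeScript sketch of that wiring (cluster, nodeRole, and subnetIds are placeholders for resources defined elsewhere, as in the fuller examples in this thread):

const nodeLt = new aws.ec2.LaunchTemplate("node-lt", {
    metadataOptions: {
        httpEndpoint: "enabled",
        httpTokens: "required",
        httpPutResponseHopLimit: 2,
    },
    // Keep the template's default version pointing at the newest version.
    updateDefaultVersion: true,
});

new eks.ManagedNodeGroup("node-group", {
    cluster: cluster,       // an existing eks.Cluster
    nodeRole: nodeRole,     // an existing aws.iam.Role for the nodes
    subnetIds: subnetIds,   // subnets for the node group
    launchTemplate: {
        id: nodeLt.id,
        // Editing nodeLt creates a new template version; latestVersion changes,
        // Pulumi updates the node group's launch template version, and EKS copies
        // the new version and performs the rolling replacement of the nodes.
        version: pulumi.interpolate`${nodeLt.latestVersion}`,
    },
});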

yann-soubeyrand avatar Jul 22 '22 16:07 yann-soubeyrand

Pretty sure when I tested this Pulumi was not picking up updates, but I will re-test. Thanks!

johnharris85 avatar Jul 25 '22 17:07 johnharris85

I was able to get a ManagedNodeGroup working with a custom LaunchTemplate in Python. Below is what's working for me.

It takes AWS about 15 minutes to update the node group (of 2 nodes) when I change the user data. New nodes start and join the group/cluster within about 3 minutes, but it takes longer for the pods to get rescheduled and the old nodes to terminate.

$ pulumi about
CLI
Version      3.46.1
Go Version   go1.19.2
Go Compiler  gc

Plugins
NAME        VERSION
aws         5.7.2
eks         0.42.7
honeycomb   0.0.11
kubernetes  3.23.1
python      3.10.8
import base64
import json
from typing import Tuple

import pulumi
import pulumi_aws as aws
import pulumi_eks as eks
import pulumi_kubernetes as k8s

# Constants such as EKS_CLUSTER_NAME, TEAM_MEMBERS, _CLUSTER_VPC, and _CLUSTER_SUBNETS,
# and the _define_node_role() helper, are defined elsewhere in the project.

_aws_account_id = aws.get_caller_identity().account_id

_K8S_VERSION = "1.23"  # latest visible in pulumi-eks

_NODE_ROOT_VOLUME_SIZE_GIB = 60
# Script to run on EKS nodes as root before EKS bootstrapping (which starts the kubelet)
# default bootstrap: https://github.com/awslabs/amazon-eks-ami/blob/master/files/bootstrap.sh
# This user data must be in mime format when passed to a launch template.
# https://docs.aws.amazon.com/eks/latest/userguide/launch-templates.html
#
# From MNG launch template docs:
# "your user data is merged with Amazon EKS user data required for nodes to join the
# cluster. Don't specify any commands in your user data that starts or modifies kubelet."
# Inspecting instance user data shows this and the original user data in separate MIME
# parts, both in the user data with this 1st.
_NODE_USER_DATA = r"""#!/bin/bash
set -e

eho "Doing my custom setup, kubelet will start next."
"""


_USER_DATA_MIME_HEADER = """MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="//"

--//
Content-Type: text/x-shellscript; charset="us-ascii"
"""


_USER_DATA_MIME_FOOTER = """

--//--
"""


def _wrap_and_encode_user_data(script_text: str) -> str:
    mime_encapsulated = _USER_DATA_MIME_HEADER + script_text + _USER_DATA_MIME_FOOTER
    encoded_bytes = base64.b64encode(mime_encapsulated.encode())
    return encoded_bytes.decode("latin1")


def _define_cluster_and_get_provider() -> Tuple[eks.Cluster, k8s.Provider]:
    # https://www.pulumi.com/docs/guides/crosswalk/aws/eks/
    # https://www.pulumi.com/registry/packages/eks/api-docs/cluster/#cluster

    # Map AWS IAM users to Kubernetes internal RBAC admin group. Mapping individual
    # users avoids having to go from a group to a role with assume-role policies.
    # Kubernetes has its own permissions (RBAC) system, with predefined groups for
    # common permissions levels. AWS EKS provides translation from IAM to that, but we
    # must explicitly map particular users or roles that should be granted permissions
    # within the cluster.
    #
    # AWS docs: https://docs.aws.amazon.com/eks/latest/userguide/add-user-role.html
    # Detailed example: https://apperati.io/articles/managing_eks_access-bs/
    # IAM groups are not supported, only users or roles:
    #     https://github.com/kubernetes-sigs/aws-iam-authenticator/issues/176
    user_mappings = []
    for username in TEAM_MEMBERS:
        user_mappings.append(
            eks.UserMappingArgs(
                # AWS IAM user to set permissions for
                user_arn=f"arn:aws:iam::{_aws_account_id}:user/{username}",
                # k8s RBAC group from which this IAM user will get permissions
                groups=["system:masters"],
                # k8s RBAC username to create for the user
                username=username,
            )
        )

    node_role = _define_node_role(EKS_CLUSTER_NAME)

    cluster = eks.Cluster(
        EKS_CLUSTER_NAME,
        name=EKS_CLUSTER_NAME,
        version=_K8S_VERSION,
        vpc_id=_CLUSTER_VPC,
        subnet_ids=_CLUSTER_SUBNETS,
        # OpenID Connect Provider maps from k8s to AWS IDs.
        # Get the OIDC's ID with:
        # aws eks describe-cluster --name <CLUSTER_NAME> --query "cluster.identity.oidc.issuer" --output text
        create_oidc_provider=True,
        user_mappings=user_mappings,
        skip_default_node_group=True,
        instance_role=node_role,
    )
    # Export the kubeconfig to allow kubectl to access the cluster. For example:
    #    pulumi stack output my-kubeconfig > kubeconfig.yml
    #    KUBECONFIG=./kubeconfig.yml kubectl get pods -A
    pulumi.export(f"my-kubeconfig", cluster.kubeconfig)

    # Work around cluster.provider being the wrong type for Namespace to use.
    # https://github.com/pulumi/pulumi-eks/issues/662
    provider = k8s.Provider(
        f"my-cluster-provider",
        kubeconfig=cluster.kubeconfig.apply(lambda k: json.dumps(k)),
    )

    launch_template = aws.ec2.LaunchTemplate(
        f"{EKS_CLUSTER_NAME}-launch-template",
        block_device_mappings=[
            aws.ec2.LaunchTemplateBlockDeviceMappingArgs(
                device_name="/dev/xvda",
                ebs=aws.ec2.LaunchTemplateBlockDeviceMappingEbsArgs(
                    volume_size=_NODE_ROOT_VOLUME_SIZE_GIB,
                ),
            ),
        ],
        user_data=_wrap_and_encode_user_data(
            _NODE_USER_DATA
        ),
        # The default version shows up first in the UI, so update it even though
        # we don't really need to since we use latest_version below.
        update_default_version=True,
        # Other settings, such as tags required for the node to join the group/cluster,
        # are filled in by default.
    )

    # The EC2 instances that the cluster will use to execute pods.
    # https://www.pulumi.com/registry/packages/eks/api-docs/managednodegroup/
    eks.ManagedNodeGroup(
        f"{EKS_CLUSTER_NAME}-managed-node-group",
        node_group_name=f"{EKS_CLUSTER_NAME}-managed-node-group",
        cluster=cluster.core,
        version=_K8S_VERSION,
        subnet_ids=_CLUSTER_SUBNETS,
        node_role=node_role,
        instance_types=["r6i.2xlarge"],
        scaling_config=aws.eks.NodeGroupScalingConfigArgs(
            min_size=1,
            desired_size=2,
            max_size=4,
        ),
        launch_template={
            "id": launch_template.id,
            "version": launch_template.latest_version,
        },
    )

    return cluster, provider

markfickett avatar Jan 13 '23 15:01 markfickett

It'd be helpful if the docs were updated as well to define NodeGroupLaunchTemplateArgs: https://www.pulumi.com/registry/packages/eks/api-docs/managednodegroup/#nodegrouplaunchtemplate
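
For reference, my understanding of the shape it accepts (it mirrors the launchTemplate input of the underlying aws.eks.NodeGroup resource, so treat this as an unofficial sketch rather than generated docs):

// Specify exactly one of id or name; version is the template version to use,
// e.g. "1" or pulumi.interpolate`${launchTemplate.latestVersion}`.
interface NodeGroupLaunchTemplate {
    id?: pulumi.Input<string>;      // launch template ID
    name?: pulumi.Input<string>;    // launch template name
    version: pulumi.Input<string>;  // launch template version
}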

sudosoul avatar Jun 23 '23 20:06 sudosoul