pulumi-eks
Custom launch template is not used when creating a new Managed Node Group
Hello!
- Vote on this issue by adding a 👍 reaction
- To contribute a fix for this issue, leave a comment (and link to your pull request, if you've opened one already)
Issue details
When creating a new Managed Node Group, I specified a custom (ec2) launch template via launchTemplate.
However, newly launched EC2 instances do not appear to use this launch template: the EC2 instance tag aws:ec2launchtemplate:id refers to the one created by this provider instead.
Steps to reproduce
- Use https://github.com/pulumi/pulumi-eks/tree/master/examples/managed-nodegroups as a starting point
- Create a new launch template as part of your code
const launchTemplate = new aws.ec2.LaunchTemplate("my-launch-template", {
    tags: {testTag: "tag value"},
});
- Set the launch template for the managed node group like this
...
launchTemplate: {
    id: launchTemplate.id,
    version: '$Latest'
}
- Deploy your changes
Expected: The custom launch template is used to launch new EC2 instances.
Actual: The default launch template created by this provider is used.
id is an Output, so you need to interpolate it:
id: pulumi.interpolate`${launchTemplate.id}`
More context:
- https://www.pulumi.com/registry/packages/aws/api-docs/ec2/launchtemplate/#outputs
- https://www.pulumi.com/docs/intro/concepts/inputs-outputs/#outputs-and-strings
It is not recommended to use '$Latest' for the launch template version, because the AWS API resolves it to a concrete version number (e.g. 1), so Pulumi sees this as drift on every update and delete-replaces the node group.
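For example, a minimal sketch of pinning the version instead (reusing the launchTemplate constant from the snippet above, and mirroring the interpolation used later in this thread):
// Pin the node group to the template's numeric latest version rather than "$Latest",
// so Pulumi only sees a diff when the template actually changes.
launchTemplate: {
    id: launchTemplate.id,
    version: pulumi.interpolate`${launchTemplate.latestVersion}`,
},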
const launchTemplate = new aws.ec2.LaunchTemplate("my-launch-template", {
    tags: {testTag: "tag value"},
});
This tags the launch template, but does not tag the instances created by the launch template. To tag the instances created by the launch template, you can do:
const launchTemplate = new aws.ec2.LaunchTemplate("my-launch-template", {
    tagSpecifications: [
        { resourceType: "instance", tags: { testTag: "tag value" } },
    ],
});
Reopening this with a question from the community Slack:
Attaching a custom LaunchTemplate to an EKS ManagedNodeGroup doesn't seem to work? For example, following this: https://github.com/pulumi/pulumi-eks/tree/master/examples/managed-nodegroups. I create a new LaunchTemplate with some metadata options and a key pair, refer to it in the eks.createManagedNodeGroup() args:
launchTemplate: { id: pulumi.interpolate`${myLaunchTemplate.id}`, version: "1" },
When the node group comes up, it says on the EKS page that it's using mine, but on the instances themselves in the ASG, it's using an auto-created one. Is this a bug? Or am I missing something fundamental?
Same issue:
const localCluster = new eks.Cluster(`localCluster`, {
    name: `localCluster`,
    version: "1.21",
    vpcId: vpc.id,
    publicSubnetIds: vpc.publicSubnetIds,
    privateSubnetIds: vpc.privateSubnetIds,
    nodeAssociatePublicIpAddress: false,
    endpointPrivateAccess: true,
    endpointPublicAccess: true,
    createOidcProvider: true,
    clusterSecurityGroup: apiSg,
    skipDefaultNodeGroup: true,
    providerCredentialOpts: {
        profileName: aws.config.profile,
    },
});

const localEKSLaunchTemplate = new aws.ec2.LaunchTemplate(`localEKSLaunchTemplate`, {
    metadataOptions: {
        httpEndpoint: "enabled",
        httpTokens: "required",
        httpPutResponseHopLimit: 2,
    },
    keyName: keyName,
    defaultVersion: 1,
});

const localClusterMNG = new eks.ManagedNodeGroup(`localClusterMNG`, {
    version: "1.21",
    cluster: localCluster,
    nodeRole: localCluster.core.instanceRoles[0],
    subnetIds: vpc.privateSubnetIds,
    scalingConfig: {
        minSize: 1,
        desiredSize: 2,
        maxSize: 25,
    },
    launchTemplate: {
        id: localEKSLaunchTemplate.id,
        version: pulumi.interpolate`${localEKSLaunchTemplate.latestVersion}`,
    },
}, {ignoreChanges: ["scalingConfig"]});
The launch template is created, and on the EKS dashboard it says it's being used for the node group, however when looking at the actual EC2 instances / ASG that are part of the node group, they all show the default EKS launch template.
Stumbled upon this issue. I have been developing in Python and the following function works for me. Hopefully this helps people resolve their issues. The difference here is that I associate an EKS AMI and the security group created by the cluster.
def create_launch_template(stack, cluster, node_group, k8s_version):
    ami_id = fetch_latest_ami_id(k8s_version)
    launch_template_name = f"{stack}-{node_group.get('name')}-lt"
    eks_sg = cluster.core.cluster.vpc_config.cluster_security_group_id
    complete_user_data = (
        user_data.SCRIPT_FORMAT
        + node_group.get("bootstrap_commands")
        + user_data.SCRIPT_BOUNDARY_END
        + user_data.BASE_USER_DATA
    )
    launch_template_device_mapping_args = LaunchTemplateBlockDeviceMappingArgs(
        device_name="/dev/xvda",
        ebs=LaunchTemplateBlockDeviceMappingEbsArgs(
            volume_size=100,
        ),
    )
    tag_pairs = {
        "eks_cluster": cluster.eks_cluster.name,
        "launch_template_name": launch_template_name,
        "node_group": node_group.get("name"),
        "Stack": stack,
    }
    logger.info(f"iam#create_launch_template Creating Launch Template {launch_template_name}")
    launch_template = LaunchTemplate(
        launch_template_name,
        name=launch_template_name,
        block_device_mappings=[launch_template_device_mapping_args],
        user_data=format_user_data(cluster, complete_user_data),
        image_id=ami_id,
        vpc_security_group_ids=[eks_sg],
        tags=tag_pairs,
        tag_specifications=[
            LaunchTemplateTagSpecificationArgs(
                resource_type="instance",
                tags=tag_pairs,
            )
        ],
    )
    return launch_template
Thanks @sushantkumar-amagi:
- Is this a managed node group? (curious if that makes a difference)
- Where are you actually specifying the node group should use the launch template? (I don't see that above)
- The node group / EC2 instances that are created are definitely using this launch template? (mine shows that it's using it, but when looking at the instances it actually isn't)
I see that you're using tags, and @lukehoban talks about tags in his message above, but are tags a necessary piece of getting this to work? I can't think why that would be the case, and can't find anything in the AWS docs about that? (although certainly happy to be wrong :smile: )
Hi @johnharris85
- This is a managed node group. Created using pulumi_eks package itself.
- This is how I am creating the node groups
def create_node_group(node_group, stack, cluster, ec2_role, k8s_version):
    launch_template = launch_templates.create_launch_template(
        stack, cluster, node_group, k8s_version
    )
    # If $Default or $Latest is used as a version, then every time the stack is updated
    # it shows a diff and deletes-replaces the nodegroup
    launch_template_args = NodeGroupLaunchTemplateArgs(
        version=launch_template.latest_version, id=launch_template.id
    )
    taints = []
    for taint in node_group.get("taints", []):
        taint = NodeGroupTaintArgs(
            effect=taint.get("effect"), key=taint.get("key"), value=taint.get("value")
        )
        taints.append(taint)
    nodegroup_scaling_args = NodeGroupScalingConfigArgs(
        desired_size=node_group.get("capacities").get("desired"),
        max_size=node_group.get("capacities").get("max"),
        min_size=node_group.get("capacities").get("min"),
    )
    nodegroup_azs = get_subnet_ids(node_group.get("az", []))
    tag_args = {"Name": node_group.get("name"), "Stack": stack}
    tag_args.update(node_group.get("tags"))
    logger.info(f"iam#create_node_group Creating EKS NodeGroup {node_group.get('name')}")
    eks_node_group = ManagedNodeGroup(
        node_group.get("name"),
        node_group_name=node_group.get("name"),
        subnet_ids=nodegroup_azs,
        cluster=cluster.core,
        capacity_type=node_group.get("capacities").get("type"),
        taints=taints,
        instance_types=node_group.get("instance_types"),
        node_role_arn=ec2_role.arn,
        scaling_config=nodegroup_scaling_args,
        launch_template=launch_template_args,
        tags=tag_args,
    )
    return eks_node_group
The reason this function exists is that I am going to be looping over a number of node group configurations.
- I was able to check whether the launch template was applied by logging into the instance and curling the user data, i.e. curl http://169.254.169.254/latest/user-data. I haven't handed it over to the dev team yet, so it's not a 100% confirmation, but the tags and user data script seem to be in line with what I am expecting.
Also, I don't think tags are absolutely essential for this to work; in fact, I had forgotten to add them before coming across this issue.
Thanks for the response @sushantkumar-amagi. OK so I've done some more testing with this, and it's actually pretty weird (or maybe I'm misunderstanding how EKS does Node Groups / custom launch templates?)
I create an EKS cluster with 2 MNGs. One specifies a launch template, the other doesn't. Both of the MNGs get created. In the EKS console I can see that MNG 1 is using my custom LT, MNG 2 has no LT. So far so good.
Now when I visit the Auto Scaling groups for the nodes of each MNG, the ASG for MNG 1 has a launch template that was created by Pulumi/EKS, not my custom one. However, the auto-created/attached LT does have the configuration from my custom LT (SSH key, other settings, etc.). Maybe it's copied over during creation? So the whole process is obviously aware of my LT. This is fine as a one-off, but if I ever update the LT in Pulumi and apply it, the change will have no effect, since the ASGs are using the auto-created LT that only captured the configuration from my original run.
I wonder if others are actually hitting this issue and they're just not noticing because the config works as expected (copied over) and they never update their original LT so don't notice changes aren't being propagated?
Hello,
EKS copies the launch template you give it (and it seems to add some default settings): https://docs.aws.amazon.com/eks/latest/userguide/launch-templates.html
Managed node groups are always deployed with a launch template to be used with the Amazon EC2 Auto Scaling group. The Amazon EKS API creates this launch template either by copying one you provide or by creating one automatically with default values in your account.
Thanks @yann-soubeyrand, the behavior I'm seeing makes sense then. I'm still wondering how Pulumi handles it when we update the template, though: do the changes also get copied, and what about version numbers?
@johnharris85 when you specify a launch template for your managed node group, you indicate its version. When you update the version, EKS automatically updates its copy and does a rolling replace of the nodes.
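In TypeScript terms, a sketch of that wiring (not a verified fix; cluster and myLaunchTemplate are assumed names from the earlier snippets): by referencing latestVersion rather than a hard-coded "1", editing the template in Pulumi produces a new template version, which EKS then copies and rolls out to the nodes.
// Sketch only: "cluster" (eks.Cluster) and "myLaunchTemplate" (aws.ec2.LaunchTemplate) are assumed to exist.
const nodeGroup = eks.createManagedNodeGroup("example-node-group", {
    cluster: cluster,
    nodeRole: cluster.core.instanceRoles[0],
    launchTemplate: {
        id: myLaunchTemplate.id,
        // A new template version here is what triggers EKS to update its copy and replace nodes.
        version: pulumi.interpolate`${myLaunchTemplate.latestVersion}`,
    },
});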
Pretty sure when I tested this Pulumi was not picking up updates, but I will re-test. Thanks!
I was able to get a ManagedNodeGroup working with a custom LaunchTemplate in Python. Below is what's working for me.
It takes AWS about 15 minutes to update the node group (of 2 nodes) when I change the user data. New nodes start and join the group/cluster within about 3 minutes, but it takes longer for the pods to get rescheduled and the old nodes to terminate.
$ pulumi about
CLI
Version 3.46.1
Go Version go1.19.2
Go Compiler gc
Plugins
NAME VERSION
aws 5.7.2
eks 0.42.7
honeycomb 0.0.11
kubernetes 3.23.1
python 3.10.8
_aws_account_id = aws.get_caller_identity().account_id
_K8S_VERSION = "1.23" # latest visible in pulumi-eks
_NODE_ROOT_VOLUME_SIZE_GIB = 60
# Script to run on EKS nodes as root before EKS bootstrapping (which starts the kubelet)
# default bootstrap: https://github.com/awslabs/amazon-eks-ami/blob/master/files/bootstrap.sh
# This user data must be in mime format when passed to a launch template.
# https://docs.aws.amazon.com/eks/latest/userguide/launch-templates.html
#
# From MNG launch template docs:
# "your user data is merged with Amazon EKS user data required for nodes to join the
# cluster. Don't specify any commands in your user data that starts or modifies kubelet."
# Inspecting the instance user data shows this script and the EKS-provided bootstrap
# user data in separate MIME parts, with this one first.
_NODE_USER_DATA = r"""#!/bin/bash
set -e
echo "Doing my custom setup, kubelet will start next."
"""
_USER_DATA_MIME_HEADER = """MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="//"
--//
Content-Type: text/x-shellscript; charset="us-ascii"
"""
_USER_DATA_MIME_FOOTER = """
--//--
"""
def _wrap_and_encode_user_data(script_text: str) -> str:
    mime_encapsulated = _USER_DATA_MIME_HEADER + script_text + _USER_DATA_MIME_FOOTER
    encoded_bytes = base64.b64encode(mime_encapsulated.encode())
    return encoded_bytes.decode("latin1")
def _define_cluster_and_get_provider() -> Tuple[eks.Cluster, k8s.Provider]:
    # https://www.pulumi.com/docs/guides/crosswalk/aws/eks/
    # https://www.pulumi.com/registry/packages/eks/api-docs/cluster/#cluster
    # Map AWS IAM users to Kubernetes internal RBAC admin group. Mapping individual
    # users avoids having to go from a group to a role with assume-role policies.
    # Kubernetes has its own permissions (RBAC) system, with predefined groups for
    # common permissions levels. AWS EKS provides translation from IAM to that, but we
    # must explicitly map particular users or roles that should be granted permissions
    # within the cluster.
    #
    # AWS docs: https://docs.aws.amazon.com/eks/latest/userguide/add-user-role.html
    # Detailed example: https://apperati.io/articles/managing_eks_access-bs/
    # IAM groups are not supported, only users or roles:
    # https://github.com/kubernetes-sigs/aws-iam-authenticator/issues/176
    user_mappings = []
    for username in TEAM_MEMBERS:
        user_mappings.append(
            eks.UserMappingArgs(
                # AWS IAM user to set permissions for
                user_arn=f"arn:aws:iam::{_aws_account_id}:user/{username}",
                # k8s RBAC group from which this IAM user will get permissions
                groups=["system:masters"],
                # k8s RBAC username to create for the user
                username=username,
            )
        )
    node_role = _define_node_role(EKS_CLUSTER_NAME)
    cluster = eks.Cluster(
        EKS_CLUSTER_NAME,
        name=EKS_CLUSTER_NAME,
        version=_K8S_VERSION,
        vpc_id=_CLUSTER_VPC,
        subnet_ids=_CLUSTER_SUBNETS,
        # OpenID Connect Provider maps from k8s to AWS IDs.
        # Get the OIDC's ID with:
        # aws eks describe-cluster --name <CLUSTER_NAME> --query "cluster.identity.oidc.issuer" --output text
        create_oidc_provider=True,
        user_mappings=user_mappings,
        skip_default_node_group=True,
        instance_role=node_role,
    )
    # Export the kubeconfig to allow kubectl to access the cluster. For example:
    #   pulumi stack output my-kubeconfig > kubeconfig.yml
    #   KUBECONFIG=./kubeconfig.yml kubectl get pods -A
    pulumi.export("my-kubeconfig", cluster.kubeconfig)
    # Work around cluster.provider being the wrong type for Namespace to use.
    # https://github.com/pulumi/pulumi-eks/issues/662
    provider = k8s.Provider(
        "my-cluster-provider",
        kubeconfig=cluster.kubeconfig.apply(lambda k: json.dumps(k)),
    )
    launch_template = aws.ec2.LaunchTemplate(
        f"{EKS_CLUSTER_NAME}-launch-template",
        block_device_mappings=[
            aws.ec2.LaunchTemplateBlockDeviceMappingArgs(
                device_name="/dev/xvda",
                ebs=aws.ec2.LaunchTemplateBlockDeviceMappingEbsArgs(
                    volume_size=_NODE_ROOT_VOLUME_SIZE_GIB,
                ),
            ),
        ],
        user_data=_wrap_and_encode_user_data(_NODE_USER_DATA),
        # The default version shows up first in the UI, so update it even though
        # we don't really need to since we use latest_version below.
        update_default_version=True,
        # Other settings, such as tags required for the node to join the group/cluster,
        # are filled in by default.
    )
    # The EC2 instances that the cluster will use to execute pods.
    # https://www.pulumi.com/registry/packages/eks/api-docs/managednodegroup/
    eks.ManagedNodeGroup(
        f"{EKS_CLUSTER_NAME}-managed-node-group",
        node_group_name=f"{EKS_CLUSTER_NAME}-managed-node-group",
        cluster=cluster.core,
        version=_K8S_VERSION,
        subnet_ids=_CLUSTER_SUBNETS,
        node_role=node_role,
        instance_types=["r6i.2xlarge"],
        scaling_config=aws.eks.NodeGroupScalingConfigArgs(
            min_size=1,
            desired_size=2,
            max_size=4,
        ),
        launch_template={
            "id": launch_template.id,
            "version": launch_template.latest_version,
        },
    )
    return cluster, provider
It'd be helpful if the docs were updated as well to define NodeGroupLaunchTemplateArgs
https://www.pulumi.com/registry/packages/eks/api-docs/managednodegroup/#nodegrouplaunchtemplate
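For what it's worth, a hedged TypeScript sketch of the shape as I understand it (this setting maps to the aws.eks NodeGroupLaunchTemplate input: supply either id or name, plus a version; myLaunchTemplate is an assumed aws.ec2.LaunchTemplate):
launchTemplate: {
    id: myLaunchTemplate.id,  // or reference the template by name instead of id
    version: pulumi.interpolate`${myLaunchTemplate.latestVersion}`,  // prefer a concrete version over "$Latest"
},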