
Clarify documentation on Cluster.node_group_options about which node groups are affected (only the default node group?)


What happened?

I'd like to set a few properties on my EKS nodes in a ManagedNodeGroup:

  • node_user_data
  • node_root_volume_size
  • kubelet_extra_args

I am not using the default node group (Cluster.skip_default_node_group=True); instead I'm using a separate ManagedNodeGroup, associated with the cluster by passing ManagedNodeGroup(cluster=cluster.core).

Take node_root_volume_size as one example. When I set it to 40 in my node_group_options, my nodes' root volume size doesn't change: it stays at the default 20, both as the "Disk size" setting under "Node group configuration" in the AWS console and as the available disk space reported by my Honeycomb agent collecting k8s metrics.

I'm also seeing similar behavior with node_user_data: I don't think it's being applied, though that's not as easy to verify.

I'm using a slightly older version of Pulumi / pulumi-eks because some of my dependencies are stuck on protobuf 3 and I haven't been able to update everything to be compatible with protobuf 4. Because of that, I'm setting node_root_volume_size=False in the main Cluster args to work around #643.

Steps to reproduce

import pulumi_aws as aws
import pulumi_eks as eks

_NODE_ROOT_VOLUME_SIZE_GIB = 40
# Script to run on EKS nodes after EKS bootstrapping (which starts Docker etc)
# but before beginning k8s work.
# Make a 20GB swap file.
# https://stackoverflow.com/questions/17173972/how-do-you-add-swap-to-an-ec2-instance
_NODE_USER_DATA_ADD_SWAP = """#!/bin/bash
set -e
dd if=/dev/zero of=/swapfile bs=1M count=20000
mkswap /swapfile
swapon /swapfile
echo "/swapfile swap swap defaults 0 0" >> /etc/fstab
"""

cluster = eks.Cluster(
    EKS_CLUSTER_NAME,
    name=EKS_CLUSTER_NAME,
    vpc_id=_CLUSTER_VPC,
    subnet_ids=_CLUSTER_SUBNETS,
    # OpenID Connect Provider maps from k8s to AWS IDs
    # (issuer URL retrievable with --query "cluster.identity.oidc.issuer" --output text).
    create_oidc_provider=True,
    user_mappings=user_mappings,
    skip_default_node_group=True,
    instance_role=node_role,
    # Set this arg to False here so we can set it for real in the node_group_options
    # below, to work around https://github.com/pulumi/pulumi-eks/issues/643#issuecomment-1038768339
    node_root_volume_size=False,
    # https://www.pulumi.com/registry/packages/eks/api-docs/cluster/#clusternodegroupoptions
    node_group_options=eks.ClusterNodeGroupOptionsArgs(
        # Configure startup script and root volume size to allow for swap.
        node_user_data=_NODE_USER_DATA_ADD_SWAP,
        node_root_volume_size=_NODE_ROOT_VOLUME_SIZE_GIB,
    ),
)
# The EC2 instances that the cluster will use to execute pods.
# https://www.pulumi.com/registry/packages/eks/api-docs/managednodegroup/
eks.ManagedNodeGroup(
    f"{EKS_CLUSTER_NAME}-managed-node-group",
    node_group_name=f"{EKS_CLUSTER_NAME}-managed-node-group",
    cluster=cluster.core,
    subnet_ids=_CLUSTER_SUBNETS,
    node_role=node_role,
    instance_types=["r6i.2xlarge"],
    scaling_config=aws.eks.NodeGroupScalingConfigArgs(
        min_size=2,
        desired_size=2,
        max_size=2,
    ),
)

Expected Behavior

I would expect the managed node group to show 40 GiB for Disk size in the AWS console after a pulumi up; failing that, after destroying and re-creating the managed node group; and failing that, after destroying and re-creating the full cluster.

Actual Behavior

No change to displayed/reported disk size.

Output of pulumi about

CLI
Version      3.46.1
Go Version   go1.19.2
Go Compiler  gc

Plugins
NAME        VERSION
aws         5.7.2
eks         0.42.7
honeycomb   0.0.11
kubernetes  3.23.1
python      3.10.8

Host
OS       ubuntu
Version  20.04
Arch     x86_64

Additional context

Some alternatives:

  • Could I just use the default node group? Maybe Cluster.node_group_options only affects that one. But when I try using the default node group, the cluster doesn't show any compute resources, and my daemonsets don't schedule on the nodes.
  • Do I need a launch template for my managed node group, instead of using these args at all? I'm having trouble figuring out which launch template args I need, and also https://github.com/pulumi/pulumi-eks/issues/633 implies it may not work (or may need a lot of extra legwork).

Contributing

Vote on this issue by adding a 👍 reaction. To contribute a fix for this issue, leave a comment (and link to your pull request, if you've opened one already).

markfickett · Jan 12 '23

Turns out I do need a launch template. Adding notes on #633 for what works for me.
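For reference, the rough shape of it is below. This is only a sketch: it reuses names from the snippet above (cluster, node_role, _CLUSTER_SUBNETS, _NODE_ROOT_VOLUME_SIZE_GIB), and the resource names, the gp3 volume type, and the /dev/xvda device name are illustrative assumptions for the default EKS-optimized Amazon Linux 2 AMI, so adjust for your own setup.

launch_template = aws.ec2.LaunchTemplate(
    f"{EKS_CLUSTER_NAME}-node-launch-template",
    block_device_mappings=[
        aws.ec2.LaunchTemplateBlockDeviceMappingArgs(
            # Root device name for the EKS-optimized Amazon Linux 2 AMI.
            device_name="/dev/xvda",
            ebs=aws.ec2.LaunchTemplateBlockDeviceMappingEbsArgs(
                volume_size=_NODE_ROOT_VOLUME_SIZE_GIB,
                volume_type="gp3",
            ),
        ),
    ],
)

eks.ManagedNodeGroup(
    f"{EKS_CLUSTER_NAME}-managed-node-group",
    cluster=cluster.core,
    subnet_ids=_CLUSTER_SUBNETS,
    node_role=node_role,
    instance_types=["r6i.2xlarge"],
    # Point the managed node group at the launch template instead of relying
    # on Cluster.node_group_options.
    launch_template=aws.eks.NodeGroupLaunchTemplateArgs(
        id=launch_template.id,
        version=launch_template.latest_version.apply(str),
    ),
    scaling_config=aws.eks.NodeGroupScalingConfigArgs(
        min_size=2,
        desired_size=2,
        max_size=2,
    ),
)

The node root volume size then comes from the launch template's block device mapping rather than from node_group_options. One caveat: if you also move user data onto the launch template while still using the default EKS AMI, AWS requires it to be a MIME multi-part document, not a plain shell script.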

In this case, it would be helpful to add docs on Cluster and node_group_options about (a) which options within those two are mutually exclusive with each other, and (b) which kinds of node groups are affected by the node group options -- are they only for the default node group?

markfickett · Jan 13 '23

Hi @markfickett - thank you for filing this issue and for following up as well.

I will forward this docs request to the team in charge.

guineveresaenger · Jan 14 '23