pulumi-eks

Provide a way to run `user_data` before EKS bootstrapping

Open • hiradp opened this issue 3 years ago • 1 comment

Hello!

  • Vote on this issue by adding a 👍 reaction
  • To contribute a fix for this issue, leave a comment (and link to your pull request, if you've opened one already)

Issue details

Hey everyone 👋.

First off, thank you for taking the time to read this, and my apologies if I am not filling this out correctly. Truth is, I don't even know whether this is a bug, a limitation, or a flat-out feature request. Or perhaps I am missing something (most likely). I will explain our use case and what we're trying to achieve, with some links to Stack Overflow and other GitHub issues.

We want to configure our pods with an emptyDir volume, as described here, using m5dn.2xlarge instances for the nodes. But as discussed in these threads:

  • https://stackoverflow.com/questions/66828369/missing-nvme-ssd-in-aws-kubernetes
  • https://github.com/awslabs/amazon-eks-ami/issues/349

This is actually kinda hard. We need to format and mount the SSDs on the instance before kubelet starts. We were able to do this via a very minimal EC2 Launch Template that specifies pretty much nothing except the following user data:

MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="//"

--//
#!/bin/bash -xe
exec > >(tee /var/log/user-data.log|logger -t user-data -s 2>/dev/console) 2>&1
  echo "Applying custom user-data"

  echo "Making directories..."
  mkdir -p /ssd-volume /var/lib/kubelet /var/lib/docker

  echo "Making fs..."
  mkfs.xfs /dev/nvme1n1

  echo "Mounting..."
  mount /dev/nvme1n1 /ssd-volume

  echo "Modifying permissions..."
  chmod 0755 /ssd-volume

  echo "Moving kubelet directory..."
  mv /var/lib/kubelet /ssd-volume

  echo "Moving docker directory..."
  mv /var/lib/docker /ssd-volume

  echo "Linking kubelet..."
  ln -sf /ssd-volume/kubelet /var/lib/kubelet

  echo "Linking docker..."
  ln -sf /ssd-volume/docker /var/lib/docker

  echo "Done!"
--//--
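
For anyone who would rather generate this document than hand-write it, here is a minimal sketch using only the Python standard library. The helper name `build_user_data` and the shortened `DISK_SETUP` script are hypothetical and only for illustration. The sketch also gives the script part an explicit `text/x-shellscript` content type, which the hand-written version above does not have.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


# Hypothetical helper; not part of pulumi-eks or any AWS SDK.
def build_user_data(script: str, boundary: str = "//") -> str:
    """Wrap one shell script in a MIME multipart user-data document."""
    message = MIMEMultipart("mixed", boundary=boundary)
    # Label the part as a shell script so cloud-init treats it as one.
    message.attach(MIMEText(script, "x-shellscript", "us-ascii"))
    return message.as_string()


# Shortened stand-in for the disk preparation script shown above.
DISK_SETUP = """#!/bin/bash -xe
mkdir -p /ssd-volume
mkfs.xfs /dev/nvme1n1
mount /dev/nvme1n1 /ssd-volume
"""

user_data = build_user_data(DISK_SETUP)
```

The result is a plain string. It would still need whatever encoding the consuming resource expects.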

It is important not to specify the AMI, as stated here: https://github.com/awslabs/amazon-eks-ami/issues/719#issuecomment-896127079

When no AMI is present in the launch template (as is the case for you, if I'm reading your gist correctly), EKS will merge in a section of MIME multi-part user data to the user data contents you've passed in. The part EKS merges in will attempt to bootstrap your worker node as well. Since MIME multiparts are executed in order, this means your bootstrapping happens first and the EKS bootstrapping becomes a no-op.

This part is pretty crucial, because the above script needs to execute before kubelet starts.

To automate this with Pulumi, I guess you would use node_user_data, as documented here. But based on this snippet (https://github.com/pulumi/pulumi-eks/blob/master/nodejs/eks/nodegroup.ts#L478-L495) and the documentation, it seems like the provided value gets executed after kubelet starts.

I guess the next natural thing to try is node_user_data_override, but I am not sure how that would work, given that I need some cluster-specific information to build the script.
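
To make the concern concrete, here is a rough sketch of what an override script might have to look like: the disk preparation first, then the EKS bootstrap call with the cluster details filled in. The function `render_override` and its parameters are hypothetical. The sketch assumes the node AMI ships `/etc/eks/bootstrap.sh` with the `--apiserver-endpoint` and `--b64-cluster-ca` flags. It leaves out anything else a full override would have to do, such as signalling readiness to the stack that manages the nodes.

```python
import shlex


# Hypothetical helper; the parameter names are illustrative only.
def render_override(cluster_name: str, endpoint: str, ca_b64: str,
                    pre_bootstrap: str) -> str:
    """Render a bash script that prepares the disks, then bootstraps the node."""
    bootstrap = " ".join([
        "/etc/eks/bootstrap.sh",
        shlex.quote(cluster_name),
        "--apiserver-endpoint", shlex.quote(endpoint),
        "--b64-cluster-ca", shlex.quote(ca_b64),
    ])
    # Disk preparation comes first so it finishes before kubelet is started.
    return "\n".join(["#!/bin/bash -xe", pre_bootstrap.strip(), bootstrap]) + "\n"


script = render_override(
    cluster_name="demo-cluster",          # placeholder value
    endpoint="https://example.invalid",   # placeholder value
    ca_b64="Q0E=",                        # placeholder value
    pre_bootstrap="mkfs.xfs /dev/nvme1n1\nmount /dev/nvme1n1 /ssd-volume",
)
```

In a Pulumi program the three cluster values are outputs rather than plain strings, so the rendering would presumably have to happen inside something like `pulumi.Output.all(...).apply(...)`. That is exactly the part I am unsure about.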

Please note that this is not blocking. We got around this by creating an EC2 Launch Template and telling the node groups to use that.
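
One detail of that workaround that is easy to miss: launch templates expect the user data to be base64-encoded. A minimal sketch of the encoding step follows. `encode_user_data` is a hypothetical helper, and the `raw` string is a shortened placeholder rather than our real script.

```python
import base64


# Hypothetical helper; shown only to illustrate the encoding step.
def encode_user_data(user_data: str) -> str:
    """Base64-encode a user-data document for use in a launch template."""
    return base64.b64encode(user_data.encode("utf-8")).decode("ascii")


# Shortened placeholder for the multipart document shown earlier.
raw = 'MIME-Version: 1.0\nContent-Type: multipart/mixed; boundary="//"\n\n--//\n#!/bin/bash -xe\necho hi\n--//--\n'
encoded = encode_user_data(raw)
```

The encoded string is what would be handed to the launch template's user data field, and the node group then references that launch template.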

Thank you for reading this lengthy issue and thank you again in advance for any thoughts or input.

hiradp • Feb 02 '22 19:02

Thanks, yes this is an enhancement to existing behaviour so will categorise as such.

danielrbradley • Feb 07 '22 14:02