cluster-api-provider-aws Enhance the EKSConfigTemplate and add configuration for files, mounts, users, ntp, etc for CAPI feature parity

What type of PR is this? /kind feature

What this PR does / why we need it: Enhances the EKSConfigTemplate cloud-init to expose many more userdata features when bootstrapping EKS nodes in a EKS managed cluster. CAPI already exposes these configuration in the KubeadmConfigTemplate. Feature parity makes it easier for users coming from a self-managed clusters to switch and adopt EKS clusters.

Enhances the EKSConfigTemplate cloud-init, which currently uses a bash-style cloud-init user data to the YAML-based cloud-init data format, which has exposes more capabilities than can easily be accomplished with bash. CAPI's KubeadmConfigTemplate uses the YAML data format of cloud-init, which allows it to have additional capabilities in disk, mount, and filesystem layouts, amongst other things. This PR adds the same capabilities to CAPA and exposes them as configuration in the EKSConfigTemplate.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged): Fixes #3756

Special notes for your reviewer: This PR was rebased off of an existing PR, https://github.com/kubernetes-sigs/cluster-api-provider-aws/pull/3341, which exposed more commands to the user, but did not switch the cloud-init away from a bash format.

Checklist:

[x] squashed commits
[x] includes documentation
[x] adds unit tests
[ ] adds or updates e2e tests

Sep 27 '22 23:09 cnmcavoy

Hi @cnmcavoy. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sep 27 '22 23:09 k8s-ci-robot

/ok-to-test

Sep 28 '22 10:09 richardcase

/assign

Oct 07 '22 10:10 Skarlso

Just a couple little changes. This looks okay for me so far. Can you please add an e2e test for this? I can help you with how to write it. :)

I'll gladly write an e2e test for this, but I'm not familiar with how the e2e testing works here in general so I'd need some hand holding in testing locally and writing them. To not block this, I lean towards adding the e2e test in a separate ticket.

Alternatively, you can create a follow up ticket to write an e2e test, in that case, please include a manual run and it's output showing that it actually works.

What kind of output are you looking for? The controller doesn't log anything new, but here's an example EKSConfigTemplate which I successfully used with a MachineDeployment to create EKS workers:

apiVersion: bootstrap.cluster.x-k8s.io/v1beta2
kind: EKSConfigTemplate
metadata:
  annotations:
  creationTimestamp: "2022-09-28T19:09:35Z"
  generation: 1
  name: rad-worker-az1-bd8c2fb9
  namespace: capi-ekscmhdev
  ownerReferences:
  - apiVersion: cluster.x-k8s.io/v1beta1
    kind: Cluster
    name: ekscmhdev
    uid: 442b7e5e-85b6-43fb-a2b8-6b6f27dd4f03
  resourceVersion: "172425"
  uid: 9daddfce-2505-4972-9bdf-7af2c745184a
spec:
  template:
    spec:
      containerRuntime: containerd
      diskSetup:
        filesystems:
        - device: /dev/sdb
          filesystem: ext4
          label: RAD
        partitions:
        - device: /dev/sdb
          layout: true
      files:
      - content: |
          vm.max_map_count=262144
        path: /etc/sysctl.d/90-vm-max-map-count.conf
      - content: |
          fs.inotify.max_user_instances=256
        path: /etc/sysctl.d/91-fs-inotify.conf
      kubeletExtraArgs:
        cloud-provider: aws
        event-qps: "10"
        eviction-hard: memory.available<500Mi,nodefs.available<10%
        kube-reserved: cpu=500m,memory=2Gi,ephemeral-storage=1Gi
        node-labels: role.node.kubernetes.io/worker=true,role.node.kubernetes.io/rad-worker=true,network.node.indeed.com/pods=flat,node.indeed.com/node-group=rad-worker-az1
        protect-kernel-defaults: "true"
        register-with-taints: role.node.kubernetes.io/rad-worker=true:NoSchedule
        system-reserved: cpu=500m,memory=1Gi,ephemeral-storage=1Gi
      mounts:
      - - LABEL=RAD
        - /var/lucy
        - ext4
        - defaults
      preBootstrapCommands:
      - 'TOKEN=$(curl -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds:
        21600" 2>/tmp/token)'
      - 'INSTANCE_ID=$(curl -H "X-aws-ec2-metadata-token: $TOKEN" -v http://169.254.169.254/latest/meta-data/instance-id
        2>/tmp/instance)'
      - 'REGION=$(curl -H "X-aws-ec2-metadata-token: $TOKEN" -v http://169.254.169.254/latest/meta-data/placement/region
        2>/tmp/region)'
      - aws ec2 modify-instance-metadata-options --http-put-response-hop-limit 2 --region
        ${REGION} --instance-id ${INSTANCE_ID}
      - chown 1505:1505 /var/lucy
      - chmod 2755 /var/lucy
      - mkdir -p /var/lucy/_rad/headwater2
      - mkdir -p /var/lucy/_rad/delta2
      - mkdir -p /var/lucy/_rad/sandboxes
      - chown -R 1505:1505 /var/lucy/_rad
      - chmod 2775 /var/lucy/_rad/sandboxes
      - sudo systemctl restart systemd-sysctl

Oct 07 '22 20:10 cnmcavoy

For output, I'm looking for something like the actual set of values applied to the cluster and the node. So I would assume a describe of the Node would show the set values.

Oct 08 '22 08:10 Skarlso

For output, I'm looking for something like the actual set of values applied to the cluster and the node. So I would assume a describe of the Node would show the set values.

I dumped capi/capa resources and nodes: https://gist.github.com/cnmcavoy/52457c07ccdf07b388bf62a731c72e27

We use a helm chart internally to generate the Cluster, MachineDeployments, and other resources. I can include the output of this chart too if desired.

Oct 10 '22 20:10 cnmcavoy

Thanks, @cnmcavoy. I'll take a good look at that output. :)

Oct 11 '22 17:10 Skarlso

@cnmcavoy So looking at this output, I have no idea what values you set. :)) Can you also please share your config file?

Oct 18 '22 07:10 Skarlso

@cnmcavoy So looking at this output, I have no idea what values you set. :)) Can you also please share your config file?

Not sure what config file you are referencing. We use a custom helm chart to generate CAPI resources for each cluster, since we started using CAPI in the 0.3 days. I created a gist that also shows the output of the chart, which are effectively the inputs to CAPI: https://gist.github.com/cnmcavoy/b93420d14fc38cfe6bb080fcaac92032

Oct 24 '22 17:10 cnmcavoy

Okay so I might be misunderstanding something. But I thought that with this pr you are enhancing the ability to configure bootstrapping nodes through making the bootstrap script configurable. Adding pre and post bootstrap scripts as well. What I'm looking for is using this ability to set something to see that it works.

Oct 25 '22 04:10 Skarlso

/lgtm With one remark.

@richardcase If you still want to take do a read on this. :)

Oct 26 '22 06:10 Skarlso

/lgtm /approve

Oct 27 '22 04:10 Skarlso

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Skarlso

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [Skarlso]

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

Oct 27 '22 04:10 k8s-ci-robot

/hold

Oct 27 '22 04:10 Skarlso

/test ?

Oct 27 '22 04:10 Skarlso

@Skarlso: The following commands are available to trigger required jobs:

/test pull-cluster-api-provider-aws-build
/test pull-cluster-api-provider-aws-test
/test pull-cluster-api-provider-aws-verify

The following commands are available to trigger optional jobs:

/test pull-cluster-api-provider-aws-apidiff-main
/test pull-cluster-api-provider-aws-e2e
/test pull-cluster-api-provider-aws-e2e-blocking
/test pull-cluster-api-provider-aws-e2e-clusterclass
/test pull-cluster-api-provider-aws-e2e-conformance
/test pull-cluster-api-provider-aws-e2e-conformance-with-ci-artifacts
/test pull-cluster-api-provider-aws-e2e-eks

Use /test all to run the following jobs that were automatically triggered:

pull-cluster-api-provider-aws-apidiff-main
pull-cluster-api-provider-aws-build
pull-cluster-api-provider-aws-test
pull-cluster-api-provider-aws-verify

In response to this:

/test ?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Oct 27 '22 04:10 k8s-ci-robot

/test pull-cluster-api-provider-aws-e2e

Oct 27 '22 04:10 Skarlso

/unhold

Oct 27 '22 08:10 Skarlso

cluster-api-provider-aws cluster-api-provider-aws copied to clipboard

Enhance the EKSConfigTemplate and add configuration for files, mounts, users, ntp, etc for CAPI feature parity

cluster-api-provider-aws
cluster-api-provider-aws copied to clipboard