cluster-api-provider-aws
Enhance the EKSConfigTemplate and add configuration for files, mounts, users, ntp, etc for CAPI feature parity
What type of PR is this? /kind feature
What this PR does / why we need it: Enhances the EKSConfigTemplate cloud-init to expose many more userdata features when bootstrapping EKS nodes in an EKS-managed cluster. CAPI already exposes this configuration in the KubeadmConfigTemplate. Feature parity makes it easier for users coming from self-managed clusters to switch to and adopt EKS clusters.
This PR switches the EKSConfigTemplate cloud-init from the current bash-style cloud-init user data to the YAML-based cloud-config format, which exposes more capabilities than can easily be accomplished with bash. CAPI's KubeadmConfigTemplate already uses the YAML format of cloud-init, which gives it additional capabilities for disk, mount, and filesystem layouts, amongst other things. This PR adds the same capabilities to CAPA and exposes them as configuration in the EKSConfigTemplate.
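For illustration, the YAML cloud-config format lets user data declare files, disks, filesystems, and mounts as data rather than shell commands. A minimal hedged sketch using standard cloud-init modules (the device, label, and values here are made up for illustration, and this is not the exact userdata CAPA renders):

```yaml
#cloud-config
write_files:
- path: /etc/sysctl.d/90-vm-max-map-count.conf
  content: |
    vm.max_map_count=262144
disk_setup:
  /dev/sdb:            # hypothetical secondary volume
    layout: true
fs_setup:
- device: /dev/sdb
  filesystem: ext4
  label: DATA          # hypothetical label
mounts:
- [LABEL=DATA, /var/data, ext4, defaults]
runcmd:
- systemctl restart systemd-sysctl
```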
Which issue(s) this PR fixes (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged):
Fixes #3756
Special notes for your reviewer: This PR was rebased off of an existing PR, https://github.com/kubernetes-sigs/cluster-api-provider-aws/pull/3341, which exposed more commands to the user, but did not switch the cloud-init away from a bash format.
Checklist:
- [x] squashed commits
- [x] includes documentation
- [x] adds unit tests
- [ ] adds or updates e2e tests
Hi @cnmcavoy. Thanks for your PR.
I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.
Once the patch is verified, the new status will be reflected by the ok-to-test label.
I understand the commands that are listed here.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/ok-to-test
/assign
Just a couple of little changes. This looks okay to me so far. Can you please add an e2e test for this? I can help you with how to write it. :)
I'll gladly write an e2e test for this, but I'm not familiar with how the e2e testing works here in general, so I'd need some hand-holding with testing locally and writing them. To avoid blocking this, I lean towards adding the e2e test in a separate ticket.
Alternatively, you can create a follow-up ticket to write an e2e test; in that case, please include a manual run and its output showing that it actually works.
What kind of output are you looking for? The controller doesn't log anything new, but here's an example EKSConfigTemplate which I successfully used with a MachineDeployment to create EKS workers:
```yaml
apiVersion: bootstrap.cluster.x-k8s.io/v1beta2
kind: EKSConfigTemplate
metadata:
  annotations:
  creationTimestamp: "2022-09-28T19:09:35Z"
  generation: 1
  name: rad-worker-az1-bd8c2fb9
  namespace: capi-ekscmhdev
  ownerReferences:
  - apiVersion: cluster.x-k8s.io/v1beta1
    kind: Cluster
    name: ekscmhdev
    uid: 442b7e5e-85b6-43fb-a2b8-6b6f27dd4f03
  resourceVersion: "172425"
  uid: 9daddfce-2505-4972-9bdf-7af2c745184a
spec:
  template:
    spec:
      containerRuntime: containerd
      diskSetup:
        filesystems:
        - device: /dev/sdb
          filesystem: ext4
          label: RAD
        partitions:
        - device: /dev/sdb
          layout: true
      files:
      - content: |
          vm.max_map_count=262144
        path: /etc/sysctl.d/90-vm-max-map-count.conf
      - content: |
          fs.inotify.max_user_instances=256
        path: /etc/sysctl.d/91-fs-inotify.conf
      kubeletExtraArgs:
        cloud-provider: aws
        event-qps: "10"
        eviction-hard: memory.available<500Mi,nodefs.available<10%
        kube-reserved: cpu=500m,memory=2Gi,ephemeral-storage=1Gi
        node-labels: role.node.kubernetes.io/worker=true,role.node.kubernetes.io/rad-worker=true,network.node.indeed.com/pods=flat,node.indeed.com/node-group=rad-worker-az1
        protect-kernel-defaults: "true"
        register-with-taints: role.node.kubernetes.io/rad-worker=true:NoSchedule
        system-reserved: cpu=500m,memory=1Gi,ephemeral-storage=1Gi
      mounts:
      - - LABEL=RAD
        - /var/lucy
        - ext4
        - defaults
      preBootstrapCommands:
      - 'TOKEN=$(curl -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600" 2>/tmp/token)'
      - 'INSTANCE_ID=$(curl -H "X-aws-ec2-metadata-token: $TOKEN" -v http://169.254.169.254/latest/meta-data/instance-id 2>/tmp/instance)'
      - 'REGION=$(curl -H "X-aws-ec2-metadata-token: $TOKEN" -v http://169.254.169.254/latest/meta-data/placement/region 2>/tmp/region)'
      - aws ec2 modify-instance-metadata-options --http-put-response-hop-limit 2 --region ${REGION} --instance-id ${INSTANCE_ID}
      - chown 1505:1505 /var/lucy
      - chmod 2755 /var/lucy
      - mkdir -p /var/lucy/_rad/headwater2
      - mkdir -p /var/lucy/_rad/delta2
      - mkdir -p /var/lucy/_rad/sandboxes
      - chown -R 1505:1505 /var/lucy/_rad
      - chmod 2775 /var/lucy/_rad/sandboxes
      - sudo systemctl restart systemd-sysctl
```
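For context, a template like this isn't applied on its own: a MachineDeployment references it through its bootstrap configRef. A trimmed, hypothetical sketch of that wiring (the configRef names match the example above; the replica count, AWSMachineTemplate name, and version are placeholders):

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: rad-worker-az1
  namespace: capi-ekscmhdev
spec:
  clusterName: ekscmhdev
  replicas: 3                        # placeholder
  template:
    spec:
      clusterName: ekscmhdev
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta2
          kind: EKSConfigTemplate
          name: rad-worker-az1-bd8c2fb9
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
        kind: AWSMachineTemplate
        name: rad-worker-az1         # placeholder
      version: v1.23.9               # placeholder
```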
For output, I'm looking for something like the actual set of values applied to the cluster and the node. So I would assume a describe of the Node would show the set values.
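To make that concrete: given the kubeletExtraArgs in the template above, describing the Node should show roughly this fragment (a hedged sketch derived from the example values, not actual cluster output):

```yaml
apiVersion: v1
kind: Node
metadata:
  labels:
    role.node.kubernetes.io/worker: "true"
    role.node.kubernetes.io/rad-worker: "true"
    network.node.indeed.com/pods: flat
    node.indeed.com/node-group: rad-worker-az1
spec:
  taints:
  - key: role.node.kubernetes.io/rad-worker
    value: "true"
    effect: NoSchedule
```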
I dumped capi/capa resources and nodes: https://gist.github.com/cnmcavoy/52457c07ccdf07b388bf62a731c72e27
We use a helm chart internally to generate the Cluster, MachineDeployments, and other resources. I can include the output of this chart too if desired.
Thanks, @cnmcavoy. I'll take a good look at that output. :)
@cnmcavoy So looking at this output, I have no idea what values you set. :)) Can you also please share your config file?
Not sure what config file you are referencing. We use a custom helm chart to generate CAPI resources for each cluster, since we started using CAPI in the 0.3 days. I created a gist that also shows the output of the chart, which are effectively the inputs to CAPI: https://gist.github.com/cnmcavoy/b93420d14fc38cfe6bb080fcaac92032
Okay, so I might be misunderstanding something, but I thought that with this PR you are enhancing the ability to configure bootstrapping nodes by making the bootstrap script configurable, adding pre- and post-bootstrap scripts as well. What I'm looking for is an example that uses this ability to set something, so we can see that it works.
/lgtm With one remark.
@richardcase If you still want to take a read on this. :)
/lgtm /approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: Skarlso
The full list of commands accepted by this bot can be found here.
The pull request process is described here
- ~~OWNERS~~ [Skarlso]
Approvers can indicate their approval by writing /approve
in a comment
Approvers can cancel approval by writing /approve cancel
in a comment
/hold
/test ?
@Skarlso: The following commands are available to trigger required jobs:
- /test pull-cluster-api-provider-aws-build
- /test pull-cluster-api-provider-aws-test
- /test pull-cluster-api-provider-aws-verify

The following commands are available to trigger optional jobs:
- /test pull-cluster-api-provider-aws-apidiff-main
- /test pull-cluster-api-provider-aws-e2e
- /test pull-cluster-api-provider-aws-e2e-blocking
- /test pull-cluster-api-provider-aws-e2e-clusterclass
- /test pull-cluster-api-provider-aws-e2e-conformance
- /test pull-cluster-api-provider-aws-e2e-conformance-with-ci-artifacts
- /test pull-cluster-api-provider-aws-e2e-eks

Use /test all to run the following jobs that were automatically triggered:
- pull-cluster-api-provider-aws-apidiff-main
- pull-cluster-api-provider-aws-build
- pull-cluster-api-provider-aws-test
- pull-cluster-api-provider-aws-verify
In response to this:
/test ?
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/test pull-cluster-api-provider-aws-e2e
/unhold