cluster-api-provider-aws Bottlerocket worker node support

/kind feature

Describe the solution you'd like: As a user, I would like to be able to spawn Bottlerocket worker nodes in CAPA-owned clusters. I can manually specify the Bottlerocket AMI ID from the AMI catalog, but this is not enough for the workers to join a cluster: the instances are created but never register as nodes, so pod workloads cannot be scheduled. This affects both EKS and non-EKS clusters.

Anything else you would like to add: What seems to be missing is a way to configure the Bottlerocket user data. Bottlerocket is immutable at runtime, so configuration must be provided before the node is created. Bottlerocket takes a TOML-formatted configuration that must be supplied as user data, which is where settings such as the API server endpoint, certificate data, and cluster name live.

CAPA seems to be unaware that Bottlerocket has a custom configuration format. It needs to be enhanced to recognize that format so it can supply the correct settings for a node to join a cluster.

https://github.com/kubernetes-sigs/cluster-api-provider-aws/issues/2009 seems to be related, but only describes simplifying the AMI selection. However, manually specifying the AMI ID of Bottlerocket directly is not sufficient to join a cluster at present.

Dec 20 '23 21:12 cnmcavoy

This issue is currently awaiting triage.

If CAPA/CAPI contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Dec 20 '23 21:12 k8s-ci-robot

@cnmcavoy I was able to provision Bottlerocket nodes and join them to the cluster by passing a custom bootstrap secret with user data to the MachinePool object. However, I see disk pressure on the Bottlerocket nodes: /dev/xvdb, the data volume used for containers and so on, comes with 20GB of storage, and I couldn't figure out a way to expand it using AWSManagedMachinePool -> LaunchTemplate. Because of this, some pods end up in Eviction or ContainerStatusUnknown states.
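For readers trying the same workaround, a rough sketch of what such a bootstrap secret could look like follows. It builds a plain Kubernetes Secret manifest whose `value` key carries the user data. The `value` key is an assumption based on how Cluster API bootstrap data secrets are commonly laid out, and the helper name, secret name, and namespace are hypothetical; the exact secret used above is not shown in this thread.

```python
import base64
import json


def bootstrap_secret(name: str, namespace: str, user_data: str) -> dict:
    """Build a Secret manifest holding user data (hypothetical helper).

    Assumption: the bootstrap data is read from the "value" key.
    Secret data values must be base64-encoded.
    """
    encoded = base64.b64encode(user_data.encode("utf-8")).decode("ascii")
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "data": {"value": encoded},
    }


if __name__ == "__main__":
    # Placeholder TOML; substitute the real Bottlerocket settings.
    toml_user_data = '[settings.kubernetes]\ncluster-name = "demo-cluster"\n'
    manifest = bootstrap_secret("bottlerocket-userdata", "default", toml_user_data)
    # JSON is valid input for `kubectl apply -f -`.
    print(json.dumps(manifest, indent=2))
```

The MachinePool would then reference the secret by name in its bootstrap configuration. This sketch does not address the 20GB data volume limit described above.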

Jan 23 '24 02:01 praveenadini

It would be nice if Bottlerocket could be selected directly via amiType, like AL2.

Apr 08 '24 00:04 kahirokunn

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle stale
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

Jul 07 '24 01:07 k8s-triage-robot

Keep

Jul 07 '24 01:07 kahirokunn

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle rotten
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

Aug 06 '24 01:08 k8s-triage-robot

keep

Aug 06 '24 05:08 kahirokunn

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Reopen this issue with /reopen
Mark this issue as fresh with /remove-lifecycle rotten
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Sep 05 '24 05:09 k8s-triage-robot

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Sep 05 '24 05:09 k8s-ci-robot
