cluster-api-provider-aws icon indicating copy to clipboard operation
cluster-api-provider-aws copied to clipboard

Improve EKS AMI Lookups

Open luthermonson opened this issue 3 years ago • 3 comments
trafficstars

/kind feature

Describe the solution you'd like I would like to propose changing the AMI lookups a bit to standardize EKS and create some common terms.

Problem: with these two pull requests (one, two) there was support added for pulling from SSM the latest AMI ID for EKS optimized images. TL;DR on the SSM method is that AWS keeps these public key values up to date when they build new images and it’s a guaranteed place to find the most up to date AMI ID. Problems with this are:

  • Not all flavors of image are available in SSM for example all Ubuntu images don’t have a key
  • Doesn’t allow for wildcards, keys have to be exact
  • This SSM method was made to “eliminate the need for you to manually look up the Amazon EKS optimized AMI IDs”
  • Introduced new AMI “LookupTypes” which don’t really match any CAPA image naming standards.

Solution: The existing AMI search is perfectly adequate to find AMI IDs, this method takes some simple lookup params (template, baseos, kubeversion) and performs an AMI search and then loops over them and finds the most recent one and returns that AMI ID. This method allows for wildcards and more levers to pull in the query params than the EKS keys. So I propose the following:

  • Eliminate the “LookupType” linux flavor concept and move the SSM lookup method to a string where they can pass the exact SSM key they want. This makes code more maintainable (no sprintf templates) and lets the user set their key at pool creation time to exactly what they want. The logic for AMI lookup will be…
    • If AMI.ID set, pull that
    • If SSM Key set, pull that
    • Default to AMI Query with any lookup params passed or their defaults.
  • Rewrite the SSM Keys into DescribeImages input queries.
  • Add ARM64 support, this PR attempts this but I feel the better method than asking the instance type it’s arch is to add an ImageLookupArch which defaults to amd64.
  • Standardize EKS to the CAPA BaseOS terms… change AmazonLinux to amazon-2, AmazonLinuxGPU is amazon-2-gpu.

Pros: Enables ubuntu-18.04 and ubuntu-20.04 support for EKS. Enables ARM64 support across the entire project as currently x86_64 is hardcoded in all DescribeImages queries. Standardizes the support BaseOS and gives them names and constants to validate against.

Cons: Deprecates some APIs that may be in use, however… the one I'm looking at is AMIReference.EKSOptimizedLookupType and it’s not even documented and in experimental EKS features.

Sample list of BaseOS across Self managed/EKS from my current WIP

Amazon2         = "amazon-2"          // capa, eks
Amazon2GPU      = "amazon-2-gpu"      // eks
Ubuntu1804      = "ubuntu-18.04"      // capa, eks
Ubuntu2004      = "ubuntu-20.04"      // capa, eks
CentOS7         = "centos-7"          // capa
FlatcarStable   = "flatcar-stable"    // capa
BottleRocket    = "bottlerocket"      // eks
Windows2019Core = "windows-2019-core" // eks
Windows2019Full = "windows-2019-full" // eks
Windows2022Core = "windows-2022-core" // eks (coming soon)
Windows2022Full = "windows-2022-full" // eks (coming soon)

and the Name templates for EKS from my WIP look like this.

// go template for AMI Name Lookup
var eksAMINameFormats = map[string]string{
	Amazon2:         "amazon-eks-node-{{.K8sVersion}}-v*", //amazon-2
	Amazon2GPU:      "",
	Ubuntu1804:      "ubuntu-eks/k8s_{{.K8sVersion}}/images/*18.04*",
	Ubuntu2004:      "ubuntu-eks/k8s_{{.K8sVersion}}/images/*20.04*",
	BottleRocket:    "bottlerocket-aws-k8s-{{.K8sVersion}}*  ",
	Windows2019Core: "Windows_Server-2019-English-Core-EKS_Optimized-{{.K8sVersion}}-*", // coming soon
	Windows2019Full: "Windows_Server-2019-English-Full-EKS_Optimized-{{.K8sVersion}}-*", // coming soon
	Windows2022Core: "Windows_Server-2022-English-Core-EKS_Optimized-{{.K8sVersion}}-*", // currently unavailable
	Windows2022Full: "Windows_Server-2022-English-Full-EKS_Optimized-{{.K8sVersion}}-*", // currently unavailable
	CentOS7:         "",                                                                 // unavailable, leave empty so ok check fails and return default, add pattern if they become available
	FlatcarStable:   "",                                                                 // unavailable, leave empty so ok check fails and return default, add pattern if they become available
}

Anything else you would like to add: [Miscellaneous information that will assist in solving the issue.]

Environment:

  • Cluster-api-provider-aws version:
  • Kubernetes version: (use kubectl version):
  • OS (e.g. from /etc/os-release):

luthermonson avatar Aug 11 '22 18:08 luthermonson

Related Issues

  • Helps with windows support for eks: https://github.com/kubernetes-sigs/cluster-api-provider-aws/issues/3533
  • Helps with ARM64 support for the entire project: https://github.com/kubernetes-sigs/cluster-api-provider-aws/issues/3121

luthermonson avatar Aug 11 '22 18:08 luthermonson

The proposal looks reasonable to me. But I'm not great in the AMI code lookup part atm. I will take a look at the code review accordingly.

That said, I like this approach. It adds more flexibility to how we would use AMI lookup and add support for other types much easier if I understand it correctly.

Skarlso avatar Aug 16 '22 07:08 Skarlso

/triage accepted /priority important-soon /milestone v1.6.0 /area provider/eks

richardcase avatar Sep 02 '22 10:09 richardcase

Heh, turns out there is an ancient issue about something like this already here https://github.com/kubernetes-sigs/cluster-api-provider-aws/issues/1869

Skarlso avatar Sep 08 '22 07:09 Skarlso

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Feb 08 '23 17:02 k8s-triage-robot

This issue is labeled with priority/important-soon but has not been updated in over 90 days, and should be re-triaged. Important-soon issues must be staffed and worked on either currently, or very soon, ideally in time for the next release.

You can:

  • Confirm that this issue is still relevant with /triage accepted (org members only)
  • Deprioritize it with /priority important-longterm or /priority backlog
  • Close this issue with /close

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted

k8s-triage-robot avatar Feb 08 '23 17:02 k8s-triage-robot

This issue is currently awaiting triage.

If CAPA/CAPI contributors determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Feb 08 '23 17:02 k8s-ci-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Mar 10 '23 18:03 k8s-triage-robot