karpenter-provider-aws

fix: Fix EC2NodeClass readiness getting stuck in False

Open mumoshu opened this issue 7 months ago • 3 comments

Fixes #7875

Description

We observed some EC2NodeClasses stuck in Ready=False with the message "Failed to detect the cluster CIDR". We believe this is caused by an issue in Karpenter that is triggered when the EKS API fails temporarily, and this PR fixes that.
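
For readers unfamiliar with this failure mode, here is a minimal, self-contained sketch of the general idea: a transient lookup failure should lead to a retry instead of leaving the readiness condition at False forever. This is not Karpenter's actual code. `Condition`, `cidrResolver`, `flakyResolver`, `reconcile`, and `runUntilSettled` are hypothetical names invented for illustration, and the CIDR value is made up.

```go
package main

import (
	"errors"
	"fmt"
)

// Condition is a simplified, hypothetical stand-in for a status condition.
type Condition struct {
	Ready   bool
	Message string
}

// cidrResolver is a hypothetical stand-in for a call that looks up the
// cluster CIDR through a remote API and may fail temporarily.
type cidrResolver func() (string, error)

// flakyResolver fails the first `failures` calls, then returns a CIDR.
func flakyResolver(failures int) cidrResolver {
	calls := 0
	return func() (string, error) {
		calls++
		if calls <= failures {
			return "", errors.New("API temporarily unavailable")
		}
		return "10.100.0.0/16", nil // made-up value
	}
}

// reconcile performs one readiness check. On failure it reports
// Ready=false and asks to be requeued, so the failure is not final.
func reconcile(resolve cidrResolver) (cond Condition, requeue bool) {
	cidr, err := resolve()
	if err != nil {
		return Condition{Ready: false, Message: "Failed to detect the cluster CIDR"}, true
	}
	return Condition{Ready: true, Message: "cluster CIDR " + cidr}, false
}

// runUntilSettled reconciles until no requeue is requested or
// maxAttempts is reached, and returns the last condition and attempt count.
func runUntilSettled(resolve cidrResolver, maxAttempts int) (Condition, int) {
	var cond Condition
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		var requeue bool
		cond, requeue = reconcile(resolve)
		if !requeue {
			return cond, attempt
		}
	}
	return cond, maxAttempts
}

func main() {
	cond, attempts := runUntilSettled(flakyResolver(2), 5)
	fmt.Printf("ready=%v after %d attempts\n", cond.Ready, attempts)
}
```

With two simulated failures, the condition becomes ready on the third attempt. Without the requeue, the first failure would be the last word on readiness, which matches the symptom described above.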

I noticed this is just another way to fix the same issue that PR #7965 by @AlexeyPetroff is trying to solve!

How was this change tested?

The first commit adds a failing test case, and the second commit fixes the implementation to let the test pass.

Does this change impact docs?

  • [ ] Yes, PR includes docs updates
  • [ ] Yes, issue opened: #
  • [x] No

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

mumoshu, Apr 04 '25 00:04

Deploy Preview for karpenter-docs-prod canceled.

Latest commit: f2242bc9f21accb0f68001cc772b7b5d1be8bdd2
Latest deploy log: https://app.netlify.com/sites/karpenter-docs-prod/deploys/67ef26f9cce1bd000883c5bf

netlify[bot], Apr 04 '25 00:04

Deploy Preview for karpenter-docs-prod canceled.

Latest commit: c6c42a77bb953726acb8a93e5d9b328cc6a70a1c
Latest deploy log: https://app.netlify.com/sites/karpenter-docs-prod/deploys/67ef27035b917d0008c85b82

netlify[bot], Apr 04 '25 00:04

Thanks for opening this @mumoshu! Take a look at the feedback that I gave here: https://github.com/aws/karpenter-provider-aws/pull/7965/files#r2024015633. I think we should move this to the validation controller.

jonathan-innis, Apr 04 '25 16:04

Closing this out since this was addressed in the validation controller in https://github.com/aws/karpenter-provider-aws/pull/8408

jmdeal, Oct 21 '25 17:10