karpenter-provider-aws
fix: Prevent EC2NodeClass readiness from getting stuck at False
Fixes #7875
Description
We observed some EC2NodeClasses stuck in `Ready=False` with the message "Failed to detect the cluster CIDR". We believe this is caused by Karpenter not recovering after the EKS API fails temporarily, and this PR fixes that.
I noticed this is just another approach to the same issue that PR #7965 by @AlexeyPetroff is trying to solve!
How was this change tested?
The first commit adds a failing test case, and the second commit fixes the implementation to let the test pass.
Does this change impact docs?
- [ ] Yes, PR includes docs updates
- [ ] Yes, issue opened: #
- [x] No
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
Deploy Preview for karpenter-docs-prod canceled.
| Name | Link |
|---|---|
| Latest commit | f2242bc9f21accb0f68001cc772b7b5d1be8bdd2 |
| Latest deploy log | https://app.netlify.com/sites/karpenter-docs-prod/deploys/67ef26f9cce1bd000883c5bf |
Deploy Preview for karpenter-docs-prod canceled.
| Name | Link |
|---|---|
| Latest commit | c6c42a77bb953726acb8a93e5d9b328cc6a70a1c |
| Latest deploy log | https://app.netlify.com/sites/karpenter-docs-prod/deploys/67ef27035b917d0008c85b82 |
Thanks for opening this @mumoshu! Take a look at the feedback that I gave here: https://github.com/aws/karpenter-provider-aws/pull/7965/files#r2024015633. I think we should move this to the validation controller.
Closing this out since this was addressed in the validation controller in https://github.com/aws/karpenter-provider-aws/pull/8408