autoscaler
capi: fix error when listing infra machines
What type of PR is this?
/kind bug
What this PR does / why we need it:
This PR affects the CAPI provider of cluster-autoscaler only.
Here we check that we have the proper permissions to list an infra object before we bootstrap the informer factory. Bootstrapping it without those permissions has error-handling side effects that we don't want:
- https://github.com/kubernetes/client-go/blob/v0.29.0-alpha.3/tools/cache/reflector.go#L146-L148
Specifically, the error short-circuits the process and triggers a pod restart, which has the ultimate side effect of deadlocking the cluster-autoscaler reconciler.
Which issue(s) this PR fixes:
Fixes #6490
Special notes for your reviewer:
Does this PR introduce a user-facing change?
capi cluster-autoscaler: fix error when listing infra machines
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:
this makes good sense to me. as to your question on slack about caching, i don't think this would have an impact; if the conditions changed to allow the autoscaler to read the infrastructure object, then it would start caching.
i still like the change here, but something odd happening with the tests
@elmiko - looks like you're reviewing this one already, so assigning to you so it doesn't show up as unattended.
/assign @elmiko
The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.
This bot triages PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the PR is closed
You can:
- Mark this PR as fresh with /remove-lifecycle stale
- Close this PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
@jackfrancis should we keep this open?
/remove-lifecycle stale
just adding a comment, i'm trying to take a look into the unit test failures.
i'm baffled by the behavior here, going to need more time to understand how the clientset interaction is affecting this.
@elmiko thanks for taking a look at this while I've been away :)
lemme know if you discover anything, happy to hop onto a zoom and pair this out as well if that helps
i'll keep poking at it and ask some colleagues, but i will be off on vacation next week so i probably won't get back to this until the second week of july.
apologies @jackfrancis , this fell off my radar
/remove-lifecycle stale
@jackfrancis i have completely lost track of this issue. is this something we should continue to play kick-the-can with?
fwiw, i was never able to get answers to the questions i had internally.
@elmiko I rebased this as an excuse to put this back on my radar; like you, I haven't thought about this in many months :/
/remove-lifecycle stale
/remove-lifecycle stale
@jackfrancis i think we still want this?
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: jackfrancis
The full list of commands accepted by this bot can be found here.
The pull request process is described here.
- ~~cluster-autoscaler/cloudprovider/clusterapi/OWNERS~~ [jackfrancis]
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
@elmiko thanks for the nudge, now let's see if we can figure out the UT situation before it goes stale again :)
PR needs rebase.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.