capi: fix error when listing infra machines

What type of PR is this?

/kind bug

What this PR does / why we need it:

This PR affects the CAPI provider of cluster-autoscaler only.

Here we check that we have permission to list an infra object before we bootstrap the informer factory. Bootstrapping the informer factory without that permission has error-handling side effects that we don't want:

https://github.com/kubernetes/client-go/blob/v0.29.0-alpha.3/tools/cache/reflector.go#L146-L148

Specifically, the error short-circuits the process and triggers a pod restart, which ultimately deadlocks the cluster-autoscaler reconciler.

Which issue(s) this PR fixes:

Fixes #6490

Special notes for your reviewer:

Does this PR introduce a user-facing change?

capi cluster-autoscaler: fix error when listing infra machines

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

Feb 02 '24 00:02 jackfrancis

this makes good sense to me. as to your question on slack about caching, i don't think this would have an impact, because if the conditions changed to allow the autoscaler to read the infrastructure object, it would then start caching.

Feb 02 '24 01:02 elmiko

i still like the change here, but something odd is happening with the tests

Feb 02 '24 17:02 elmiko

@elmiko - looks like you're reviewing this one already, so assigning to you so it doesn't show up as unattended.

/assign @elmiko

Feb 06 '24 10:02 x13n

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

Mark this PR as fresh with /remove-lifecycle stale
Close this PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

May 06 '24 22:05 k8s-triage-robot

@jackfrancis should we keep this open?

/remove-lifecycle stale

May 07 '24 14:05 elmiko

just adding a comment; i'm trying to look into the unit test failures.

Jun 27 '24 12:06 elmiko

i'm baffled by the behavior here, going to need more time to understand how the clientset interaction is affecting this.

Jun 27 '24 12:06 elmiko

@elmiko thanks for taking a look at this while I've been away :)

lemme know if you discover anything, happy to hop onto a zoom and pair this out as well if that helps

Jun 27 '24 23:06 jackfrancis

i'll keep poking at it and ask some colleagues, but i will be off on vacation next week so i probably won't get back to this until the second week of july.

Jun 28 '24 08:06 elmiko

/lifecycle stale

Sep 26 '24 08:09 k8s-triage-robot

apologies @jackfrancis, this fell off my radar

/remove-lifecycle stale

Sep 26 '24 13:09 elmiko

/lifecycle stale

Dec 30 '24 20:12 k8s-triage-robot

@jackfrancis i have completely lost track of this issue. is this something we should continue to play kick-the-can with?

fwiw, i was never able to get answers to the questions i had internally.

Jan 09 '25 14:01 elmiko

@elmiko I rebased this as an excuse to put it back on my radar; like you, I haven't thought about this in many months :/

Jan 14 '25 00:01 jackfrancis

/remove-lifecycle stale

Jan 23 '25 10:01 Shubham82

/lifecycle stale

Jul 02 '25 18:07 k8s-triage-robot

/remove-lifecycle stale

@jackfrancis i think we still want this?

Jul 02 '25 19:07 elmiko

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jackfrancis

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

~~cluster-autoscaler/cloudprovider/clusterapi/OWNERS~~ [jackfrancis]

Approvers can indicate their approval by writing /approve in a comment. Approvers can cancel approval by writing /approve cancel in a comment.

Jul 03 '25 20:07 k8s-ci-robot

@elmiko thanks for the nudge, now let's see if we can figure out the UT situation before it goes stale again :)

Jul 03 '25 22:07 jackfrancis

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Aug 12 '25 23:08 k8s-ci-robot

/lifecycle stale

Nov 11 '25 00:11 k8s-triage-robot
