
Couldn't find template for node group

Open duviful opened this issue 1 year ago • 4 comments

Which component are you using?: cluster-autoscaler

What version of the component are you using?:

Component version: 1.30.0

What k8s version are you using (kubectl version)?:

kubectl version Output
Client Version: v1.30.1
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.27.3

What environment is this in?: cluster-api-vsphere

What did you expect to happen?:

Autoscaler pods are able to trigger scaling

What happened instead?:

This error is repeated multiple times in the cluster-autoscaler pod's logs:

[static_autoscaler.go:1036] Couldn't find template for node group MachineDeployment/default/workload-cluster-1-md-0

How to reproduce it (as minimally and precisely as possible):

Define a workload cluster using cluster-api in an existing management cluster, all hosted on vSphere. The workload cluster is deployed correctly, and it scales up and down with 'kubectl scale machinedeployment workload-cluster-1-md-0 --replicas x'. The autoscaler is then deployed using the Helm chart as a base, with a kustomization to address cloud-specific resource permissions, as mentioned in #5509.

Anything else we need to know?:

duviful avatar May 28 '24 13:05 duviful

You must configure node group auto-discovery to tell the cluster autoscaler which cluster to search for scalable node groups. Users of single-arch non-amd64 clusters who are using scale-from-zero support should also set the CAPI_SCALE_ZERO_DEFAULT_ARCH environment variable to the architecture they want the node group templates to default to. If it is not set, the autoscaler defaults to amd64, and the node group templates may not match the nodes' architecture.
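Note that auto-discovery alone is not sufficient: per the clusterapi provider README, a MachineDeployment is only treated as a scalable node group when it carries the min/max size annotations. A sketch, using the resource name from this issue (values are illustrative):

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: workload-cluster-1-md-0   # name taken from this issue; adjust to your cluster
  namespace: default
  annotations:
    # Without both of these, the autoscaler ignores the MachineDeployment entirely.
    cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size: "1"
    cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size: "5"
```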

kundan2707 avatar Jun 19 '24 18:06 kundan2707

/remove-kind bug

kundan2707 avatar Jun 19 '24 18:06 kundan2707

/kind support

kundan2707 avatar Jun 19 '24 18:06 kundan2707

Thank you for your response.

Node group auto-discovery was already defined in the pod's command; here is an extract.

    clusterapi-cluster-autoscaler:
      Image:      registry.k8s.io/autoscaling/cluster-autoscaler:v1.30.0
      Port:       8085/TCP
      Host Port:  0/TCP
      Command:
        ./cluster-autoscaler
        --cloud-provider=clusterapi
        --namespace=cluster-autoscaler-system
        --node-group-auto-discovery=clusterapi:clusterName=workload-cluster-1
        --logtostderr=false
        --stderrthreshold=info
        --v=1

I won't use CAPI_SCALE_ZERO_DEFAULT_ARCH because the CPU architecture is standard amd64.
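(For anyone hitting the same error with scale from zero: my understanding from the clusterapi provider README is that when no node exists yet, the node template is built from capacity hints on the scalable resource rather than from a live node, so the "Couldn't find template" error can also appear when those hints are missing. A sketch of the capacity annotations as I read them from the README; values are illustrative, so check the README for your version for the exact resource they belong on:)

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: workload-cluster-1-md-0   # name taken from this issue; adjust to your cluster
  annotations:
    # Capacity hints used to construct a node template when scaling from zero.
    capacity.cluster-autoscaler.kubernetes.io/cpu: "2"
    capacity.cluster-autoscaler.kubernetes.io/memory: "8G"
```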

duviful avatar Jun 21 '24 07:06 duviful

/area cluster-autoscaler

adrianmoisey avatar Jul 08 '24 18:07 adrianmoisey

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Oct 06 '24 20:10 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Nov 05 '24 21:11 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot avatar Dec 05 '24 22:12 k8s-triage-robot

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:


/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot avatar Dec 05 '24 22:12 k8s-ci-robot