Cluster Autoscaler pod fails with error "MissingRegion"
cluster-autoscaler
Component version: 1.30

What k8s version are you using (kubectl version)?:

$ kubectl version
Client Version: v1.30.2
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.30.4-eks-a737599
What environment is this in?: Test
What did you expect to happen?:
I1014 17:54:22.064047 1 main.go:644] Cluster Autoscaler 1.30.0
I1014 17:54:22.155804 1 leaderelection.go:250] attempting to acquire leader lease kube-system/cluster-autoscaler...
I1014 17:54:22.168685 1 leaderelection.go:260] successfully acquired lease kube-system/cluster-autoscaler
I1014 17:54:22.169026 1 event_sink_logging_wrapper.go:48] Event(v1.ObjectReference{Kind:"Lease", Namespace:"kube-system", Name:"cluster-autoscaler", UID:"b82b9120-4fc3-4bc2-8b92-21daf9dd151f", APIVersion:"coordination.k8s.io/v1", ResourceVersion:"19453", FieldPath:""}): type: 'Normal' reason: 'LeaderElection' cluster-autoscaler-7c5484cd44-59xj8 became leader
I1014 17:54:22.251583 1 framework.go:373] "the scheduler starts to work with those plugins" Plugins={"PreEnqueue":{"Enabled":[{"Name":"SchedulingGates","Weight":0}],"Disabled":null},"QueueSort":{"Enabled":[{"Name":"PrioritySort","Weight":0}],"Disabled":null},"PreFilter":{"Enabled":[{"Name":"NodeAffinity","Weight":0},{"Name":"NodePorts","Weight":0},{"Name":"NodeResourcesFit","Weight":0},{"Name":"VolumeRestrictions","Weight":0},{"Name":"EBSLimits","Weight":0},{"Name":"GCEPDLimits","Weight":0},{"Name":"NodeVolumeLimits","Weight":0},{"Name":"AzureDiskLimits","Weight":0},{"Name":"VolumeBinding","Weight":0},{"Name":"VolumeZone","Weight":0},{"Name":"PodTopologySpread","Weight":0},{"Name":"InterPodAffinity","Weight":0}],"Disabled":null},"Filter":{"Enabled":[{"Name":"NodeUnschedulable","Weight":0},{"Name":"NodeName","Weight":0},{"Name":"TaintToleration","Weight":0},{"Name":"NodeAffinity","Weight":0},{"Name":"NodePorts","Weight":0},{"Name":"NodeResourcesFit","Weight":0},{"Name":"VolumeRestrictions","Weight":0},{"Name":"EBSLimits","Weight":0},{"Name":"GCEPDLimits","Weight":0},{"Name":"NodeVolumeLimits","Weight":0},{"Name":"AzureDiskLimits","Weight":0},{"Name":"VolumeBinding","Weight":0},{"Name":"VolumeZone","Weight":0},{"Name":"PodTopologySpread","Weight":0},{"Name":"InterPodAffinity","Weight":0}],"Disabled":null},"PostFilter":{"Enabled":[{"Name":"DefaultPreemption","Weight":0}],"Disabled":null},"PreScore":{"Enabled":[{"Name":"TaintToleration","Weight":0},{"Name":"NodeAffinity","Weight":0},{"Name":"NodeResourcesFit","Weight":0},{"Name":"VolumeBinding","Weight":0},{"Name":"PodTopologySpread","Weight":0},{"Name":"InterPodAffinity","Weight":0},{"Name":"NodeResourcesBalancedAllocation","Weight":0}],"Disabled":null},"Score":{"Enabled":[{"Name":"TaintToleration","Weight":3},{"Name":"NodeAffinity","Weight":2},{"Name":"NodeResourcesFit","Weight":1},{"Name":"VolumeBinding","Weight":1},{"Name":"PodTopologySpread","Weight":2},{"Name":"InterPodAffinity","Weight":2},{"Name":"NodeResourcesBalancedAllocation","Weight":1},{"Name":"ImageLocality","Weight":1}],"Disabled":null},"Reserve":{"Enabled":[{"Name":"VolumeBinding","Weight":0}],"Disabled":null},"Permit":{"Enabled":null,"Disabled":null},"PreBind":{"Enabled":[{"Name":"VolumeBinding","Weight":0}],"Disabled":null},"Bind":{"Enabled":[{"Name":"DefaultBinder","Weight":0}],"Disabled":null},"PostBind":{"Enabled":null,"Disabled":null},"MultiPoint":{"Enabled":null,"Disabled":null}}
I1014 17:54:22.265983 1 cloud_provider_builder.go:30] Building aws cloud provider.
E1014 17:54:25.405156 1 aws_cloud_provider.go:433] Failed to generate AWS EC2 Instance Types: MissingRegion: could not find region configuration, falling back to static list with last update time: 2024-04-08
What happened instead?:
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?: The cluster-autoscaler pod is crashing in our setup with the MissingRegion error. How do we solve it?
Also, when I deploy EKS 1.29 with cluster-autoscaler 1.29 I don't see any issue, and even when I upgrade EKS from 1.29 to 1.30 I don't see this issue either. Only on a fresh install of EKS 1.30 with cluster-autoscaler 1.30 do I get the reported issue.
/area cluster-autoscaler
I started experiencing this issue as well. I added this environment variable block:

env {
  name  = "AWS_REGION"
  value = "eu-west-1"
}
but now I am getting this error:
I1016 10:13:31.629902 1 auto_scaling_groups.go:360] Regenerating instance to ASG map for ASG names: []
I1016 10:13:31.629918 1 auto_scaling_groups.go:367] Regenerating instance to ASG map for ASG tags: map[k8s.io/cluster-autoscaler/enabled: k8s.io/cluster-autoscaler/fcmb-stg-tco0001-cluster:]
I1016 10:13:34.932762 1 trace.go:219] Trace[774965466]: "Reflector ListAndWatch" name:k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:212 (16-Oct-2024 10:13:24.533) (total time: 10398ms):
Trace[774965466]: ---"Objects listed" error:
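For anyone applying the same AWS_REGION workaround without Terraform, here is a minimal sketch of where the variable goes in the raw cluster-autoscaler Deployment manifest (the labels, image tag, and region value are assumptions for illustration, not taken from this thread):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      serviceAccountName: cluster-autoscaler
      containers:
        - name: cluster-autoscaler
          image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.30.0
          env:
            # Region the AWS SDK should use when it cannot resolve one from
            # IMDS (the source of the MissingRegion error); example value only.
            - name: AWS_REGION
              value: eu-west-1
```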
One workaround I followed is updating the EKS AMI type to "AL2_x86_64" instead of using the default type.
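If the node group is managed with eksctl rather than the console, the same AMI pin can be expressed in the ClusterConfig; a rough sketch with placeholder cluster and node group names (the amiFamily field comes from the eksctl schema, not from this thread):

```yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster      # placeholder
  region: eu-west-1     # placeholder
managedNodeGroups:
  - name: workers       # placeholder
    instanceType: m5.large
    desiredCapacity: 2
    # Pin the node group to the AL2 image family instead of the AL2023 default.
    amiFamily: AmazonLinux2
```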
I also got this error (the same logs, even with the AWS_REGION env set). The error does not appear on upgraded clusters (I tried upgrading from 1.29 in series), only on new clusters.
@sivachandran-s the workaround you described, updating the EKS AMI type to "AL2_x86_64" instead of the default, works fine and actually fixed the issue for us. But what is the permanent fix for this? If you or anyone else knows, please let us know here.
Unfortunately, in the near future Kubernetes v1.33 will be required on AWS, which will force us onto the AL2023-x86_64 images that are already the default for new clusters. When that happens, the autoscaler will be broken with no workaround available.
I just ran into this myself, has a more permanent fix been found?
Any update on this? With 4 months left, people should already have started migrating to AL2023 because of the EKS AMI end-of-support.
I've found a workaround:

- Install the Amazon EKS Pod Identity Agent to the cluster.
- Assign the required IAM policy to a new IAM role, specifying the cluster-autoscaler Service Account in the trust relationship.
- Annotate the cluster-autoscaler Service Account with the new role ARN (see the sketch below).

It's a few extra steps, but at least it works with AL2023 images. I followed these docs: CA_with_AWS_IAM_OIDC and IAM roles for service accounts.
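For reference, the Service Account annotation from the last step ends up looking roughly like this, assuming the IRSA-style annotation from the linked docs (account ID and role name are placeholders):

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cluster-autoscaler
  namespace: kube-system
  annotations:
    # Bind the Service Account to the IAM role created above (placeholder ARN).
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/cluster-autoscaler-role
```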
Extra info about the issue:
https://github.com/awslabs/amazon-eks-ami/issues/1696
For IMDSv2, the default hop count for managed node groups is set to 1. This means that containers won't have access to the node's credentials using IMDS. If you require container access to the node's credentials, you can still do so by manually overriding the HttpPutResponseHopLimit in a custom EC2 launch template, increasing it to 2, and by using EKS Pod Identity.
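If you do override the hop limit in a custom launch template, it lives under the template's metadata options; a rough CloudFormation sketch with placeholder names (an illustration, not a configuration taken from this thread):

```yaml
Resources:
  NodeLaunchTemplate:                    # placeholder logical name
    Type: AWS::EC2::LaunchTemplate
    Properties:
      LaunchTemplateName: eks-node-imds  # placeholder
      LaunchTemplateData:
        MetadataOptions:
          HttpEndpoint: enabled
          HttpTokens: required           # keep IMDSv2 enforced
          # Allow one extra hop so pods behind the node's network namespace
          # can still reach IMDS.
          HttpPutResponseHopLimit: 2
```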
@adrianmoisey (since you tagged this originally) - Are there any updates here that do not require creating new IAM policies or roles? Time's quickly running out, so having an official response would be greatly appreciated
I don't work on the cluster-autoscaler, so I can't help.
Another workaround is to enable IMDSv1 via the launch template of the node group. I also increased the hop limit to 3 but not sure if that's needed.
This is an AWS provider issue, so the right people to tag are @gjtempleton and @drmorr0, who are currently in the OWNERS file.
Ref: https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/OWNERS
Any updates on this one? EKS 1.33 doesn't work, even when updating to registry.k8s.io/autoscaling/cluster-autoscaler:v1.33.0:
Failed to regenerate ASG cache: MissingRegion: could not find region configuration
Failed to create AWS Manager: MissingRegion: could not find region configuration
I am also getting this issue using cluster-autoscaler:v1.33.0 with a node pool using the Amazon Linux 2023.7.20250609 image. Any updates?
We are forced to upgrade our EKS cluster to a newer version to avoid the extended-support pricing, and we no longer have the option of using the older AL2_x86_64 AMI. The only option left is AL2023-x86_64, but that breaks our cluster-autoscaler. It has been a year since this ticket was opened and there is still no fixed version.
I'm experiencing the same issue with cluster-autoscaler 1.34.1. Does anyone have a workaround to get the cluster-autoscaler running?
Thanks
I was able to get cluster autoscaling working again on the newer AL2023 nodes by enabling IRSA on our EKS cluster: create an IAM OIDC identity provider for the cluster, create an IAM policy for the cluster autoscaler, and create an IAM role that uses this new policy with the correct trust policy. Once the role is created, annotate the "cluster-autoscaler" service account to use the new role, restart the k8s deployment, and voilà!
Hopefully this helps.
Helpful documentation of this process:
- https://docs.aws.amazon.com/eks/latest/best-practices/cas.html
- https://builder.aws.com/content/2a9qUKMTGUM6DkFdi0dNwtQnAke/cluster-autoscaler-configure-on-aws-eks-124
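Expanding on the IRSA setup above, the piece that usually needs the most care is the trust policy on the new role: it must let the cluster's OIDC provider assume the role, but only for the cluster-autoscaler Service Account. A rough CloudFormation sketch with placeholder account ID, region, OIDC provider ID, and role name:

```yaml
Resources:
  ClusterAutoscalerRole:
    Type: AWS::IAM::Role
    Properties:
      RoleName: cluster-autoscaler-irsa  # placeholder
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Principal:
              # OIDC identity provider created for the cluster (placeholder ARN).
              Federated: arn:aws:iam::111122223333:oidc-provider/oidc.eks.eu-west-1.amazonaws.com/id/EXAMPLE1234567890
            Action: sts:AssumeRoleWithWebIdentity
            Condition:
              StringEquals:
                # Only the cluster-autoscaler Service Account may assume this role.
                "oidc.eks.eu-west-1.amazonaws.com/id/EXAMPLE1234567890:sub": "system:serviceaccount:kube-system:cluster-autoscaler"
```

The autoscaler's IAM policy is then attached to this role, and the Service Account annotation shown earlier in the thread points at it.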
Thank you @cwh-hcl, feel free to re-open (anyone on this thread) if there are more open issues here.
Thanks for the tips and links @cwh-hcl
We got the cluster up and running with the Amazon Linux 2023 (x86_64) Standard AMI now.
I just followed the link https://builder.aws.com/content/2a9qUKMTGUM6DkFdi0dNwtQnAke/cluster-autoscaler-configure-on-aws-eks-124, created a new role named EKS_Autoscaler, and used it in the cluster-autoscaler-autodiscover deployment.
tritu$ diff cluster-autoscaler-autodiscover-CALICO-PRD-EKS.yaml cluster-autoscaler-autodiscover-CALICO-PRD-EKS-new.yaml
7a8,9
>   annotations:
>     eks.amazonaws.com/role-arn: arn:aws:iam::<my_id>:role/EKS_Autoscaler
tritu$
Something must have changed on the Amazon Linux 2023 (x86_64) Standard AMI, because it now needs a ServiceAccount role.
The same deployment works fine for cluster-autoscaler on the Bottlerocket (BOTTLEROCKET_x86_64) AMI, but we can't use BOTTLEROCKET_x86_64 because of this bug: https://github.com/bottlerocket-os/bottlerocket/issues/4022.
Happy that we got it working again on the Amazon Linux 2023 (x86_64) AMI. Thanks