AKS
[BUG] Random autoscale error
Describe the bug
The cluster autoscaler stops working at random times and goes into an Initializing state. I fix it by setting the node pool scale method to manual and then back to autoscale.

To Reproduce
The error occurs at random; I can't reproduce it on demand.

Run command 'kubectl get configmap -n kube-system cluster-autoscaler-status -o yaml':
apiVersion: v1
data:
  status: |-
    Cluster-autoscaler status at 2024-06-26 05:17:03.13712722 +0000 UTC:
    Initializing
kind: ConfigMap
metadata:
  annotations:
    cluster-autoscaler.kubernetes.io/last-updated: 2024-06-26 05:17:03.13712722 +0000 UTC
  creationTimestamp: "2024-06-26T05:17:03Z"
  name: cluster-autoscaler-status
  namespace: kube-system
  resourceVersion: "****"
  uid: ****
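Until this is resolved, a small watch loop like the one below can flag when the status falls back to Initializing without dumping the whole ConfigMap each time (just a sketch against the cluster above; the 60-second interval and the plain echo "alert" are arbitrary placeholders):

```bash
#!/usr/bin/env bash
# Poll the cluster-autoscaler status ConfigMap and warn when it reports "Initializing".
# Sketch only: the polling interval and the echo-based alert are placeholders.
while true; do
  status=$(kubectl get configmap -n kube-system cluster-autoscaler-status \
    -o jsonpath='{.data.status}')
  if echo "$status" | grep -q '^Initializing'; then
    echo "$(date -u) cluster-autoscaler is stuck in Initializing"
  fi
  sleep 60
done
```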
When I change the scale method to manual and back to autoscale, it works fine:
apiVersion: v1
data:
  status: |+
    Cluster-autoscaler status at 2024-06-26 05:47:27.226243208 +0000 UTC:
    Cluster-wide:
      Health: Healthy (ready=2 unready=0 (resourceUnready=0) notStarted=0 longNotStarted=0 registered=2 longUnregistered=0)
        LastProbeTime: 2024-06-26 05:47:26.512785502 +0000 UTC m=+602.759926070
        LastTransitionTime: 2024-06-26 05:37:34.758946819 +0000 UTC m=+11.006087187
      ScaleUp: NoActivity (ready=2 registered=2)
        LastProbeTime: 2024-06-26 05:47:26.512785502 +0000 UTC m=+602.759926070
        LastTransitionTime: 2024-06-26 05:37:34.758946819 +0000 UTC m=+11.006087187
      ScaleDown: CandidatesPresent (candidates=1)
        LastProbeTime: 2024-06-26 05:47:26.512785502 +0000 UTC m=+602.759926070
        LastTransitionTime: 2024-06-26 05:42:35.843690766 +0000 UTC m=+312.090831134
    NodeGroups:
      Name: aks-poold4sv3-****-vmss
      Health: Healthy (ready=1 unready=0 (resourceUnready=0) notStarted=0 longNotStarted=0 registered=1 longUnregistered=0 cloudProviderTarget=1 (minSize=1, maxSize=3))
        LastProbeTime: 2024-06-26 05:47:26.512785502 +0000 UTC m=+602.759926070
        LastTransitionTime: 2024-06-26 05:37:34.758946819 +0000 UTC m=+11.006087187
      ScaleUp: NoActivity (ready=1 cloudProviderTarget=1)
        LastProbeTime: 2024-06-26 05:47:26.512785502 +0000 UTC m=+602.759926070
        LastTransitionTime: 2024-06-26 05:37:34.758946819 +0000 UTC m=+11.006087187
      ScaleDown: NoCandidates (candidates=0)
        LastProbeTime: 2024-06-26 05:47:26.512785502 +0000 UTC m=+602.759926070
        LastTransitionTime: 2024-06-26 05:37:34.758946819 +0000 UTC m=+11.006087187
      Name: aks-poold8lsv5-****-vmss
      Health: Healthy (ready=1 unready=0 (resourceUnready=0) notStarted=0 longNotStarted=0 registered=1 longUnregistered=0 cloudProviderTarget=1 (minSize=0, maxSize=10))
        LastProbeTime: 2024-06-26 05:47:26.512785502 +0000 UTC m=+602.759926070
        LastTransitionTime: 2024-06-26 05:37:34.758946819 +0000 UTC m=+11.006087187
      ScaleUp: NoActivity (ready=1 cloudProviderTarget=1)
        LastProbeTime: 2024-06-26 05:47:26.512785502 +0000 UTC m=+602.759926070
        LastTransitionTime: 2024-06-26 05:37:34.758946819 +0000 UTC m=+11.006087187
      ScaleDown: CandidatesPresent (candidates=1)
        LastProbeTime: 2024-06-26 05:47:26.512785502 +0000 UTC m=+602.759926070
        LastTransitionTime: 2024-06-26 05:42:35.843690766 +0000 UTC m=+312.090831134
kind: ConfigMap
metadata:
  annotations:
    cluster-autoscaler.kubernetes.io/last-updated: 2024-06-26 05:47:27.226243208 +0000 UTC
  creationTimestamp: "2024-06-26T05:37:22Z"
  name: cluster-autoscaler-status
  namespace: kube-system
  resourceVersion: "***"
  uid: ****
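For reference, the manual/autoscale toggle I use as a workaround can also be scripted with the Azure CLI (a sketch only; the resource group, cluster, and node pool names are placeholders, and the min/max counts match the first pool above):

```bash
# Disable the cluster autoscaler on the affected node pool (scale method becomes manual)...
az aks nodepool update \
  --resource-group myResourceGroup \
  --cluster-name myAKSCluster \
  --name poold4sv3 \
  --disable-cluster-autoscaler

# ...then re-enable it with the original limits (minSize=1, maxSize=3 for this pool).
az aks nodepool update \
  --resource-group myResourceGroup \
  --cluster-name myAKSCluster \
  --name poold4sv3 \
  --enable-cluster-autoscaler \
  --min-count 1 \
  --max-count 3
```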
Environment (please complete the following information):
- Kubernetes version 1.29.4
Action required from @aritraghosh, @julia-yin, @AllenWen-at-Azure
Can you please file a support ticket the next time this happens and update it here?
This issue has been automatically marked as stale because it has not had any activity for 30 days. It will be closed if no further activity occurs within 7 days of this comment. @kevinkrp93
This issue will now be closed because it hasn't had any activity for 7 days after being marked stale. @barkep, feel free to comment again within the next 7 days to reopen it, or open a new issue after that time if you still have a question, issue, or suggestion.