
[BUG] Regression in kube-scheduler impacting Kubernetes versions v1.27.14, v1.28.10, v1.29.5

Open qpetraroia opened this issue 1 year ago • 1 comments

Describe the bug

The AKS team has found a regression in kube-scheduler that affects Kubernetes versions v1.27.14, v1.28.10, and v1.29.5. The regression causes kube-scheduler to panic when a cluster contains a pod with an invalid node affinity, for example one that requires a node that doesn't exist. When this happens, no pods in the cluster are scheduled.

Below is an example pod spec that triggers the bug:

apiVersion: v1
kind: Pod
metadata:
  name: break-kube-scheduler
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchFields:
          - key: metadata.name
            operator: In
            values:
            - invalid-node # a node that doesn't exist
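
To check whether a cluster contains pods shaped like the example above, something along these lines may help. It is a rough sketch, not an official AKS tool: the function names `pinned_node_names` and `find_suspect_pods` are hypothetical, and it assumes the pod and node objects have already been loaded as plain dictionaries, for instance from the `items` arrays of `kubectl get pods --all-namespaces -o json` and `kubectl get nodes -o json`. It only looks at required node affinity terms that match on `metadata.name` with the `In` operator, which is the pattern shown in this issue, and it flags any pod that names a node missing from the cluster.

```python
from typing import Any, Dict, Iterable, Iterator, List, Set, Tuple

Pod = Dict[str, Any]


def pinned_node_names(pod: Pod) -> Iterator[str]:
    """Yield node names a pod requires via matchFields on metadata.name (operator In)."""
    affinity = (pod.get("spec") or {}).get("affinity") or {}
    node_affinity = affinity.get("nodeAffinity") or {}
    required = node_affinity.get("requiredDuringSchedulingIgnoredDuringExecution") or {}
    for term in required.get("nodeSelectorTerms") or []:
        for field in term.get("matchFields") or []:
            if field.get("key") == "metadata.name" and field.get("operator") == "In":
                yield from field.get("values") or []


def find_suspect_pods(
    pods: Iterable[Pod], node_names: Set[str]
) -> List[Tuple[str, str, List[str]]]:
    """Return (namespace, pod name, missing node names) for pods that name unknown nodes."""
    suspects = []
    for pod in pods:
        meta = pod.get("metadata") or {}
        missing = sorted(set(pinned_node_names(pod)) - node_names)
        if missing:
            # Namespace falls back to "default" when the manifest omits it.
            suspects.append((meta.get("namespace", "default"), meta.get("name", ""), missing))
    return suspects


if __name__ == "__main__":
    # The pod from this issue, expressed as a dictionary.
    example_pod = {
        "metadata": {"name": "break-kube-scheduler"},
        "spec": {
            "affinity": {
                "nodeAffinity": {
                    "requiredDuringSchedulingIgnoredDuringExecution": {
                        "nodeSelectorTerms": [
                            {
                                "matchFields": [
                                    {
                                        "key": "metadata.name",
                                        "operator": "In",
                                        "values": ["invalid-node"],
                                    }
                                ]
                            }
                        ]
                    }
                }
            }
        },
    }
    # "node-a" is a made-up node name standing in for the cluster's real nodes.
    print(find_suspect_pods([example_pod], {"node-a"}))
```

A pod flagged here is only a candidate for triggering the panic; the check is deliberately conservative and does not reproduce the scheduler's own logic.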
 

Interim fix

Azure Kubernetes Service engineers are actively fixing all impacted clusters and deploying a fix to kube-scheduler, which will be rolled out to all regions. These fixes will be applied automatically to your cluster.

Thank you for understanding,
The AKS team

qpetraroia, Jul 24 '24 21:07

k8s issue: https://github.com/kubernetes/kubernetes/issues/124930

robbiezhang, Jul 24 '24 22:07

Action required from @aritraghosh, @julia-yin, @AllenWen-at-Azure

This upstream issue was fixed in the July AKS releases. Closing this issue.

AllenWen-at-Azure, Aug 29 '24 05:08