
[BUG] NAP not scaling while pods are pending in K8S 1.34.0

Open maxesse opened this issue 1 month ago • 3 comments

Describe the bug Hi there, last week I upgraded one of my clusters to 1.34.0 because a workload was about to gain a lot of sidecars and I really wanted to take advantage of pod-level resource sizing. The upgrade went well, but I soon realised that pods would get stuck in Pending: Karpenter would insist on allocating them to a node without enough resources, and the scheduler would refuse to place them there. Deleting the pending pod didn't help; Karpenter would keep trying to place it on the same node it had decided on earlier. In the end I had to create a temporary, manually scaled VMSS pool with enough headroom and move everything there, because every time I looked at the console some pod was stuck Pending. I'll keep that pool at least until I can get to a higher 1.34 patch level.

To Reproduce Steps to reproduce the behavior:

  1. Upgrade an AKS cluster to 1.34.0 with NAP enabled and considerable load on the nodes
  2. Perform rolling deployments/restarts of existing workloads; soon enough one or more pods will get stuck Pending if there is no other static pool to fall back to
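For anyone trying to confirm the same symptom, something like the following can show the stuck pods and what Karpenter/NAP is (or isn't) provisioning. The pod name is a placeholder; `nodeclaims` assumes the Karpenter v1 CRDs that NAP uses:

```shell
# List pods stuck in Pending across all namespaces
kubectl get pods -A --field-selector status.phase=Pending

# Check the scheduler's reason for not placing a pod (placeholder pod name)
kubectl describe pod my-app-pod-abc123 | grep -A 10 Events

# Inspect what NAP/Karpenter tried to provision
kubectl get nodeclaims
kubectl get events -A | grep -i karpenter
```

In my case the events showed the scheduler reporting insufficient resources on the node Karpenter kept selecting.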

Expected behavior NAP should provision more nodes, as it did up to 1.33.5.

Environment (please complete the following information):

  • AZ CLI Version 2.81.0
  • Kubernetes version 1.34.0
  • Kubectl version 1.34.2

Are there any known issues around this, or workarounds? I'd rather not have to manage a static VMSS pool.
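For reference, the temporary manually scaled pool I mentioned above was created roughly like this (resource group, cluster name, pool name, size and count are all placeholders; pick a VM size with enough headroom for your workloads):

```shell
# Create a temporary, manually scaled node pool as an escape hatch
az aks nodepool add \
  --resource-group my-rg \
  --cluster-name my-cluster \
  --name tempool \
  --node-count 3 \
  --node-vm-size Standard_D4s_v5

# Remove it once NAP is scaling correctly again
az aks nodepool delete \
  --resource-group my-rg \
  --cluster-name my-cluster \
  --name tempool
```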

maxesse Dec 09 '25 17:12