
[BUG] New nodepools don't report any node-level metrics in Prometheus/Grafana

Open • McGregsen opened this issue 3 months ago • 0 comments

Describe the bug

When a new nodepool is added to an existing AKS cluster, the nodes in the new pool don't report any node-level metrics to the connected Prometheus/Grafana instance (both Grafana and Prometheus are managed by Azure).

Existing nodes in other nodepools as well as newly created nodes in existing nodepools (autoscaling) continue to report metrics.
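To make the "new pool vs. existing pools" pattern explicit, a minimal Python sketch can group nodes by nodepool and count how many nodes in each pool actually show up in the metrics. It assumes you already have a mapping of node name to pool name (for example taken from the nodes' agent pool label) and the set of node names seen in Prometheus. The function pool_coverage and the sample node names are hypothetical.

```python
from collections import defaultdict


def pool_coverage(node_pools: dict[str, str], reporting: set[str]) -> dict[str, tuple[int, int]]:
    """Return {pool: (reporting_nodes, total_nodes)}, sorted by pool name.

    node_pools maps node name -> nodepool name (hypothetical input shape).
    reporting is the set of node names that have node-level metrics.
    """
    counts: dict[str, list[int]] = defaultdict(lambda: [0, 0])
    for node, pool in node_pools.items():
        counts[pool][1] += 1          # every node counts toward the pool total
        if node in reporting:
            counts[pool][0] += 1      # only nodes seen in the metrics count as reporting
    return {pool: (ok, total) for pool, (ok, total) in sorted(counts.items())}


if __name__ == "__main__":
    # Hypothetical nodes: the existing pool reports, the new one does not.
    nodes = {
        "aks-system-12345678-vmss000000": "system",
        "aks-system-12345678-vmss000001": "system",
        "aks-newpool-87654321-vmss000000": "newpool",
    }
    seen = {"aks-system-12345678-vmss000000", "aks-system-12345678-vmss000001"}
    for pool, (ok, total) in pool_coverage(nodes, seen).items():
        print(f"{pool}: {ok}/{total} nodes reporting")
```

With the sample data this prints newpool: 0/1 and system: 2/2, which is the shape of the problem described here: whole pools at zero rather than scattered missing nodes.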

When connecting directly to an affected node, we see that the node-exporter binary is missing. This seems to be related to the AKSLinuxExtension (Compute.AKS.Linux.AKSNode) not being installed on the new VMSS that backs the new nodepool.
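One way to test the missing-extension theory is to compare the extensions installed on a VMSS from a working pool with those on the new pool's VMSS. The sketch below assumes each input is the JSON output of az vmss extension list --resource-group <node-resource-group> --vmss-name <vmss-name>, where every entry carries a name field. The helpers extension_names and missing_extensions and the trimmed sample data are hypothetical.

```python
import json


def extension_names(extensions_json: str) -> set[str]:
    """Collect the "name" field from a JSON list of VMSS extension objects."""
    return {ext["name"] for ext in json.loads(extensions_json)}


def missing_extensions(reference_json: str, candidate_json: str) -> set[str]:
    """Extensions present on the reference VMSS but absent on the candidate VMSS."""
    return extension_names(reference_json) - extension_names(candidate_json)


if __name__ == "__main__":
    # Hypothetical, trimmed-down output; real entries carry many more fields.
    working_pool = json.dumps([{"name": "AKSLinuxBilling"}, {"name": "AKSLinuxExtension"}])
    new_pool = json.dumps([{"name": "AKSLinuxBilling"}])
    print(sorted(missing_extensions(working_pool, new_pool)))
```

Comparing against a known-good VMSS avoids hard-coding which extensions "should" be present; only the difference between the two pools is reported.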

A quick scan of the node's install log shows no obvious errors, but it also never mentions the AKSLinuxExtension extension. Other extensions (e.g. AKSLinuxBilling) are installed correctly.
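The log check amounts to searching the node's install log for each extension name. A minimal sketch, assuming the log has already been copied off the node as plain text; its location is not given here, and the sample log lines below are invented for illustration. The function extension_mentions is hypothetical.

```python
def extension_mentions(log_text: str, names: list[str]) -> dict[str, int]:
    """Count case-insensitive occurrences of each extension name in the log text."""
    lowered = log_text.lower()
    return {name: lowered.count(name.lower()) for name in names}


if __name__ == "__main__":
    # Invented sample log lines, for illustration only.
    sample_log = "\n".join([
        "installing AKSLinuxBilling ...",
        "AKSLinuxBilling succeeded",
    ])
    print(extension_mentions(sample_log, ["AKSLinuxBilling", "AKSLinuxExtension"]))
```

A count of zero for AKSLinuxExtension alongside non-zero counts for other extensions matches the observation above: the extension is not failing, it is simply never attempted.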

To Reproduce

Steps to reproduce the behavior:

  1. Create a new nodepool in an existing cluster using az aks nodepool add ....
  2. Check the Grafana dashboard to see whether node metrics are present for nodes from the new nodepool.
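Step 2 can also be checked without the dashboard by running an instant query against the Prometheus HTTP API (/api/v1/query) and collecting the node names it returns. This sketch assumes the query node_uname_info returns node-exporter series carrying a nodename label, and it only parses a response body that has already been fetched; authenticating against the Azure-managed endpoint is not covered. The function reporting_nodes, the node names, and the sample response are hypothetical.

```python
import json


def reporting_nodes(response_body: str, label: str = "nodename") -> set[str]:
    """Extract one label's values from a Prometheus instant-query JSON response."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    return {
        sample["metric"][label]
        for sample in payload["data"]["result"]
        if label in sample["metric"]  # skip series that lack the label
    }


if __name__ == "__main__":
    # Made-up response for the query node_uname_info.
    body = json.dumps({
        "status": "success",
        "data": {
            "resultType": "vector",
            "result": [
                {"metric": {"__name__": "node_uname_info",
                            "nodename": "aks-system-12345678-vmss000000"},
                 "value": [1714935900, "1"]},
            ],
        },
    })
    expected = {"aks-system-12345678-vmss000000", "aks-newpool-87654321-vmss000000"}
    print(sorted(expected - reporting_nodes(body)))  # nodes with no node-level metrics
```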

Environment (please complete the following information):

  • Kubernetes version: 1.29.2
  • VM Image: AKSUbuntu-2204gen2containerd-202404.09.0
  • VM SKU: Standard_E8ads_v5

McGregsen • May 05 '24 19:05