
feat(metrics): Add node_group labels to cluster_autoscaler metrics to get more detailed information

Open dev-slatto opened this issue 1 year ago • 18 comments

Which component are you using?: Autoscaler and Metrics

Is your feature request designed to solve a problem? If so describe the problem this feature should solve.: As of today, the metrics exposed by the Autoscaler only let you see whether the cluster as a whole is at max capacity, i.e. how many nodes it has compared to the maximum available across the node groups / autoscaler.

Example: sum(cluster_autoscaler_nodes_count{}) by (app_kubernetes_io_name) / avg(cluster_autoscaler_max_nodes_count{}) by (app_kubernetes_io_name)

Say that I have a node group that can scale up to 8 nodes, with a minimum of 2 nodes. I then want my metrics system (e.g. Prometheus) to alert me when this node group is at 8 nodes, meaning that I'm not able to scale this specific group any further. This would allow me to take proactive action on dedicated node groups.

Describe the solution you'd like.: It would be nice if some labels were available on, say, the cluster_autoscaler_nodes_count metric that can be used to identify a node group, e.g. a node_group label. You could then filter and group by this label, which would solve the problem described above.
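To make the request concrete, here is a rough Python sketch of the per-group check this would enable. It is an illustration only: the node_group label on cluster_autoscaler_nodes_count is the one proposed here and does not exist today, and the sample values, the group names, the MAX_NODES mapping and both helper functions are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical exposition text: the node_group label is the one proposed in
# this issue and is NOT emitted by cluster_autoscaler_nodes_count today.
SAMPLE = """\
cluster_autoscaler_nodes_count{state="ready",node_group="workers-a"} 8
cluster_autoscaler_nodes_count{state="notStarted",node_group="workers-a"} 0
cluster_autoscaler_nodes_count{state="ready",node_group="workers-b"} 3
"""

# Hypothetical per-group maximum sizes, e.g. copied from the node group config.
MAX_NODES = {"workers-a": 8, "workers-b": 6}

LINE_RE = re.compile(r'^(\w+)\{(.*)\}\s+([0-9.]+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def nodes_per_group(text, metric="cluster_autoscaler_nodes_count"):
    """Sum the samples of `metric` per node_group label value."""
    totals = defaultdict(float)
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match or match.group(1) != metric:
            continue
        labels = dict(LABEL_RE.findall(match.group(2)))
        group = labels.get("node_group")
        if group is not None:
            totals[group] += float(match.group(3))
    return dict(totals)


def saturated_groups(counts, max_nodes):
    """Return the node groups whose node count has reached their maximum."""
    return sorted(g for g, n in counts.items()
                  if g in max_nodes and n >= max_nodes[g])


counts = nodes_per_group(SAMPLE)
print(saturated_groups(counts, MAX_NODES))  # prints ['workers-a']
```

In practice the same grouping would be done in the alerting system itself by grouping on the new label; the script only shows the per-group comparison the label makes possible.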

Describe any alternative solutions you've considered.: I can use some of the metrics that the cloud provider has, but this requires me to have all nodes registered with the cloud provider and to maintain alerts for metrics in two different systems.

Additional context.: Slack thread: https://kubernetes.slack.com/archives/C09R1LV8S/p1686659016161629

dev-slatto, Jun 13 '23 19:06