integrations-core
kubernetes_state.node.count does not get the node labels from K8s
Describe the results you received:
Describe the results you expected:
I would expect `kubernetes_state.node.count` to have the labels that are passed from the node, so that I can get the number of nodes within each node group for monitoring.
Additional information you deem important (e.g. issue happens only occasionally):
As you can see, `kubernetes_state.node.age` (and others) have the node-group name and other information that I want to use.
Hi @alexbowers,
The fact that `kubernetes_state.node.count` is not labeled with node labels is the current expected behaviour: this metric is an aggregation (count) of nodes, so it can't carry a specific node's labels.
We currently aggregate the nodes by: `kubelet_version`, `container_runtime_version`, `kernel_version`, `os_image`.
You can see the current implementation here
`kubernetes_state.node.age` is not an aggregation because we provide the age for each node.
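To make the aggregation concrete, here is a small illustrative sketch (not the linked check code) of counting nodes grouped by those four properties; any per-node label such as a node-group name is simply not part of the grouping key, so it cannot survive onto the count metric.

```python
# Illustrative sketch only (not the kubernetes_state check itself):
# count nodes grouped by the four properties listed above. Per-node
# labels such as a node-group name are not part of the grouping key,
# so they cannot appear on the resulting count metric.
from collections import Counter

nodes = [
    {"kubelet_version": "v1.21.5", "container_runtime_version": "containerd://1.4.9",
     "kernel_version": "5.4.0", "os_image": "Ubuntu 20.04.3 LTS",
     "labels": {"k8s-nodegroup": "staging"}},
    {"kubelet_version": "v1.21.5", "container_runtime_version": "containerd://1.4.9",
     "kernel_version": "5.4.0", "os_image": "Ubuntu 20.04.3 LTS",
     "labels": {"k8s-nodegroup": "production"}},
]

GROUP_BY = ("kubelet_version", "container_runtime_version", "kernel_version", "os_image")

counts = Counter(tuple(node[key] for key in GROUP_BY) for node in nodes)
for group, count in counts.items():
    # Both nodes collapse into the same group: the node-group label is gone.
    print(dict(zip(GROUP_BY, group)), "->", count)
```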
Please let us know if we can help in another way.
Thanks and regards, Cedric
Could some way of defining specific labels from nodes to be added onto the aggregate metrics be considered, so that, for example, you can aggregate by environment?
As it stands, the aggregation isn't useful to us at all, because it combines our staging, QA, and production environments together and pollutes the actual data that we'd be looking for.
If there were a way for us to say "include only the `env` label in the aggregation", that would solve this problem for us.
Hey,
I had a similar issue and solved it by using the following query to get the node count per node group:
`sum:kubernetes_state.node.by_condition{kube_cluster_name:cluster-name,condition:ready,status:true} by {k8s-nodegroup}`
Great workaround, thanks @13013SwagR.
I just popped in to mention that this issue affected us as well: we have a number of monitors in which we aggregate `kubernetes_state.node.count` by `aws_autoscaling_groupname`, and the disappearance of this label was a fairly unwelcome surprise :(
Hi @drmaciej
We now provide a set of "service checks" to represent the different "standard" Node conditions:
- `kubernetes_state.node.ready`
- `kubernetes_state.node.out_of_disk`
- `kubernetes_state.node.disk_pressure`
- `kubernetes_state.node.network_unavailable`
- `kubernetes_state.node.memory_pressure`
See: https://docs.datadoghq.com/integrations/kubernetes_state_core/?tab=helm#service-checks
Because these service checks report a status for each node and are attached to the corresponding host, all the host tags can be used to "group by" in the monitor.
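As an illustration (not an official recipe), a check monitor on one of these service checks could be created with the legacy `datadog` Python client roughly as below; the cluster tag, thresholds, and notification handle are placeholders, and since the check is attached to the host, host tags such as the autoscaling group name are also available for grouping and filtering in the monitor.

```python
# Rough sketch of a "service check" monitor on kubernetes_state.node.ready.
# The cluster tag, thresholds, and @-handle below are placeholders.
from datadog import initialize, api

initialize(api_key="<DD_API_KEY>", app_key="<DD_APP_KEY>")

monitor = api.Monitor.create(
    type="service check",
    query=(
        '"kubernetes_state.node.ready"'
        '.over("kube_cluster_name:cluster-name")'
        '.by("host")'
        ".last(3)"
        ".count_by_status()"
    ),
    name="[kubernetes] Node not Ready",
    message="A node has stopped reporting Ready. @slack-my-channel",
    # For check monitors, the thresholds are counts of statuses within the last(3) window.
    options={"thresholds": {"critical": 2, "warning": 1, "ok": 2}},
)
print(monitor["id"])
```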

Thanks @clamoriniere, that makes sense.
I actually do not see `kubernetes_state.node.out_of_disk` or `kubernetes_state.node.network_unavailable` in my environments (I do see the other three). Are those expected to show up only when there is no disk space or the network is not available?