
kubernetes_state.node.count does not get the node labels from K8s

[Open] alexbowers opened this issue 2 years ago · 6 comments

Describe the results you received: [screenshot: kubernetes_state.node.count is reported without node labels]

Describe the results you expected: I would expect kubernetes_state.node.count to carry the labels passed from the node, so that I can monitor the number of nodes within each node group.

Additional information you deem important (e.g. issue happens only occasionally): As you can see, kubernetes_state.node.age (and others) have the node-group name and other information that I want to use. [screenshot]

alexbowers · Jul 22 '22 11:07

hi @alexbowers

The fact that kubernetes_state.node.count is not tagged with node labels is the current expected behaviour: this metric is an aggregation (count) of nodes, so it cannot carry any individual node's labels. We currently aggregate the nodes by "kubelet_version", "container_runtime_version", "kernel_version", and "os_image". You can see the current implementation here.
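As a rough illustration (assuming the tags listed above are attached to the metric as described), you could group the node count by one of those aggregation tags, e.g.:

"sum:kubernetes_state.node.count{*} by {kubelet_version}"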

kubernetes_state.node.age is not an aggregation, because the age is reported for each individual node.

Please let us know if we can help in another way.

Thanks and regards, Cedric

clamoriniere · Aug 22 '22 13:08

Could you consider adding some way of defining specific node labels to be applied to the aggregate metric, so that, for example, we can aggregate by environment?

As it stands, the aggregation isn't useful to us at all, because it combines our staging, QA, and production environments and pollutes the data we're actually looking for.

If there were a way for us to say "include the env label in the aggregation", that would solve this problem for us.

alexbowers · Aug 22 '22 14:08

Hey, I had a similar issue and solved it by using the following query to get the node count per node group:

"sum:kubernetes_state.node.by_condition{kube_cluster_name:cluster-name,condition:ready,status:true} by {k8s-nodegroup}"

13013SwagR · Nov 21 '22 18:11

Great workaround, thanks @13013SwagR.

I just popped in to mention that this issue affected us as well: we have a number of monitors that aggregate kubernetes_state.node.count by aws_autoscaling_groupname, so the disappearance of this label was a fairly unwelcome surprise :(

drmaciej · Jan 16 '23 01:01

Hi @drmaciej

We now provide a set of "service checks" to represent the different "standard" Node conditions:

  • kubernetes_state.node.ready
  • kubernetes_state.node.out_of_disk
  • kubernetes_state.node.disk_pressure
  • kubernetes_state.node.network_unavailable
  • kubernetes_state.node.memory_pressure

See: https://docs.datadoghq.com/integrations/kubernetes_state_core/?tab=helm#service-checks

Because these service checks report a status for each node and are attached to the corresponding host, all of the host tags can be used to "group by" in the monitor.
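For example, a check monitor on node readiness could be grouped by a host tag such as aws_autoscaling_groupname (the tag mentioned above; this is an illustrative sketch, assuming that tag is present on your hosts and using Datadog's check-monitor query syntax):

"kubernetes_state.node.ready".over("*").by("aws_autoscaling_groupname").last(5).count_by_status()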

[screenshot]

clamoriniere · Jan 16 '23 18:01

Thanks @clamoriniere, that makes sense.

I actually do not see kubernetes_state.node.out_of_disk or kubernetes_state.node.network_unavailable in my environments (I do see the other three). Are those expected to show up only when a node is out of disk space or its network is unavailable?

drmaciej · Jan 17 '23 03:01