kube-state-metrics

Add node name to Pod Metrics

Open bsamsom opened this issue 2 years ago • 9 comments

What would you like to be added: I would like more of the metrics from kube-state-metrics to include the "node" label.

Why is this needed: Knowing which node a pod is running on is valuable information for triaging problems and linking them to root causes. For example, if you know node x is being worked on, seeing that the pod is on that node tells you the issue is likely expected while the work is happening.

Describe the solution you'd like: Pod metrics like kube_pod_status_phase and others would return an additional label for the node the pod is running on.

Additional context: I have tried manually adding this with:

    - source_labels: [__meta_kubernetes_pod_node_name]
      action: replace
      target_label: kubernetes_node

but unfortunately that doesn't work: it pulls in the name of the node the kube-state-metrics pod is running on, not the name of the node running the pod that the metric is alerting on.

This looks to be a similar closed issue: https://github.com/kubernetes/kube-state-metrics/issues/540

bsamsom avatar Mar 16 '22 15:03 bsamsom

You can always join multiple metrics. In your example, you can use the following query to get what you need:

kube_pod_status_phase * on(pod) group_left(node) kube_pod_info

KSM tries to control cardinality as much as possible and offload most calculations to PromQL.
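To make the join above concrete, here is a minimal Python sketch of what a `group_left(node)` match does. It is an illustration under stated assumptions, not how Prometheus implements it: the helper `join_group_left`, the sample series, and the node names are all made up. It also shows one caveat worth checking in your own cluster: pod names are only unique within a namespace, so matching on `pod` alone can hit duplicate series on the `kube_pod_info` side, and matching on both `namespace` and `pod` (in PromQL, `on(namespace, pod)`) is likely the safer form.

```python
from typing import Dict, List, Sequence, Tuple

# A series is modelled as (labels, sample value). Hypothetical type for illustration.
Series = Tuple[Dict[str, str], float]


def join_group_left(
    left: List[Series],
    right: List[Series],
    on: Sequence[str],
    extra: Sequence[str],
) -> List[Series]:
    """Simplified model of `left * on(on...) group_left(extra...) right`."""
    # Index the "one" side by its match key; each key must be unique.
    index: Dict[Tuple[str, ...], Series] = {}
    for labels, value in right:
        key = tuple(labels.get(name, "") for name in on)
        if key in index:
            raise ValueError(f"duplicate right-hand series for match key {key}")
        index[key] = (labels, value)

    joined: List[Series] = []
    for labels, value in left:
        key = tuple(labels.get(name, "") for name in on)
        if key not in index:
            continue  # series without a match are dropped from the result
        right_labels, right_value = index[key]
        merged = dict(labels)
        # Copy only the requested extra labels from the right-hand side.
        for name in extra:
            if name in right_labels:
                merged[name] = right_labels[name]
        joined.append((merged, value * right_value))
    return joined


# Made-up sample data: the same pod name exists in two namespaces.
phase: List[Series] = [
    ({"namespace": "default", "pod": "web-0", "phase": "Running"}, 1.0),
    ({"namespace": "default", "pod": "web-0", "phase": "Pending"}, 0.0),
    ({"namespace": "staging", "pod": "web-0", "phase": "Running"}, 1.0),
]
info: List[Series] = [
    ({"namespace": "default", "pod": "web-0", "node": "node-a"}, 1.0),
    ({"namespace": "staging", "pod": "web-0", "node": "node-b"}, 1.0),
]

joined = join_group_left(phase, info, on=("namespace", "pod"), extra=("node",))
for labels, value in joined:
    print(labels, value)
```

Because the `kube_pod_info` samples in this sketch all have the value 1, the multiplication leaves the left-hand values unchanged and only attaches the `node` label. Calling the same helper with `on=("pod",)` raises an error on this sample data, which mirrors the kind of ambiguous match the namespace label avoids.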

fpetkovski avatar Mar 17 '22 12:03 fpetkovski

While that works for an individual metric, I'm trying to get all of our KSM-metric-based alerts to include the node label so I don't have to edit or rewrite all of our Kubernetes alerts just to see which node the pod is running on.

I don't suppose you know a way to do that without editing all of our alerts, via a source label remap or something similar that would actually pull in the pod's node and not the kube-state-metrics pod's node?

bsamsom avatar Mar 22 '22 16:03 bsamsom

I see, in this case there's no simple way to add the label to all alerts. Unless @mrueg and @dgrisonnet have a reason why this can be problematic, I think it should be fine to add the node label to more pod metrics since there is a 1 to 1 mapping from pod to node.

fpetkovski avatar Mar 29 '22 13:03 fpetkovski

Theoretically, it is fine to add the node name to the metrics as it doesn't increase the cardinality and it is useful information. However, since most of the pod metrics are marked as stable, I don't think we should add new dimensions to them.

dgrisonnet avatar Mar 29 '22 14:03 dgrisonnet

@fpetkovski @dgrisonnet Just following up to see if there has been an agreed-upon decision on this, as it's been about a month since the last reply.

bsamsom avatar Apr 25 '22 13:04 bsamsom

I would personally be against this change since it would break stable metrics for non-essential purposes. As mentioned by Filip before, this information can already be aggregated from other metrics.

dgrisonnet avatar Apr 27 '22 12:04 dgrisonnet

Since we don't have a consensus, I also wouldn't want to change the metric.

Currently, our only way to evolve metrics beyond stable is to change them in a new major release.

fpetkovski avatar Apr 27 '22 12:04 fpetkovski

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jul 26 '22 12:07 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Aug 25 '22 13:08 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot avatar Sep 24 '22 13:09 k8s-triage-robot

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Sep 24 '22 13:09 k8s-ci-robot