kube-state-metrics
Add node name to Pod Metrics
What would you like to be added: I would like more of the metrics from kube-state-metrics to include the "node" label.
Why is this needed: Knowing which node a pod is running on is valuable information for triaging problems and linking them to root causes. For example, if you know node x is being worked on, seeing that a pod is on that node tells you the issue is likely expected while the work is happening.
Describe the solution you'd like: Pod metrics such as kube_pod_status_phase and others would return an additional label for the node the pod is running on.
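For illustration, a series with the requested label might look like this (a hypothetical, simplified example with made-up pod and node names, not current kube-state-metrics output):

kube_pod_status_phase{namespace="default", pod="my-app-5d9c7b6f4-abc12", phase="Running", node="worker-1"} 1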
Additional context: I have tried manually adding this with:

- source_labels: [__meta_kubernetes_pod_node_name]
  action: replace
  target_label: kubernetes_node

Unfortunately that doesn't work: it pulls in the name of the node that the kube-state-metrics pod is running on, not the node of the pod the metric is reporting on.
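For context, here is a sketch of the kind of scrape job this relabeling would sit in (the job name and discovery role are assumptions, not taken from the issue). It illustrates why the approach fails: __meta_kubernetes_pod_node_name is a service-discovery label of the scrape target itself.

# Assumed scrape job, for illustration only.
# __meta_kubernetes_pod_node_name describes the discovered target (the kube-state-metrics
# pod), so kubernetes_node always ends up naming the node hosting kube-state-metrics,
# not the pods described by the metrics it exposes.
scrape_configs:
  - job_name: kube-state-metrics
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_node_name]
        action: replace
        target_label: kubernetes_node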
This looks to be a similar closed issue: https://github.com/kubernetes/kube-state-metrics/issues/540
You can always join multiple metrics. In your example, you can use the following query to get what you need:
kube_pod_status_phase * on(pod) group_left(node) kube_pod_info
KSM tries to control cardinality as much as possible and offload most calculations to PromQL.
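A variant of the query above (a sketch, not from the original reply) also matches on namespace, which avoids many-to-one matching errors when pods with the same name exist in different namespaces:

kube_pod_status_phase * on(namespace, pod) group_left(node) kube_pod_info

The group_left(node) modifier copies the node label from kube_pod_info onto each matching kube_pod_status_phase series.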
While that works on an individual metric, I'm trying to get all of our KSM metric-based alerts to include the node label so I don't have to edit or rewrite all of our Kubernetes alerts just to see which node the pod is running on.
I don't suppose you know a way to do that without editing all of our alerts, via a source label remap or something similar that would actually pull the pod's node and not the kube-state-metrics pod's node?
I see, in this case there's no simple way to add the label to all alerts. Unless @mrueg and @dgrisonnet have a reason why this can be problematic, I think it should be fine to add the node label to more pod metrics since there is a 1 to 1 mapping from pod to node.
Theoretically, it is fine to add the node name to the metrics as it doesn't increase the cardinality and it is useful information. However, since most of the pod metrics are marked as stable, I don't think we should add new dimensions to them.
@fpetkovski @dgrisonnet Just following up to see if there has been an agreed-upon decision on this, as it's been about a month since the last reply.
I would personally be against this change since it would break stable metrics for non-essential purposes. As mentioned by Filip before, this information can already be aggregated from other metrics.
Since we don't have a consensus, I also wouldn't want to change the metric.
Currently, our only way to evolve metrics beyond stable is to change them in a new major release.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.