
kube_pod_container_resource.* showing duplicate metrics with and without node label

jpdstan opened this issue 2 years ago • 12 comments

What happened: For these resource metrics, we expect exactly one series per container, i.e. one limit and one request metric.

However, we're seeing two of each: one without a node label that appears for a brief moment, and then one with a node label that replaces it (see image below).

[screenshot: both kube_pod_container_resource_requests series for the same container, with and without the node label, briefly overlapping on the graph]

This is problematic because there is a point in time where both metrics co-exist, which breaks some of our recording rules that expect these metrics to be unique. You can see that the without-node metric exists for a brief moment before the with-node metric appears, and the without-node metric subsequently disappears (though you can't really see it dropping off in the screenshot, since the with-node one is drawn on top).

I'm curious why this happens and why the node label is special here.

What you expected to happen:

That each of these metrics (kube_pod_container_resource_requests and kube_pod_container_resource_limits) is unique per container.
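For illustration, a recording-rule expression of roughly this shape (hypothetical; our real rules differ only in naming) double-counts a container whenever the with-node and without-node series overlap:

```
# Hypothetical rule expression: while both the node-less and the
# node-labelled series exist for the same container, that container's
# CPU request is counted twice in the sum.
sum by (namespace) (
  kube_pod_container_resource_requests{resource="cpu"}
)
```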

How to reproduce it (as minimally and precisely as possible): This happens when any pod gets newly created.

Anything else we need to know?:

Environment:

  • kube-state-metrics version: 2.2.0
  • Kubernetes version (use kubectl version): 1.18.20
  • Cloud provider or hardware configuration: self-hosted
  • Other info:
        - /kube-state-metrics
        - --port=9102
        - --telemetry-port=8081
        - --resources=configmaps,cronjobs,daemonsets,deployments,endpoints,horizontalpodautoscalers,jobs,limitranges,namespaces,nodes,persistentvolumeclaims,persistentvolumes,poddisruptionbudgets,pods,replicasets,replicationcontrollers,resourcequotas,secrets,services,statefulsets
        - --use-apiserver-cache
        - --metric-labels-allowlist=daemonsets=[*],deployments=[*],jobs=[*],nodes=[*],pods=[*],secrets=[*]
        - -v4
        - --pod=$(POD_NAME)
        - --pod-namespace=$(POD_NAMESPACE)

jpdstan avatar Sep 06 '22 23:09 jpdstan

/assign

rexagod avatar Sep 11 '22 17:09 rexagod

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Dec 10 '22 18:12 k8s-triage-robot

Technically, there could be three metrics with the same kube_pod_container_resource_.+ name, and it seems this behavior was intentional, due to the varying types of resource names.

As far as the transient state of the metric(s) is concerned, I believe we can add a warning: https://github.com/kubernetes/kube-state-metrics/pull/1929.

cc @dgrisonnet

rexagod avatar Dec 11 '22 04:12 rexagod

I am not sure if we can fix this in KSM. The node name is part of the metric, and if we defer exposition until the node name is available, resources for pods in the Pending state will not be exposed. Can you update your recording rules to filter out series with an empty node label?
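For example (a sketch against the metric names above, not your actual rules), the node-less series can be dropped in the rule expression itself:

```
# Only keep series that already carry a node label, i.e. pods that
# have been scheduled to a node.
sum by (namespace) (
  kube_pod_container_resource_requests{resource="cpu", node!=""}
)
```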

fpetkovski avatar Dec 12 '22 19:12 fpetkovski

IMO we shouldn't do anything about this. The goal of ksm is to reflect the state of the kube-apiserver. In this particular case, the pod object was created but wasn't yet assigned to a node and this should be reflected since it might be meaningful for debugging.

This is problematic because there is a point in time where both metrics co-exist, which breaks some of our recording rules that expect these metrics to be unique.

The two metrics never coexist; the new one with the node label replaces the previous one that didn't have a node value. Both show up in your graph because you are executing a query over a range of time (the query type in Grafana is set to "Both"), whereas it should be an instant query in a recording/alerting rule.

dgrisonnet avatar Dec 13 '22 19:12 dgrisonnet

+1, this should be the expected behavior.

rexagod avatar Dec 13 '22 19:12 rexagod

the pod object was created but wasn't yet assigned to a node and this should be reflected since it might be meaningful for debugging.

That makes sense to me. In that case we can update our recording rules to just take the max. Thanks for the context.
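Something along these lines is what we have in mind (a sketch; the actual grouping labels depend on the rule):

```
# Collapse the with-node and without-node series for the same container
# into a single value by taking the max.
max by (namespace, pod, container, resource) (
  kube_pod_container_resource_requests
)
```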

The two metrics never coexist; the new one with the node label replaces the previous one that didn't have a node value

I would like to point out, though, that this is not what we're experiencing; in other words, we do see an overlap in time when both metrics exist. Maybe this has to do with the fact that we're running in sharded mode?

jpdstan avatar Jan 03 '23 22:01 jpdstan

I would like to point out, though, that this is not what we're experiencing; in other words, we do see an overlap in time when both metrics exist. Maybe this has to do with the fact that we're running in sharded mode?

That sounds like a bug then. In a non-sharded environment there is no way for this scenario to happen, since the data is always overridden and there is only one source of truth, so the data can't be duplicated. With sharding it should be the same, since a Kubernetes object should only be handled by one shard, but there might be a bug that results in two shards exposing data about the same object.

I will look into the code to see if I can find anything weird, but we might need a reproducer in order to investigate.
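As a rough way to check for the suspected duplication (a sketch, assuming the duplicate series differ only in the node label and come from different shards), an instant query like this should return results only while two series exist for the same container:

```
# More than one series per container at the same evaluation timestamp
# indicates duplicate exposition.
count by (namespace, pod, container, resource) (
  kube_pod_container_resource_requests
) > 1
```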

dgrisonnet avatar Jan 04 '23 15:01 dgrisonnet

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Apr 04 '23 16:04 k8s-triage-robot

/unassign
/remove-lifecycle stale

rexagod avatar Apr 25 '23 03:04 rexagod

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jan 19 '24 02:01 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Feb 18 '24 02:02 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot avatar Mar 19 '24 03:03 k8s-triage-robot

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Mar 19 '24 03:03 k8s-ci-robot