node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate

Open xiaozhangzhang1 opened this issue 4 years ago • 25 comments

record: node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate
expr: sum by(cluster, namespace, pod, container) (rate(container_cpu_usage_seconds_total{container!="POD",image!="",job="kubelet",metrics_path="/metrics/cadvisor"}[5m])) * on(cluster, namespace, pod) group_left(node) topk by(cluster, namespace, pod) (1, max by(cluster, namespace, pod, node) (kube_pod_info{node!=""}))

This recording rule does not work in Prometheus. If I change on(cluster, namespace, pod) to on(namespace, pod), it works.
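
For reference, the change described above could look like the following rule entry. This is a sketch, not a recommended fix: it assumes a single-cluster setup where namespace and pod alone identify a pod, and it only drops cluster from the on(...) clause while leaving the rest of the expression from the issue untouched.

```yaml
# Sketch only: the rule from this issue with "cluster" dropped from on(...).
# Assumption: single cluster, so (namespace, pod) is a unique match key.
- record: node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate
  expr: >-
    sum by (cluster, namespace, pod, container) (
      rate(container_cpu_usage_seconds_total{container!="POD",image!="",job="kubelet",metrics_path="/metrics/cadvisor"}[5m])
    ) * on (namespace, pod) group_left(node) topk by (cluster, namespace, pod) (
      1, max by (cluster, namespace, pod, node) (kube_pod_info{node!=""})
    )
```

In a multi-cluster setup the same namespace and pod pair can exist in more than one cluster, so matching without cluster may fail or join the wrong series.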

  • Prometheus Operator version: release-0.6

  • Kubernetes version information:

    kubectl version
    Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.14", GitCommit:"89182bdd065fbcaffefec691908a739d161efc03", GitTreeState:"clean", BuildDate:"2020-12-18T12:11:25Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}
    Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.14", GitCommit:"89182bdd065fbcaffefec691908a739d161efc03", GitTreeState:"clean", BuildDate:"2020-12-18T12:02:35Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}

xiaozhangzhang1 avatar Mar 24 '21 09:03 xiaozhangzhang1

It looks like you have a cluster label in one metric but not in the other. Does container_cpu_usage_seconds_total{container!="POD",image!="",job="kubelet",metrics_path="/metrics/cadvisor", cluster!=""} and kube_pod_info{node!="",cluster!=""} return any output?

paulfantom avatar Mar 24 '21 09:03 paulfantom

I did; it returns no data.

xiaozhangzhang1 avatar Mar 24 '21 09:03 xiaozhangzhang1

kube_pod_info{node!="",cluster!=""} returns many series, for example: kube_pod_info{cluster="",container="kube-rbac-proxy-main",created_by_kind="",created_by_name="",host_ip="",instance="",job="kube-state-metrics",namespace="default",node="master01",pod="netshoot",pod_ip="",uid="ef6d61ac-fed4-4ee3-b757-de912a6863fb"}

xiaozhangzhang1 avatar Mar 24 '21 09:03 xiaozhangzhang1

container_cpu_usage_seconds_total{container!="POD",image!="",job="kubelet",metrics_path="/metrics/cadvisor", cluster!=""} returns no data.

xiaozhangzhang1 avatar Mar 24 '21 09:03 xiaozhangzhang1

It seems that your cluster is not configured correctly: you have a cluster label attached to metrics from kube-state-metrics, but not to metrics from kubelet. You need to have it in both places.

paulfantom avatar Mar 24 '21 09:03 paulfantom

Thanks, I got it. Yes, I added the cluster label to kube-state-metrics. I did the same for kubelet, but it still does not work.

xiaozhangzhang1 avatar Mar 24 '21 09:03 xiaozhangzhang1

I have the same issue: CPU usage is no longer working in Grafana.

ArchiFleKs avatar Jul 20 '21 17:07 ArchiFleKs

Hi,

I'm not sure, but I think this issue was fixed in:

https://github.com/prometheus-operator/kube-prometheus/commit/78a467737064513e64eeb2b1df60a2478cfeb23c

It seems that the recording rule node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate was replaced by node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate.

rouja avatar Oct 05 '21 17:10 rouja

I am also facing the same issue. The following query does not return any data:

sum(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate{cluster="$cluster", namespace="$namespace"}) / sum(kube_pod_container_resource_requests{job="kube-state-metrics", cluster="$cluster", namespace="$namespace", resource="cpu"})

SonalJain1707 avatar Nov 16 '22 09:11 SonalJain1707

+1 the same thing

rmn-lux avatar Dec 12 '22 13:12 rmn-lux

Querying node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate on my Prometheus instance does not return anything. Which exporter does this metric come from?

jeremydescamps avatar Dec 22 '22 09:12 jeremydescamps

Hi there,

Prometheus stack chart: kube-prometheus-stack-43.1.1, App Version: 0.61.1
K8s deployed with Rancher Docker version 2.7: 1.24.4

For me, it was related to this issue: https://github.com/k3s-io/k3s/issues/5782
As mentioned there, the image label is now missing.

Workaround: I removed the image!="" matcher from all the rules in the prometheus-stack-kube-prom-k8s.rules.yaml file, and now my Grafana dashboards work like a charm.

# Extract from file: prometheus-stack-kube-prom-k8s.rules.yaml
- name: k8s.rules
  rules:
    - expr: >-
        sum by (cluster, namespace, pod, container) (
          irate(container_cpu_usage_seconds_total{job="kubelet", metrics_path="/metrics/cadvisor", image!=""}[5m])
        ) * on (cluster, namespace, pod) group_left(node) topk by (cluster,
        namespace, pod) (
          1, max by(cluster, namespace, pod, node) (kube_pod_info{node!=""})
        )
      record: >-
        node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate

This file has to be modified as well: prometheus-stack-kube-prom-k8s-resources-workload.yaml. Remove container!="" and image!="". Don't forget to kill the prometheus-stack-grafana pod so that the dashboards get updated!
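
A sketch of what the k8s.rules entry might look like after this workaround, assuming only the image!="" matcher is dropped and everything else in the rule is left as shipped. Without that matcher the rule may also pick up series that the original filter excluded, so the values may differ from those on clusters where the image label is present.

```yaml
# Sketch only: k8s.rules entry with the image!="" matcher removed.
# Assumption: cAdvisor series on this cluster carry no image label,
# so the original matcher filtered out every series.
- name: k8s.rules
  rules:
    - expr: >-
        sum by (cluster, namespace, pod, container) (
          irate(container_cpu_usage_seconds_total{job="kubelet", metrics_path="/metrics/cadvisor"}[5m])
        ) * on (cluster, namespace, pod) group_left(node) topk by (cluster, namespace, pod) (
          1, max by (cluster, namespace, pod, node) (kube_pod_info{node!=""})
        )
      record: node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate
```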

fguiet avatar Dec 22 '22 09:12 fguiet

This issue has been automatically marked as stale because it has not had any activity in the last 60 days. Thank you for your contributions.

github-actions[bot] avatar Feb 21 '23 03:02 github-actions[bot]

Hi @fguiet, I got stuck on this problem as well. I am using the latest Helm chart version with minikube v1.28.0.

I've already removed the image!="" matcher from k8s.rules, and the CPU dashboards started working.

However, I still have issues with the memory dashboards, which basically use the metric container_memory_working_set_bytes.

Example: sum(container_memory_working_set_bytes{job="kubelet", metrics_path="/metrics/cadvisor", cluster="$cluster", namespace="$namespace", pod="$pod", container!="", image!=""}) by (container)

In Prometheus, this metric has no cluster, container or image labels. Did you face this issue as well, and if so, how did you fix it?

Similar issue with dashboards using fs metrics (and probably a lot of other metrics from cadvisor): sum by(container) (rate(container_fs_reads_total{job="kubelet", metrics_path="/metrics/cadvisor", device=~"(/dev/)?(mmcblk.p.+|nvme.+|rbd.+|sd.+|vd.+|xvd.+|dm-.+|md.+|dasd.+)", container!="", cluster="$cluster", namespace="$namespace", pod="$pod"}[$__rate_interval]))

Thanks

bmgante avatar Mar 17 '23 18:03 bmgante

FWIW, this has been addressed in later versions of Rancher Server (2.6.11), along with upgrading k8s to 1.24.10-rancher4-1.

sirajkrm avatar Mar 27 '23 13:03 sirajkrm

Querying node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate on my Prometheus instance does not return anything. Which exporter does this metric come from?

Same for me. Did you find the answer in the end?

I'm using the latest Helm chart of the Prometheus stack.

anthosz avatar May 25 '23 07:05 anthosz

I'm hitting this as well. Any ideas?

zuchka avatar Jul 12 '23 22:07 zuchka

same issue here

jpiazza35 avatar Jul 20 '23 15:07 jpiazza35

Check whether you have this in your Prometheus values:

Before: kubelet: serviceMonitor: https: false

After (this works): kubelet: serviceMonitor: https: true, because the kubelet is responsible for these metrics.

In my case, I had disabled HTTPS in the service monitor for the kubelet. After some research I found that kubelet HTTPS should be enabled, that is, https: true.
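
Written out as a values file, the setting described above might look like this. It is a minimal sketch assuming the kube-prometheus-stack Helm chart and its kubelet.serviceMonitor.https value; any other keys in your values file stay as they are.

```yaml
# values.yaml for the kube-prometheus-stack Helm chart (sketch)
kubelet:
  serviceMonitor:
    # Scrape the kubelet over HTTPS. The comment above reports that
    # setting this to false left the kubelet metrics missing.
    https: true
```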

DhruvPatel2647 avatar Aug 01 '23 08:08 DhruvPatel2647

I'm facing this same issue, reported here.

Removing container!="" and image!="" from prometheus-stack-kube-prom-k8s.rules.yaml worked.

gustavofbreunig avatar Sep 06 '23 14:09 gustavofbreunig

Thanks to @gustavofbreunig.

Removing all image!="" matchers from the charts/kube-prometheus-stack/templates/prometheus/rules-1.14/k8s.rules.yaml file fixed my problem too.

This is my fork if anyone wants to check it.

mohamadkhani avatar Sep 08 '23 11:09 mohamadkhani

Related issue: https://github.com/google/cadvisor/issues/3336

gustavofbreunig avatar Sep 08 '23 14:09 gustavofbreunig

It was a Rancher issue, fixed in v1.24.10-rancher4-1:

https://github.com/rancher/rancher/issues/38934

Please proceed to close the issue.

gustavofbreunig avatar Sep 08 '23 15:09 gustavofbreunig

Related issue: google/cadvisor#3336

Also, related details if you are on Docker Desktop: https://github.com/docker/for-mac/issues/6969

Edit: ...or potentially if you are just using the docker driver for minikube, or Docker Desktop, or really anything that involves Docker.

This issue has been automatically marked as stale because it has not had any activity in the last 60 days. Thank you for your contributions.

github-actions[bot] avatar Nov 08 '23 03:11 github-actions[bot]

This issue was closed because it has not had any activity in the last 120 days. Please reopen if you feel this is still valid.

github-actions[bot] avatar Mar 08 '24 03:03 github-actions[bot]