Missing network metrics for containers
Environmental Info:
K3s Version: 1.24.3+k3s1
Node(s) CPU architecture, OS, and Version:
Linux 5.18.10-200.fc36.aarch64 aarch64 GNU/Linux
Linux 5.18.10-200.fc36.x86_64 x86_64 GNU/Linux
Cluster Configuration: 3 servers, 1 agent
Describe the bug: I am missing some metrics in Grafana, mostly related to the network. For example, container_network_receive_bytes_total returns nothing in Prometheus, and this is what I get from cAdvisor:
# HELP container_network_receive_bytes_total Cumulative count of bytes received
# TYPE container_network_receive_bytes_total counter
container_network_receive_bytes_total{container="",id="/",image="",interface="cali3ff767e1f8e",name="",namespace="",pod=""} 5.04275551e+08 1658223780403
container_network_receive_bytes_total{container="",id="/",image="",interface="califa2a66cf08e",name="",namespace="",pod=""} 4.571331e+06 1658223780403
container_network_receive_bytes_total{container="",id="/",image="",interface="eth0",name="",namespace="",pod=""} 1.7452297436e+10 1658223780403
container_network_receive_bytes_total{container="",id="/",image="",interface="vxlan.calico",name="",namespace="",pod=""} 0 1658223780403
container_network_receive_bytes_total{container="",id="/",image="",interface="wlan0",name="",namespace="",pod=""} 0 1658223780403
container_network_receive_bytes_total{container="",id="/kubepods/besteffort/pod1340c9ff-f1e7-43df-925b-aaf9dbaab34e/7a0485fc774fd80bbc0e38cd5015af5b8827d08af0a9de1f360a3d876129a275",image="docker.io/rancher/mirrored-pause:3.6",interface="cali3ff767e1f8e",name="7a0485fc774fd80bbc0e38cd5015af5b8827d08af0a9de1f360a3d876129a275",namespace="monitoring",pod="kube-prometheus-stack-prometheus-node-exporter-t8wmk"} 5.04275551e+08 1658223780854
container_network_receive_bytes_total{container="",id="/kubepods/besteffort/pod1340c9ff-f1e7-43df-925b-aaf9dbaab34e/7a0485fc774fd80bbc0e38cd5015af5b8827d08af0a9de1f360a3d876129a275",image="docker.io/rancher/mirrored-pause:3.6",interface="califa2a66cf08e",name="7a0485fc774fd80bbc0e38cd5015af5b8827d08af0a9de1f360a3d876129a275",namespace="monitoring",pod="kube-prometheus-stack-prometheus-node-exporter-t8wmk"} 4.571331e+06 1658223780854
container_network_receive_bytes_total{container="",id="/kubepods/besteffort/pod1340c9ff-f1e7-43df-925b-aaf9dbaab34e/7a0485fc774fd80bbc0e38cd5015af5b8827d08af0a9de1f360a3d876129a275",image="docker.io/rancher/mirrored-pause:3.6",interface="eth0",name="7a0485fc774fd80bbc0e38cd5015af5b8827d08af0a9de1f360a3d876129a275",namespace="monitoring",pod="kube-prometheus-stack-prometheus-node-exporter-t8wmk"} 1.7452299679e+10 1658223780854
container_network_receive_bytes_total{container="",id="/kubepods/besteffort/pod1340c9ff-f1e7-43df-925b-aaf9dbaab34e/7a0485fc774fd80bbc0e38cd5015af5b8827d08af0a9de1f360a3d876129a275",image="docker.io/rancher/mirrored-pause:3.6",interface="vxlan.calico",name="7a0485fc774fd80bbc0e38cd5015af5b8827d08af0a9de1f360a3d876129a275",namespace="monitoring",pod="kube-prometheus-stack-prometheus-node-exporter-t8wmk"} 0 1658223780854
container_network_receive_bytes_total{container="",id="/kubepods/besteffort/pod1340c9ff-f1e7-43df-925b-aaf9dbaab34e/7a0485fc774fd80bbc0e38cd5015af5b8827d08af0a9de1f360a3d876129a275",image="docker.io/rancher/mirrored-pause:3.6",interface="wlan0",name="7a0485fc774fd80bbc0e38cd5015af5b8827d08af0a9de1f360a3d876129a275",namespace="monitoring",pod="kube-prometheus-stack-prometheus-node-exporter-t8wmk"} 0 1658223780854
container_network_receive_bytes_total{container="",id="/kubepods/besteffort/pod1b8ccf1c-efbe-40f2-82e8-45b128597b32/74d14fa656954956e3222f3c31c9532d7d4e20afc505cb4eaa3fab8f02232ba2",image="docker.io/rancher/mirrored-pause:3.6",interface="cali3ff767e1f8e",name="74d14fa656954956e3222f3c31c9532d7d4e20afc505cb4eaa3fab8f02232ba2",namespace="calico-system",pod="calico-node-hv78p"} 5.04272903e+08 1658223776345
container_network_receive_bytes_total{container="",id="/kubepods/besteffort/pod1b8ccf1c-efbe-40f2-82e8-45b128597b32/74d14fa656954956e3222f3c31c9532d7d4e20afc505cb4eaa3fab8f02232ba2",image="docker.io/rancher/mirrored-pause:3.6",interface="califa2a66cf08e",name="74d14fa656954956e3222f3c31c9532d7d4e20afc505cb4eaa3fab8f02232ba2",namespace="calico-system",pod="calico-node-hv78p"} 4.571265e+06 1658223776345
container_network_receive_bytes_total{container="",id="/kubepods/besteffort/pod1b8ccf1c-efbe-40f2-82e8-45b128597b32/74d14fa656954956e3222f3c31c9532d7d4e20afc505cb4eaa3fab8f02232ba2",image="docker.io/rancher/mirrored-pause:3.6",interface="eth0",name="74d14fa656954956e3222f3c31c9532d7d4e20afc505cb4eaa3fab8f02232ba2",namespace="calico-system",pod="calico-node-hv78p"} 1.7452145991e+10 1658223776345
container_network_receive_bytes_total{container="",id="/kubepods/besteffort/pod1b8ccf1c-efbe-40f2-82e8-45b128597b32/74d14fa656954956e3222f3c31c9532d7d4e20afc505cb4eaa3fab8f02232ba2",image="docker.io/rancher/mirrored-pause:3.6",interface="vxlan.calico",name="74d14fa656954956e3222f3c31c9532d7d4e20afc505cb4eaa3fab8f02232ba2",namespace="calico-system",pod="calico-node-hv78p"} 0 1658223776345
container_network_receive_bytes_total{container="",id="/kubepods/besteffort/pod1b8ccf1c-efbe-40f2-82e8-45b128597b32/74d14fa656954956e3222f3c31c9532d7d4e20afc505cb4eaa3fab8f02232ba2",image="docker.io/rancher/mirrored-pause:3.6",interface="wlan0",name="74d14fa656954956e3222f3c31c9532d7d4e20afc505cb4eaa3fab8f02232ba2",namespace="calico-system",pod="calico-node-hv78p"} 0 1658223776345
container_network_receive_bytes_total{container="",id="/kubepods/besteffort/pod23098111-be64-43f7-9750-eab1824586dd/06298af37541be2ed561d5fbe49ab4aae2b157dd74e3feb6ad845824e9715682",image="docker.io/rancher/mirrored-pause:3.6",interface="cali3ff767e1f8e",name="06298af37541be2ed561d5fbe49ab4aae2b157dd74e3feb6ad845824e9715682",namespace="tigera-operator",pod="tigera-operator-788998469-qqtgg"} 5.04274126e+08 1658223777321
container_network_receive_bytes_total{container="",id="/kubepods/besteffort/pod23098111-be64-43f7-9750-eab1824586dd/06298af37541be2ed561d5fbe49ab4aae2b157dd74e3feb6ad845824e9715682",image="docker.io/rancher/mirrored-pause:3.6",interface="califa2a66cf08e",name="06298af37541be2ed561d5fbe49ab4aae2b157dd74e3feb6ad845824e9715682",namespace="tigera-operator",pod="tigera-operator-788998469-qqtgg"} 4.571265e+06 1658223777321
container_network_receive_bytes_total{container="",id="/kubepods/besteffort/pod23098111-be64-43f7-9750-eab1824586dd/06298af37541be2ed561d5fbe49ab4aae2b157dd74e3feb6ad845824e9715682",image="docker.io/rancher/mirrored-pause:3.6",interface="eth0",name="06298af37541be2ed561d5fbe49ab4aae2b157dd74e3feb6ad845824e9715682",namespace="tigera-operator",pod="tigera-operator-788998469-qqtgg"} 1.7452206909e+10 1658223777321
container_network_receive_bytes_total{container="",id="/kubepods/besteffort/pod23098111-be64-43f7-9750-eab1824586dd/06298af37541be2ed561d5fbe49ab4aae2b157dd74e3feb6ad845824e9715682",image="docker.io/rancher/mirrored-pause:3.6",interface="vxlan.calico",name="06298af37541be2ed561d5fbe49ab4aae2b157dd74e3feb6ad845824e9715682",namespace="tigera-operator",pod="tigera-operator-788998469-qqtgg"} 0 1658223777321
container_network_receive_bytes_total{container="",id="/kubepods/besteffort/pod23098111-be64-43f7-9750-eab1824586dd/06298af37541be2ed561d5fbe49ab4aae2b157dd74e3feb6ad845824e9715682",image="docker.io/rancher/mirrored-pause:3.6",interface="wlan0",name="06298af37541be2ed561d5fbe49ab4aae2b157dd74e3feb6ad845824e9715682",namespace="tigera-operator",pod="tigera-operator-788998469-qqtgg"} 0 1658223777321
container_network_receive_bytes_total{container="",id="/kubepods/besteffort/pod7ba198fd-699e-4fd2-862d-a04aeae7f9f1/98f3c12b0aec3a249be8195f943afab4bcf515f4458d814a0a10695bd0ac5576",image="docker.io/rancher/mirrored-pause:3.6",interface="cali3ff767e1f8e",name="98f3c12b0aec3a249be8195f943afab4bcf515f4458d814a0a10695bd0ac5576",namespace="networking",pod="metallb-speaker-chvsb"} 5.04276026e+08 1658223781926
container_network_receive_bytes_total{container="",id="/kubepods/besteffort/pod7ba198fd-699e-4fd2-862d-a04aeae7f9f1/98f3c12b0aec3a249be8195f943afab4bcf515f4458d814a0a10695bd0ac5576",image="docker.io/rancher/mirrored-pause:3.6",interface="califa2a66cf08e",name="98f3c12b0aec3a249be8195f943afab4bcf515f4458d814a0a10695bd0ac5576",namespace="networking",pod="metallb-speaker-chvsb"} 4.571331e+06 1658223781926
container_network_receive_bytes_total{container="",id="/kubepods/besteffort/pod7ba198fd-699e-4fd2-862d-a04aeae7f9f1/98f3c12b0aec3a249be8195f943afab4bcf515f4458d814a0a10695bd0ac5576",image="docker.io/rancher/mirrored-pause:3.6",interface="eth0",name="98f3c12b0aec3a249be8195f943afab4bcf515f4458d814a0a10695bd0ac5576",namespace="networking",pod="metallb-speaker-chvsb"} 1.7452336643e+10 1658223781926
container_network_receive_bytes_total{container="",id="/kubepods/besteffort/pod7ba198fd-699e-4fd2-862d-a04aeae7f9f1/98f3c12b0aec3a249be8195f943afab4bcf515f4458d814a0a10695bd0ac5576",image="docker.io/rancher/mirrored-pause:3.6",interface="vxlan.calico",name="98f3c12b0aec3a249be8195f943afab4bcf515f4458d814a0a10695bd0ac5576",namespace="networking",pod="metallb-speaker-chvsb"} 0 1658223781926
container_network_receive_bytes_total{container="",id="/kubepods/besteffort/pod7ba198fd-699e-4fd2-862d-a04aeae7f9f1/98f3c12b0aec3a249be8195f943afab4bcf515f4458d814a0a10695bd0ac5576",image="docker.io/rancher/mirrored-pause:3.6",interface="wlan0",name="98f3c12b0aec3a249be8195f943afab4bcf515f4458d814a0a10695bd0ac5576",namespace="networking",pod="metallb-speaker-chvsb"} 0 1658223781926
container_network_receive_bytes_total{container="",id="/kubepods/besteffort/pod97d8f30b-adb9-45a5-98a9-eeb84d323f00/02c7397aedcfb3c23f94b0d34208ff9b091933c7999df0e8ab29fe718e199ea3",image="docker.io/rancher/mirrored-pause:3.6",interface="cali3ff767e1f8e",name="02c7397aedcfb3c23f94b0d34208ff9b091933c7999df0e8ab29fe718e199ea3",namespace="calico-system",pod="calico-typha-74dbf4c44f-gh85g"} 5.04276026e+08 1658223782244
container_network_receive_bytes_total{container="",id="/kubepods/besteffort/pod97d8f30b-adb9-45a5-98a9-eeb84d323f00/02c7397aedcfb3c23f94b0d34208ff9b091933c7999df0e8ab29fe718e199ea3",image="docker.io/rancher/mirrored-pause:3.6",interface="califa2a66cf08e",name="02c7397aedcfb3c23f94b0d34208ff9b091933c7999df0e8ab29fe718e199ea3",namespace="calico-system",pod="calico-typha-74dbf4c44f-gh85g"} 4.571331e+06 1658223782244
container_network_receive_bytes_total{container="",id="/kubepods/besteffort/pod97d8f30b-adb9-45a5-98a9-eeb84d323f00/02c7397aedcfb3c23f94b0d34208ff9b091933c7999df0e8ab29fe718e199ea3",image="docker.io/rancher/mirrored-pause:3.6",interface="eth0",name="02c7397aedcfb3c23f94b0d34208ff9b091933c7999df0e8ab29fe718e199ea3",namespace="calico-system",pod="calico-typha-74dbf4c44f-gh85g"} 1.7452341857e+10 1658223782244
container_network_receive_bytes_total{container="",id="/kubepods/besteffort/pod97d8f30b-adb9-45a5-98a9-eeb84d323f00/02c7397aedcfb3c23f94b0d34208ff9b091933c7999df0e8ab29fe718e199ea3",image="docker.io/rancher/mirrored-pause:3.6",interface="vxlan.calico",name="02c7397aedcfb3c23f94b0d34208ff9b091933c7999df0e8ab29fe718e199ea3",namespace="calico-system",pod="calico-typha-74dbf4c44f-gh85g"} 0 1658223782244
container_network_receive_bytes_total{container="",id="/kubepods/besteffort/pod97d8f30b-adb9-45a5-98a9-eeb84d323f00/02c7397aedcfb3c23f94b0d34208ff9b091933c7999df0e8ab29fe718e199ea3",image="docker.io/rancher/mirrored-pause:3.6",interface="wlan0",name="02c7397aedcfb3c23f94b0d34208ff9b091933c7999df0e8ab29fe718e199ea3",namespace="calico-system",pod="calico-typha-74dbf4c44f-gh85g"} 0 1658223782244
container_network_receive_bytes_total{container="",id="/kubepods/besteffort/podefb2b974-6b9d-4978-986e-301f6aae7e1e/285cbc601f0eb31b4a0cb7bdad0e8c2bdd89f138796f7ae28bff6b5287881702",image="docker.io/rancher/mirrored-pause:3.6",interface="eth0",name="285cbc601f0eb31b4a0cb7bdad0e8c2bdd89f138796f7ae28bff6b5287881702",namespace="kube-system",pod="local-path-provisioner-7b7dc8d6f5-dfwdn"} 8.895788e+06 1658223787942
container_network_receive_bytes_total{container="",id="/kubepods/besteffort/podf0e36fd8-f084-4665-a330-45c52d35ab81/da8bc7f901a77bbd4df4b319c5944af8500b601daf27dbf1c44a5c43a7fc53b6",image="docker.io/rancher/mirrored-pause:3.6",interface="cali3ff767e1f8e",name="da8bc7f901a77bbd4df4b319c5944af8500b601daf27dbf1c44a5c43a7fc53b6",namespace="kube-system",pod="kube-vip-xvp5t"} 5.04276633e+08 1658223785556
container_network_receive_bytes_total{container="",id="/kubepods/besteffort/podf0e36fd8-f084-4665-a330-45c52d35ab81/da8bc7f901a77bbd4df4b319c5944af8500b601daf27dbf1c44a5c43a7fc53b6",image="docker.io/rancher/mirrored-pause:3.6",interface="califa2a66cf08e",name="da8bc7f901a77bbd4df4b319c5944af8500b601daf27dbf1c44a5c43a7fc53b6",namespace="kube-system",pod="kube-vip-xvp5t"} 4.571373e+06 1658223785556
container_network_receive_bytes_total{container="",id="/kubepods/besteffort/podf0e36fd8-f084-4665-a330-45c52d35ab81/da8bc7f901a77bbd4df4b319c5944af8500b601daf27dbf1c44a5c43a7fc53b6",image="docker.io/rancher/mirrored-pause:3.6",interface="eth0",name="da8bc7f901a77bbd4df4b319c5944af8500b601daf27dbf1c44a5c43a7fc53b6",namespace="kube-system",pod="kube-vip-xvp5t"} 1.745245741e+10 1658223785556
container_network_receive_bytes_total{container="",id="/kubepods/besteffort/podf0e36fd8-f084-4665-a330-45c52d35ab81/da8bc7f901a77bbd4df4b319c5944af8500b601daf27dbf1c44a5c43a7fc53b6",image="docker.io/rancher/mirrored-pause:3.6",interface="vxlan.calico",name="da8bc7f901a77bbd4df4b319c5944af8500b601daf27dbf1c44a5c43a7fc53b6",namespace="kube-system",pod="kube-vip-xvp5t"} 0 1658223785556
container_network_receive_bytes_total{container="",id="/kubepods/besteffort/podf0e36fd8-f084-4665-a330-45c52d35ab81/da8bc7f901a77bbd4df4b319c5944af8500b601daf27dbf1c44a5c43a7fc53b6",image="docker.io/rancher/mirrored-pause:3.6",interface="wlan0",name="da8bc7f901a77bbd4df4b319c5944af8500b601daf27dbf1c44a5c43a7fc53b6",namespace="kube-system",pod="kube-vip-xvp5t"} 0 1658223785556
container_network_receive_bytes_total{container="",id="/kubepods/burstable/pod6cb9ed6f-029f-4b57-adf3-a4d1868fce8a/78ab43c03d103577496b8c903f49956505a73a184a3a33d1edef9787c724c096",image="docker.io/rancher/mirrored-pause:3.6",interface="eth0",name="78ab43c03d103577496b8c903f49956505a73a184a3a33d1edef9787c724c096",namespace="kube-system",pod="coredns-b96499967-4bxrf"} 3.27009329e+08 1658223781119
Steps To Reproduce:
- Installed K3s via Ansible using the xanmanning.k3s role, with the following config for control nodes:
---
# https://rancher.com/docs/k3s/latest/en/installation/install-options/server-config/
# https://github.com/PyratLabs/ansible-role-k3s
# (bool) Specify if a host (or host group) are part of the control plane
k3s_control_node: true
# (dict) k3s settings for all control-plane nodes
k3s_server:
  node-ip: "{{ ansible_host }}"
  tls-san:
    # kube-vip
    - "{{ kubevip_address }}"
  # Disable Docker - this will use the default containerd CRI
  docker: false
  node-taint:
    - "node-role.kubernetes.io/master=true:NoSchedule"
  flannel-backend: "none" # This needs to be in quotes
  disable:
    # Disable flannel - replaced with Calico
    - flannel
    # Disable traefik - replaced with ingress-nginx
    - traefik
    # Disable servicelb - replaced with metallb and installed with Flux
    - servicelb
    # Disable metrics-server - installed with Flux
    - metrics-server
  disable-network-policy: true
  disable-cloud-controller: true
  write-kubeconfig-mode: "644"
  # Network CIDR to use for pod IPs
  cluster-cidr: "10.42.0.0/16"
  # Network CIDR to use for service IPs
  service-cidr: "10.43.0.0/16"
  kubelet-arg:
    # Allow k8s services to contain TCP and UDP on the same port
    - "feature-gates=MixedProtocolLBService=true"
    # Fix for metrics
    # https://github.com/k3s-io/k3s/issues/473
    - "containerd=/run/k3s/containerd/containerd.sock"
  kube-controller-manager-arg:
    # Allow k8s services to contain TCP and UDP on the same port
    - "feature-gates=MixedProtocolLBService=true"
    # Required to monitor kube-controller-manager with kube-prometheus-stack
    - "bind-address=0.0.0.0"
  kube-proxy-arg:
    # Allow k8s services to contain TCP and UDP on the same port
    - "feature-gates=MixedProtocolLBService=true"
    # Required to monitor kube-proxy with kube-prometheus-stack
    - "metrics-bind-address=0.0.0.0"
  kube-scheduler-arg:
    # Allow k8s services to contain TCP and UDP on the same port
    - "feature-gates=MixedProtocolLBService=true"
    # Required to monitor kube-scheduler with kube-prometheus-stack
    - "bind-address=0.0.0.0"
  # Required to monitor etcd with kube-prometheus-stack
  etcd-expose-metrics: true
  kube-apiserver-arg:
    # Allow k8s services to contain TCP and UDP on the same port
    - "feature-gates=MixedProtocolLBService=true"
    # Required for HAProxy health-checks
    - "anonymous-auth=true"
and the following for worker nodes:
---
# https://rancher.com/docs/k3s/latest/en/installation/install-options/agent-config/
# https://github.com/PyratLabs/ansible-role-k3s
# (bool) Specify if a host (or host group) are part of the control plane
k3s_control_node: false
# (dict) k3s settings for all worker nodes
k3s_agent:
  node-ip: "{{ ansible_host }}"
  kubelet-arg:
    # Allow k8s services to contain TCP and UDP on the same port
    - "feature-gates=MixedProtocolLBService=true"
    # Fix for metrics
    # https://github.com/k3s-io/k3s/issues/473
    - "containerd=/run/k3s/containerd/containerd.sock"
- Installed kube-prometheus-stack
- Ran the container_network_receive_bytes_total query in Prometheus
Expected behavior: Container network metrics are returned.
Actual behavior: Nothing is returned.
Additional context / logs:
Backporting
- [ ] Needs backporting to older releases
This sounds very similar to https://github.com/k3s-io/k3s/issues/5782. Can you replicate this with the most recent release?
I actually just noticed there was a new version the second I posted the issue. I have just upgraded and so far there has been no change
Hmm. The metrics output you shared seems to show that the metrics are present and non-zero. Is there something in particular that's missing? Have you confirmed that you're scraping those metrics? Are you sure the problem is not your prometheus query?
The query I am running is just container_network_receive_bytes_total, but it seems to ignore everything that doesn't have a value for the container label. Looking at container_cpu_usage_seconds_total in cAdvisor, it returns metrics with the container label both empty and populated, while running that query in Prometheus only returns the series that have a value for container. Also, I feel like there should be many more network metrics, since I have quite a few pods running.
That sounds like an issue with the Prometheus configuration. Perhaps something in the relabeling for pod metrics?
The default values in the kube-prometheus-stack Helm chart do drop metrics with no container value:
# Drop cgroup metrics with no container.
- sourceLabels: [id, container]
  action: drop
  regex: '.+;'
The issue seems to be that the container label is empty in the cAdvisor output, and that there are no metrics at all for most of the running pods.
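For reference, that rule ships in the chart's default values under the kubelet ServiceMonitor settings, roughly like this (a sketch only; the exact key path and default rules vary between chart versions):
kubelet:
  serviceMonitor:
    # Default cAdvisor metric relabelings (abridged sketch). This is the rule
    # that drops every series whose container label is empty.
    cAdvisorMetricRelabelings:
      # Drop cgroup metrics with no container.
      - sourceLabels: [id, container]
        action: drop
        regex: '.+;'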
@filipweilid I think you hit the nail on the head; the logic for the recent relabel was done in PR#2197.
Overriding the Prometheus relabeling so it doesn't drop the metrics does fix the problem. I looked at the cAdvisor metrics from an older cluster, and there the container label had the value "POD", in which case it makes sense to drop metrics that don't have a value for that label. I am not sure if the label being empty is a change made on purpose or if something is wrong. I'm also not sure if it is a k3s thing or not, and I can't spin up a k8s cluster with kubeadm to test either.
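For anyone wanting to do the same, a minimal sketch of such an override in the kube-prometheus-stack values, assuming your chart version exposes the cAdvisor relabelings under kubelet.serviceMonitor.cAdvisorMetricRelabelings, would look roughly like this:
# values.yaml override (sketch only; check the defaults of your chart version)
kubelet:
  serviceMonitor:
    # Helm replaces lists rather than merging them, so an empty list removes
    # all default relabel rules, including the one that drops series with an
    # empty container label. Re-list any default rules you still want to keep.
    cAdvisorMetricRelabelings: []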
This isn't a k3s issue, as I've encountered the same issue on AKS and GKE clusters, so I've opened a PR on kube-prometheus-stack to remove the relabeller config which is causing this issue: https://github.com/prometheus-community/helm-charts/pull/2297
This repository uses a bot to automatically label issues which have not had any activity (commit/comment/label) for 180 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the bot can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the bot will automatically close the issue in 14 days. Thank you for your contributions.
Do we have any update on this issue?
Sorry to bump an old/closed issue. The change that was made to the upstream kube-prometheus-stack is incorrect: we do want to drop container="" metrics. On one of my normal clusters, metrics like container_network_receive_bytes_total contain the label container="POD". That cluster uses CRI-O rather than containerd, so I'm not sure if that makes a difference. So the real issue here is that the POD labeling of metrics is somehow missing from the k3s cAdvisor. What @filipweilid said is correct.