prometheus-k8s-operator
Juju topology labels break prometheus rules
Bug Description
Some Prometheus rules added by the kubernetes-control-plane charm are broken by the labels injected by COS. For example, this rule:
sum by (cluster, namespace, pod, container) (
  irate(container_cpu_usage_seconds_total{job="kubelet", metrics_path="/metrics/cadvisor", image!=""}[5m])
) * on (cluster, namespace, pod) group_left(node)
  topk by (cluster, namespace, pod) (1, max by (cluster, namespace, pod, node) (kube_pod_info{node!=""}))
gets rendered like this:
sum by (cluster, namespace, pod, container) (
  irate(container_cpu_usage_seconds_total{image!="", job="kubelet", juju_application="kubernetes-control-plane",
    juju_model="k8s", juju_model_uuid="e61a4a4d-a037-4cd1-88df-6ca15aa46f7b", metrics_path="/metrics/cadvisor"}[5m])
) * on (cluster, namespace, pod) group_left (node)
  topk by (cluster, namespace, pod) (1, max by (cluster, namespace, pod, node)
    (kube_pod_info{juju_application="kubernetes-control-plane", juju_model="k8s",
      juju_model_uuid="e61a4a4d-a037-4cd1-88df-6ca15aa46f7b", node!=""}))
The labels juju_application, juju_model and juju_model_uuid are injected by COS. In this case they make the rule apply only to pods running on the control-plane node, so Grafana is missing data for pods running on worker nodes (i.e. most of them).
Also, copying this rule into the kubernetes-worker charm would not work either: both metrics would then be filtered by juju_application="kubernetes-worker", which breaks the kube_pod_info half of the join because that metric is scraped from the control plane only.
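The rule is presumably shipped by the charm as a recording rule in a Prometheus rule file. For reference, a sketch of that packaging before the topology matchers are injected (the group and record names below are placeholders, not the charm's actual names):
groups:
- name: example-cpu-rules                             # placeholder group name
  rules:
  - record: example:container_cpu_usage:sum_irate     # placeholder record name
    expr: |
      sum by (cluster, namespace, pod, container) (
        irate(container_cpu_usage_seconds_total{job="kubelet", metrics_path="/metrics/cadvisor", image!=""}[5m])
      ) * on (cluster, namespace, pod) group_left(node)
        topk by (cluster, namespace, pod) (1, max by (cluster, namespace, pod, node) (kube_pod_info{node!=""}))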
To Reproduce
To reproduce, deploy a regular Charmed Kubernetes installation and observe it with COS. You will notice information missing for pods not running on the control plane, for example in the "Kubernetes / Compute Resources / Pod" and "Kubernetes / Compute Resources / Node (Pods)" dashboards.
Environment
I tested with the latest/stable COS charms running on MicroK8s 1.28/stable. The observed Kubernetes cloud runs Kubernetes 1.31.5.
Relevant log output
There are no logs to show. There are no errors, just missing data.
Additional context
No response
COS bundle
bundle: kubernetes
saas:
  remote-b5f775f74cf040ec8ba9f05ed7057a5b: {}
applications:
  alertmanager:
    charm: alertmanager-k8s
    channel: latest/stable
    revision: 128
    resources:
      alertmanager-image: 95
    scale: 1
    constraints: arch=amd64
    storage:
      data: kubernetes,1,1024M
    trust: true
  catalogue:
    charm: catalogue-k8s
    channel: latest/stable
    revision: 59
    resources:
      catalogue-image: 33
    scale: 1
    options:
      description: "Canonical Observability Stack Lite, or COS Lite, is a light-weight,
        highly-integrated, \nJuju-based observability suite running on Kubernetes.\n"
      tagline: Model-driven Observability Stack deployed with a single command.
      title: Canonical Observability Stack
    constraints: arch=amd64
    trust: true
  grafana:
    charm: grafana-k8s
    channel: latest/stable
    revision: 117
    resources:
      grafana-image: 69
      litestream-image: 44
    scale: 1
    constraints: arch=amd64
    storage:
      database: kubernetes,1,1024M
    trust: true
  loki:
    charm: loki-k8s
    channel: latest/stable
    revision: 161
    resources:
      loki-image: 99
      node-exporter-image: 2
    scale: 1
    constraints: arch=amd64
    storage:
      active-index-directory: kubernetes,1,1024M
      loki-chunks: kubernetes,1,1024M
    trust: true
  prometheus:
    charm: prometheus-k8s
    channel: latest/stable
    revision: 210
    resources:
      prometheus-image: 149
    scale: 1
    constraints: arch=amd64
    storage:
      database: kubernetes,1,1024M
    trust: true
  traefik:
    charm: traefik-k8s
    channel: latest/stable
    revision: 203
    resources:
      traefik-image: 160
    scale: 1
    constraints: arch=amd64
    storage:
      configurations: kubernetes,1,1024M
    trust: true
relations:
- - traefik:ingress-per-unit
  - prometheus:ingress
- - traefik:ingress-per-unit
  - loki:ingress
- - traefik:traefik-route
  - grafana:ingress
- - traefik:ingress
  - alertmanager:ingress
- - prometheus:alertmanager
  - alertmanager:alerting
- - grafana:grafana-source
  - prometheus:grafana-source
- - grafana:grafana-source
  - loki:grafana-source
- - grafana:grafana-source
  - alertmanager:grafana-source
- - loki:alertmanager
  - alertmanager:alerting
- - prometheus:metrics-endpoint
  - traefik:metrics-endpoint
- - prometheus:metrics-endpoint
  - alertmanager:self-metrics-endpoint
- - prometheus:metrics-endpoint
  - loki:metrics-endpoint
- - prometheus:metrics-endpoint
  - grafana:metrics-endpoint
- - grafana:grafana-dashboard
  - loki:grafana-dashboard
- - grafana:grafana-dashboard
  - prometheus:grafana-dashboard
- - grafana:grafana-dashboard
  - alertmanager:grafana-dashboard
- - catalogue:ingress
  - traefik:ingress
- - catalogue:catalogue
  - grafana:catalogue
- - catalogue:catalogue
  - prometheus:catalogue
- - catalogue:catalogue
  - alertmanager:catalogue
- - grafana:grafana-dashboard
  - remote-b5f775f74cf040ec8ba9f05ed7057a5b:grafana-dashboards-provider
- - prometheus:receive-remote-write
  - remote-b5f775f74cf040ec8ba9f05ed7057a5b:send-remote-write
--- # overlay.yaml
applications:
  grafana:
    offers:
      grafana:
        endpoints:
        - grafana-dashboard
        acl:
          admin: admin
  prometheus:
    offers:
      prometheus:
        endpoints:
        - metrics-endpoint
        - receive-remote-write
        acl:
          admin: admin
K8s bundle:
default-base: [email protected]/stable
saas:
  grafana:
    url: cos-k8s-controller-2:admin/cos.grafana
  prometheus:
    url: cos-k8s-controller-2:admin/cos.prometheus
applications:
  containerd:
    charm: containerd
    channel: latest/stable
    revision: 82
    resources:
      containerd: 2
    options:
      http_proxy: http://squid.internal:3128
      https_proxy: http://squid.internal:3128
      no_proxy: 127.0.0.1,localhost,::1,10.149.0.0/16,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16
  easyrsa:
    charm: easyrsa
    channel: latest/stable
    revision: 63
    resources:
      easyrsa: 2
    num_units: 1
    to:
    - "0"
    constraints: arch=amd64
  etcd:
    charm: etcd
    channel: latest/stable
    revision: 770
    resources:
      core: 0
      etcd: 3
      snapshot: 0
    num_units: 1
    to:
    - "1"
    options:
      channel: latest/stable
    constraints: arch=amd64
    storage:
      data: loop,1024M
  flannel:
    charm: flannel
    channel: latest/stable
    revision: 87
    resources:
      flannel-amd64: 83
      flannel-arm64: 83
      flannel-s390x: 83
  grafana-agent:
    charm: grafana-agent
    channel: latest/beta
    revision: 365
    options:
      path_exclude: /var/log/libvirt/qemu/**
  kubeapi-load-balancer:
    charm: kubeapi-load-balancer
    channel: latest/stable
    revision: 163
    resources:
      nginx-prometheus-exporter: 1
    num_units: 1
    to:
    - "2"
    expose: true
    constraints: arch=amd64 mem=1024
  kubernetes-control-plane:
    charm: local:kubernetes-control-plane-0
    resources:
      cni-plugins: 1
    num_units: 1
    to:
    - "3"
    expose: true
    constraints: arch=amd64 mem=2048
  kubernetes-worker:
    charm: kubernetes-worker
    channel: latest/stable
    revision: 265
    resources:
      cni-plugins: 1
    num_units: 2
    to:
    - "4"
    - "5"
    expose: true
    constraints: arch=amd64 mem=4096
  openstack-cloud-controller:
    charm: openstack-cloud-controller
    channel: latest/stable
    revision: 15
  openstack-integrator:
    charm: openstack-integrator
    channel: latest/stable
    revision: 86
    resources:
      openstackclients: 1
    num_units: 1
    to:
    - "6"
    constraints: arch=amd64
    trust: true
machines:
  "0":
    constraints: arch=amd64 root-disk=20480 root-disk-source=volume
  "1":
    constraints: arch=amd64 root-disk=20480 root-disk-source=volume
  "2":
    constraints: arch=amd64 mem=1024 root-disk=20480 root-disk-source=volume
  "3":
    constraints: arch=amd64 mem=2048 root-disk=20480 root-disk-source=volume
  "4":
    constraints: arch=amd64 mem=4096 root-disk=20480 root-disk-source=volume
  "5":
    constraints: arch=amd64 mem=4096 root-disk=20480 root-disk-source=volume
  "6":
    constraints: arch=amd64 root-disk=20480 root-disk-source=volume
relations:
- - kubernetes-control-plane:kube-control
  - kubernetes-worker:kube-control
- - etcd:certificates
  - easyrsa:client
- - kubernetes-control-plane:etcd
  - etcd:db
- - kubernetes-control-plane:loadbalancer-external
  - kubeapi-load-balancer:lb-consumers
- - kubernetes-control-plane:loadbalancer-internal
  - kubeapi-load-balancer:lb-consumers
- - flannel:cni
  - kubernetes-control-plane:cni
- - flannel:cni
  - kubernetes-worker:cni
- - flannel:etcd
  - etcd:db
- - kubernetes-control-plane:certificates
  - easyrsa:client
- - kubernetes-worker:certificates
  - easyrsa:client
- - kubeapi-load-balancer:certificates
  - easyrsa:client
- - openstack-integrator:clients
  - openstack-cloud-controller:openstack
- - openstack-cloud-controller:certificates
  - easyrsa:client
- - openstack-cloud-controller:kube-control
  - kubernetes-control-plane:kube-control
- - openstack-cloud-controller:external-cloud-provider
  - kubernetes-control-plane:external-cloud-provider
- - containerd:containerd
  - kubernetes-worker:container-runtime
- - containerd:containerd
  - kubernetes-control-plane:container-runtime
- - grafana-agent:cos-agent
  - kubernetes-control-plane:cos-agent
- - grafana-agent:grafana-dashboards-provider
  - grafana:grafana-dashboard
- - grafana-agent:send-remote-write
  - prometheus:receive-remote-write
- - kubernetes-worker:tokens
  - kubernetes-control-plane:tokens
- - kubernetes-worker:cos-agent
  - grafana-agent:cos-agent
We should investigate why kube_pod_info{juju_application="kubernetes-control-plane"} returns no (or partial) results.
Perhaps something along the lines of:
kube_pod_info is a metric generated by kubernetes-control-plane; and kube_pod_info actually applies to kubernetes-worker, a different charm, which is identified by other labels (not juju topology) generated by the scrape job.
We may need to come up with a juju-topology solution for when rules provided by one central charm (control plane) need to apply to other charms (worker).
The container_cpu_usage_seconds_total metric is available on the control plane's /metrics/cadvisor/ endpoint and it has information about CPU usage on the workers. We need to find out what labels those series come with, and what relabel config is already in place.
At the same time, rules for the workers come from the control plane charm.
Example:
container_cpu_usage_seconds_total{cluster="kubernetes-g0tfcg2djrtd2rasmnhvxdqvkedhtdgu", cpu="total", id="/kubepods/besteffort/podc2ee0a15-3028-48f7-a7e3-718401a1de10", instance="juju-ae6d9e-observed-k8s-5", job="kubelet", juju_application="kubernetes-worker", juju_model="observed-k8s", juju_model_uuid="c569c975-133e-4748-8fd5-19c9c0ae6d9e", juju_unit="kubernetes-worker/1", metrics_path="/metrics/cadvisor", namespace="default", node="juju-ae6d9e-observed-k8s-5", pod="nginx-5869d7778c-p5n7d"}
Reproducer:
- juju deploy k8s (VM model; see bundle above, but use 1.32/stable).
- Relate to grafana-agent.
graph LR
charm-kubernetes-control-plane --- worker
charm-kubernetes-control-plane --- grafana-agent ---|CMR| prometheus
We should investigate if a relabeling config in the scrape job definition could help with changing the juju_unit from control plane to worker.
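Purely as an illustration of that idea, a minimal sketch of a metric relabeling rule keyed on the node target label (using node as the discriminator and the machine-name pattern below are assumptions, not taken from the charm's actual scrape config):
metric_relabel_configs:
# Hypothetical: re-attribute samples whose node label points at a worker machine,
# so they stop carrying the control plane's juju_application.
- source_labels: [node]
  regex: juju-.*-observed-k8s-(4|5)    # assumed pattern matching only the worker machines
  target_label: juju_application
  replacement: kubernetes-worker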
Desired situation:
- In Grafana we should see per-worker CPU usage (from container_cpu_usage_seconds_total); a recording-rule sketch follows below.
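A minimal sketch of what that could look like as a recording rule, using only labels that appear in the scrape config and metrics above (the group and record names are made up for illustration):
groups:
- name: per-worker-cpu                                   # placeholder group name
  rules:
  - record: node:container_cpu_usage_seconds:irate5m     # placeholder record name
    expr: |
      # per-node (and hence per-worker) CPU usage from the kubelet cadvisor endpoint
      sum by (node) (
        irate(container_cpu_usage_seconds_total{job="kubelet", metrics_path="/metrics/cadvisor", image!=""}[5m])
      )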
Doing some more research, I found that /metrics/cadvisor can in fact be queried on worker nodes too. See for example this snippet from /etc/grafana-agent.yaml:
authorization:
  credentials: kubernetes-worker/1::n9...
job_name: kubernetes-worker_3_kubelet-adviso
metrics_path: /metrics/cadvisor
relabel_configs:
- replacement: /metrics/cadvisor
  target_label: metrics_path
- replacement: kubelet
  target_label: job
- replacement: juju-ab090a-observed-k8s-5
  source_labels:
  - instance
  target_label: instance
scheme: https
static_configs:
- labels:
    cluster: kubernetes-tgdzbqhnkiwxfg0ypde7fegbotkepzpv
    juju_application: kubernetes-worker
    juju_model: observed-k8s
    juju_model_uuid: 6925682f-13b4-4731-8e3c-0d73ddab090a
    juju_unit: kubernetes-worker/1
    node: juju-ab090a-observed-k8s-5
  targets:
  - localhost:10250
tls_config:
  insecure_skip_verify: true
So this is where the metrics for the worker are coming from. I'm starting to think this might be related to the tokens relation between the control plane and the worker node, and not to COS.
Adding the tokens relation fixed this. Thank you for your time.
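For completeness, the relation in question as it appears in the relations section of the K8s bundle above (on recent Juju versions this corresponds to running juju integrate kubernetes-worker:tokens kubernetes-control-plane:tokens):
relations:
- - kubernetes-worker:tokens
  - kubernetes-control-plane:tokens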