prometheus-k8s-operator
Juju topology labels break prometheus rules
Bug Description
Some Prometheus rules added by the kubernetes-control-plane charm are broken by the labels injected by COS. For example, this rule:
sum by (cluster, namespace, pod, container) (
  irate(container_cpu_usage_seconds_total{job="kubelet", metrics_path="/metrics/cadvisor", image!=""}[5m])
) * on (cluster, namespace, pod) group_left(node)
  topk by (cluster, namespace, pod) (1, max by (cluster, namespace, pod, node) (kube_pod_info{node!=""}))
gets rendered like this:
sum by (cluster, namespace, pod, container) (
  irate(container_cpu_usage_seconds_total{image!="", job="kubelet", juju_application="kubernetes-control-plane",
    juju_model="k8s", juju_model_uuid="e61a4a4d-a037-4cd1-88df-6ca15aa46f7b", metrics_path="/metrics/cadvisor"}[5m])
) * on (cluster, namespace, pod) group_left (node)
  topk by (cluster, namespace, pod) (1, max by (cluster, namespace, pod, node)
    (kube_pod_info{juju_application="kubernetes-control-plane", juju_model="k8s",
      juju_model_uuid="e61a4a4d-a037-4cd1-88df-6ca15aa46f7b", node!=""}))
The labels juju_application, juju_model and juju_model_uuid are injected by COS. In this case they make the rule apply only to pods running on the control-plane node, so Grafana is missing data for pods running on worker nodes (i.e. most of them).
Also, copying this rule into the kubernetes-worker charm would not work either: both metrics would then be filtered by juju_application="kubernetes-worker", which breaks the kube_pod_info half of the join because that metric is scraped from the control plane only.
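The rule is presumably shipped by the charm as a recording rule in a Prometheus rule file. For reference, a sketch of that packaging before the topology matchers are injected (the group and record names below are placeholders, not the charm's actual names):
groups:
- name: example-cpu-rules                             # placeholder group name
  rules:
  - record: example:container_cpu_usage:sum_irate     # placeholder record name
    expr: |
      sum by (cluster, namespace, pod, container) (
        irate(container_cpu_usage_seconds_total{job="kubelet", metrics_path="/metrics/cadvisor", image!=""}[5m])
      ) * on (cluster, namespace, pod) group_left(node)
        topk by (cluster, namespace, pod) (1, max by (cluster, namespace, pod, node) (kube_pod_info{node!=""}))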
To Reproduce
To reproduce, deploy a regular Charmed Kubernetes installation and observe it with COS. You will notice information missing for pods not running on the control plane, for example in the "Kubernetes / Compute Resources / Pod" and "Kubernetes / Compute Resources / Node (Pods)" dashboards.
Environment
I tested with the latest/stable COS charms running on MicroK8s 1.28/stable. The observed Kubernetes cloud runs Kubernetes 1.31.5.
Relevant log output
There are no logs to show. There are no errors, just missing data.
Additional context
No response
COS bundle
bundle: kubernetes
saas:
  remote-b5f775f74cf040ec8ba9f05ed7057a5b: {}
applications:
  alertmanager:
    charm: alertmanager-k8s
    channel: latest/stable
    revision: 128
    resources:
      alertmanager-image: 95
    scale: 1
    constraints: arch=amd64
    storage:
      data: kubernetes,1,1024M
    trust: true
  catalogue:
    charm: catalogue-k8s
    channel: latest/stable
    revision: 59
    resources:
      catalogue-image: 33
    scale: 1
    options:
      description: "Canonical Observability Stack Lite, or COS Lite, is a light-weight,
        highly-integrated, \nJuju-based observability suite running on Kubernetes.\n"
      tagline: Model-driven Observability Stack deployed with a single command.
      title: Canonical Observability Stack
    constraints: arch=amd64
    trust: true
  grafana:
    charm: grafana-k8s
    channel: latest/stable
    revision: 117
    resources:
      grafana-image: 69
      litestream-image: 44
    scale: 1
    constraints: arch=amd64
    storage:
      database: kubernetes,1,1024M
    trust: true
  loki:
    charm: loki-k8s
    channel: latest/stable
    revision: 161
    resources:
      loki-image: 99
      node-exporter-image: 2
    scale: 1
    constraints: arch=amd64
    storage:
      active-index-directory: kubernetes,1,1024M
      loki-chunks: kubernetes,1,1024M
    trust: true
  prometheus:
    charm: prometheus-k8s
    channel: latest/stable
    revision: 210
    resources:
      prometheus-image: 149
    scale: 1
    constraints: arch=amd64
    storage:
      database: kubernetes,1,1024M
    trust: true
  traefik:
    charm: traefik-k8s
    channel: latest/stable
    revision: 203
    resources:
      traefik-image: 160
    scale: 1
    constraints: arch=amd64
    storage:
      configurations: kubernetes,1,1024M
    trust: true
relations:
- - traefik:ingress-per-unit
  - prometheus:ingress
- - traefik:ingress-per-unit
  - loki:ingress
- - traefik:traefik-route
  - grafana:ingress
- - traefik:ingress
  - alertmanager:ingress
- - prometheus:alertmanager
  - alertmanager:alerting
- - grafana:grafana-source
  - prometheus:grafana-source
- - grafana:grafana-source
  - loki:grafana-source
- - grafana:grafana-source
  - alertmanager:grafana-source
- - loki:alertmanager
  - alertmanager:alerting
- - prometheus:metrics-endpoint
  - traefik:metrics-endpoint
- - prometheus:metrics-endpoint
  - alertmanager:self-metrics-endpoint
- - prometheus:metrics-endpoint
  - loki:metrics-endpoint
- - prometheus:metrics-endpoint
  - grafana:metrics-endpoint
- - grafana:grafana-dashboard
  - loki:grafana-dashboard
- - grafana:grafana-dashboard
  - prometheus:grafana-dashboard
- - grafana:grafana-dashboard
  - alertmanager:grafana-dashboard
- - catalogue:ingress
  - traefik:ingress
- - catalogue:catalogue
  - grafana:catalogue
- - catalogue:catalogue
  - prometheus:catalogue
- - catalogue:catalogue
  - alertmanager:catalogue
- - grafana:grafana-dashboard
  - remote-b5f775f74cf040ec8ba9f05ed7057a5b:grafana-dashboards-provider
- - prometheus:receive-remote-write
  - remote-b5f775f74cf040ec8ba9f05ed7057a5b:send-remote-write
--- # overlay.yaml
applications:
  grafana:
    offers:
      grafana:
        endpoints:
        - grafana-dashboard
        acl:
          admin: admin
  prometheus:
    offers:
      prometheus:
        endpoints:
        - metrics-endpoint
        - receive-remote-write
        acl:
          admin: admin
K8s bundle:
default-base: [email protected]/stable
saas:
  grafana:
    url: cos-k8s-controller-2:admin/cos.grafana
  prometheus:
    url: cos-k8s-controller-2:admin/cos.prometheus
applications:
  containerd:
    charm: containerd
    channel: latest/stable
    revision: 82
    resources:
      containerd: 2
    options:
      http_proxy: http://squid.internal:3128
      https_proxy: http://squid.internal:3128
      no_proxy: 127.0.0.1,localhost,::1,10.149.0.0/16,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16
  easyrsa:
    charm: easyrsa
    channel: latest/stable
    revision: 63
    resources:
      easyrsa: 2
    num_units: 1
    to:
    - "0"
    constraints: arch=amd64
  etcd:
    charm: etcd
    channel: latest/stable
    revision: 770
    resources:
      core: 0
      etcd: 3
      snapshot: 0
    num_units: 1
    to:
    - "1"
    options:
      channel: latest/stable
    constraints: arch=amd64
    storage:
      data: loop,1024M
  flannel:
    charm: flannel
    channel: latest/stable
    revision: 87
    resources:
      flannel-amd64: 83
      flannel-arm64: 83
      flannel-s390x: 83
  grafana-agent:
    charm: grafana-agent
    channel: latest/beta
    revision: 365
    options:
      path_exclude: /var/log/libvirt/qemu/**
  kubeapi-load-balancer:
    charm: kubeapi-load-balancer
    channel: latest/stable
    revision: 163
    resources:
      nginx-prometheus-exporter: 1
    num_units: 1
    to:
    - "2"
    expose: true
    constraints: arch=amd64 mem=1024
  kubernetes-control-plane:
    charm: local:kubernetes-control-plane-0
    resources:
      cni-plugins: 1
    num_units: 1
    to:
    - "3"
    expose: true
    constraints: arch=amd64 mem=2048
  kubernetes-worker:
    charm: kubernetes-worker
    channel: latest/stable
    revision: 265
    resources:
      cni-plugins: 1
    num_units: 2
    to:
    - "4"
    - "5"
    expose: true
    constraints: arch=amd64 mem=4096
  openstack-cloud-controller:
    charm: openstack-cloud-controller
    channel: latest/stable
    revision: 15
  openstack-integrator:
    charm: openstack-integrator
    channel: latest/stable
    revision: 86
    resources:
      openstackclients: 1
    num_units: 1
    to:
    - "6"
    constraints: arch=amd64
    trust: true
machines:
  "0":
    constraints: arch=amd64 root-disk=20480 root-disk-source=volume
  "1":
    constraints: arch=amd64 root-disk=20480 root-disk-source=volume
  "2":
    constraints: arch=amd64 mem=1024 root-disk=20480 root-disk-source=volume
  "3":
    constraints: arch=amd64 mem=2048 root-disk=20480 root-disk-source=volume
  "4":
    constraints: arch=amd64 mem=4096 root-disk=20480 root-disk-source=volume
  "5":
    constraints: arch=amd64 mem=4096 root-disk=20480 root-disk-source=volume
  "6":
    constraints: arch=amd64 root-disk=20480 root-disk-source=volume
relations:
- - kubernetes-control-plane:kube-control
  - kubernetes-worker:kube-control
- - etcd:certificates
  - easyrsa:client
- - kubernetes-control-plane:etcd
  - etcd:db
- - kubernetes-control-plane:loadbalancer-external
  - kubeapi-load-balancer:lb-consumers
- - kubernetes-control-plane:loadbalancer-internal
  - kubeapi-load-balancer:lb-consumers
- - flannel:cni
  - kubernetes-control-plane:cni
- - flannel:cni
  - kubernetes-worker:cni
- - flannel:etcd
  - etcd:db
- - kubernetes-control-plane:certificates
  - easyrsa:client
- - kubernetes-worker:certificates
  - easyrsa:client
- - kubeapi-load-balancer:certificates
  - easyrsa:client
- - openstack-integrator:clients
  - openstack-cloud-controller:openstack
- - openstack-cloud-controller:certificates
  - easyrsa:client
- - openstack-cloud-controller:kube-control
  - kubernetes-control-plane:kube-control
- - openstack-cloud-controller:external-cloud-provider
  - kubernetes-control-plane:external-cloud-provider
- - containerd:containerd
  - kubernetes-worker:container-runtime
- - containerd:containerd
  - kubernetes-control-plane:container-runtime
- - grafana-agent:cos-agent
  - kubernetes-control-plane:cos-agent
- - grafana-agent:grafana-dashboards-provider
  - grafana:grafana-dashboard
- - grafana-agent:send-remote-write
  - prometheus:receive-remote-write
- - kubernetes-worker:tokens
  - kubernetes-control-plane:tokens
- - kubernetes-worker:cos-agent
  - grafana-agent:cos-agent
We should investigate why kube_pod_info{juju_application="kubernetes-control-plane"} returns no (or partial) results.
Perhaps something along the lines of:
kube_pod_info is a metric generated by kubernetes-control-plane; and kube_pod_info actually applies to kubernetes-worker, a different charm, which is identified by other labels (not juju topology) generated by the scrape job.
We may need to come up with a juju-topology solution for when rules provided by one central charm (control plane) need to apply to other charms (worker).
The container_cpu_usage_seconds_total metric is available on the control plane's /metrics/cadvisor/ endpoint and it has information about CPU usage on the workers. We need to find out what labels those series come with, and what relabel config is already in place.
At the same time, rules for the workers come from the control plane charm.
Example:
container_cpu_usage_seconds_total{cluster="kubernetes-g0tfcg2djrtd2rasmnhvxdqvkedhtdgu", cpu="total", id="/kubepods/besteffort/podc2ee0a15-3028-48f7-a7e3-718401a1de10", instance="juju-ae6d9e-observed-k8s-5", job="kubelet", juju_application="kubernetes-worker", juju_model="observed-k8s", juju_model_uuid="c569c975-133e-4748-8fd5-19c9c0ae6d9e", juju_unit="kubernetes-worker/1", metrics_path="/metrics/cadvisor", namespace="default", node="juju-ae6d9e-observed-k8s-5", pod="nginx-5869d7778c-p5n7d"}
Reproducer:
- juju deploy k8s (VM model; see bundle above, but use 1.32/stable).
- Relate to grafana-agent.
graph LR
charm-kubernetes-control-plane --- worker
charm-kubernetes-control-plane --- grafana-agent ---|CMR| prometheus
We should investigate if a relabeling config in the scrape job definition could help with changing the juju_unit from control plane to worker.
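Purely as an illustration of that idea, a minimal sketch of a metric relabeling rule keyed on the node target label (using node as the discriminator and the machine-name pattern below are assumptions, not taken from the charm's actual scrape config):
metric_relabel_configs:
# Hypothetical: re-attribute samples whose node label points at a worker machine,
# so they stop carrying the control plane's juju_application.
- source_labels: [node]
  regex: juju-.*-observed-k8s-(4|5)    # assumed pattern matching only the worker machines
  target_label: juju_application
  replacement: kubernetes-worker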
Desired situation:
- In Grafana we should see per-worker CPU usage (from container_cpu_usage_seconds_total); a recording-rule sketch follows below.
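A minimal sketch of what that could look like as a recording rule, using only labels that appear in the scrape config and metrics above (the group and record names are made up for illustration):
groups:
- name: per-worker-cpu                                   # placeholder group name
  rules:
  - record: node:container_cpu_usage_seconds:irate5m     # placeholder record name
    expr: |
      # per-node (and hence per-worker) CPU usage from the kubelet cadvisor endpoint
      sum by (node) (
        irate(container_cpu_usage_seconds_total{job="kubelet", metrics_path="/metrics/cadvisor", image!=""}[5m])
      )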
Doing some more research, I found that /metrics/cadvisor can in fact be queried on worker nodes too. See for example this snippet from /etc/grafana-agent.yaml:
authorization:
  credentials: kubernetes-worker/1::n9...
job_name: kubernetes-worker_3_kubelet-adviso
metrics_path: /metrics/cadvisor
relabel_configs:
- replacement: /metrics/cadvisor
  target_label: metrics_path
- replacement: kubelet
  target_label: job
- replacement: juju-ab090a-observed-k8s-5
  source_labels:
  - instance
  target_label: instance
scheme: https
static_configs:
- labels:
    cluster: kubernetes-tgdzbqhnkiwxfg0ypde7fegbotkepzpv
    juju_application: kubernetes-worker
    juju_model: observed-k8s
    juju_model_uuid: 6925682f-13b4-4731-8e3c-0d73ddab090a
    juju_unit: kubernetes-worker/1
    node: juju-ab090a-observed-k8s-5
  targets:
  - localhost:10250
tls_config:
  insecure_skip_verify: true
So this is where the metrics for the worker are coming from. I'm starting to think this might be related to the tokens relation between the control plane and the worker node, and not to COS.
Adding the tokens relation fixed this. Thank you for your time.
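For completeness, the relation in question as it appears in the relations section of the K8s bundle above (on recent Juju versions this corresponds to running juju integrate kubernetes-worker:tokens kubernetes-control-plane:tokens):
relations:
- - kubernetes-worker:tokens
  - kubernetes-control-plane:tokens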