
Issues with node_cpu_seconds_total

reefland opened this issue 1 year ago • 8 comments

I tested the latest changes, and they're still not right...

Panel "CPU Utilization by Node": "expr": "avg by (node) (1-rate(node_cpu_seconds_total{mode=\"idle\"}[$__rate_interval]))" yields:

[screenshot]

It seems to show the total of all nodes? It is not picking up the multiple nodes, likely because node_cpu_seconds_total carries an instance label rather than a node label on my setup, so the aggregation collapses into a single series. It should look like this:

[screenshot]
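
For reference, aggregating by instance instead of node should give one series per node here; a sketch, not yet tested against the dashboard:

avg by (instance) (1 - rate(node_cpu_seconds_total{mode="idle"}[$__rate_interval]))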

Panel "CPU Utilization by namespace" is still dark and using the old metric: "expr": "sum(rate(container_cpu_usage_seconds_total{image!=\"\"}[$__rate_interval])) by (namespace)". I did try something like the above, "avg by (namespace) (1-rate(node_cpu_seconds_total{mode=\"idle\"}[$__rate_interval]))", but that is not right either: node_cpu_seconds_total has no namespace label, so I only got one namespace listed:

[screenshot]

Both Memory Utilization panels, based on container_memory_working_set_bytes, are still dark when I use your unmodified files.

reefland avatar Jul 18 '22 12:07 reefland

Kicking around some other ideas, I learned how to merge two metrics that share a common key (here, instance), using node_uname_info to carry its nodename label onto the CPU ratio:

"expr": "avg by (nodename) (instance:node_cpu:ratio * on(instance) group_left(nodename) node_uname_info)"

[screenshot]

reefland avatar Jul 18 '22 23:07 reefland

Hi @reefland, yes, I noticed that too... I'll try to fix this today and let you know when everything works with 37.*.

dotdc avatar Jul 19 '22 07:07 dotdc

Can you try with the latest version? Commit: https://github.com/dotdc/grafana-dashboards-kubernetes/commit/132a29652829acb9927db10fff979398d4ac56ee

dotdc avatar Jul 19 '22 09:07 dotdc

I think the "REAL" RAM usage is now including all kinds of cached memory as "used". Previous method is reporting 12.9GB RAM used, new method is reporting 39.8GB RAM used (I have both running side by side). It's not a measurement of what Kubernetes is using anymore.

According to Red Hat (https://access.redhat.com/solutions/406773, which covers most of the node_memory_* counters):

Mem: used = MemTotal - MemFree - Buffers - Cached - Slab

So I tried:

sum(node_memory_MemTotal_bytes - (node_memory_Buffers_bytes + node_memory_Cached_bytes + node_memory_MemFree_bytes + node_memory_Slab_bytes))

This dropped it a bit, to 34.3 GB.

For me, ~23.6 GB of the usage is ZFS ARC cache, which ZFS will return to the OS if any memory pressure happens. Temporarily used, but still used, I guess.
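
If you wanted to treat the ARC as reclaimable, the way free treats cache, one option might be to subtract it explicitly. A sketch, assuming node_exporter's ZFS collector is enabled and exposes node_zfs_arc_size:

sum(node_memory_MemTotal_bytes - node_memory_MemFree_bytes - node_memory_Buffers_bytes - node_memory_Cached_bytes - node_memory_Slab_bytes - node_zfs_arc_size)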

  • The "CPU Utilization by node" panel should probably be renamed to "by instance" now.
  • The "CPU Utilization by namespace" panel is still dark, not updated yet.

reefland avatar Jul 19 '22 18:07 reefland

I also noticed a difference with the previous method, which didn't include everything. I think it makes sense to have "REAL" match the system resource usage. Otherwise you could see available resources on the dashboards, but the scheduler could fail to start pods due to a lack of resources behind the scenes.

With the new method, the REAL value should match what the system reports in /proc. I compared the metrics with free/top on my side and everything looked good. I'll try to test on a cluster with more load to see if I need to exclude cache.

As for your ZFS ARC cache, maybe it is not seen as cache by the system, and I'm not sure how to get around that. Do you have a way to test the Memory Usage panel from Views / Nodes?

CPU Utilization by namespace should work; is it still a label issue with k3s?

dotdc avatar Jul 20 '22 06:07 dotdc

I think you have the memory right. I checked other tools like LENS and it lines right up.

The CPU & Memory by namespace, Memory Utilization by node are dark as I do not have metrics with a image= attribute. I wonder if I can set it with a relabel but not sure what the image value is supposed to be, is it a docker image reference? I manually change image= to pod= and the panel seems to work fine.

I also upgraded to Kube Stack Prometheus 38.0.1 which has this:

      # Drop cgroup metrics with no pod.
      - sourceLabels: [id, pod]
        action: drop
        regex: '.+;'

This caused container_network_receive_bytes_total and container_network_transmit_bytes_total to be dropped; a few hours after upgrading, I noticed I had lost all my network stats. I removed that drop rule and my network stats came back.
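
A narrower rule might have kept the network metrics while still dropping the cgroup-only series; an untested sketch, where restricting the drop to CPU/memory metric names is my own assumption:

      - sourceLabels: [__name__, id, pod]
        action: drop
        regex: 'container_(cpu|memory)_.+;.+;'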

reefland avatar Jul 25 '22 03:07 reefland

Hi @reefland,

Yes, the image label contains the name of the image with the tag or sha256 digest.

Example: k8s.gcr.io/kube-state-metrics/kube-state-metrics@sha256:0ccff0db0a342d264c8f4fe5a0841d727fbb8e6cc0c7e741441771f165262182

I checked before, and this label is available by default from cAdvisor, but maybe it gets dropped at some point to reduce cardinality on your setup. Did you try a recursive grep or similar to find a rule that would drop this image label? Sorry, I never took the time to try k3s; I'll try, but I've been short on time lately 😅
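
One way to check whether the label survives as far as the kubelet might be to hit the cAdvisor endpoint directly, bypassing Prometheus relabeling entirely. A sketch; <node-name> is a placeholder:

kubectl get --raw "/api/v1/nodes/<node-name>/proxy/metrics/cadvisor" | grep 'image=' | head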

Do you still have an issue with container_network_receive_bytes_total and container_network_transmit_bytes_total? I think they rolled that one back; I just checked with 39.5.0 and it looked good on my side.

Let me know

dotdc avatar Aug 09 '22 08:08 dotdc

I still can't use your dashboards unmodified:

[screenshot]

My networking metrics are looking good again. But I think my overrides are still in place...

I've not been able to find a rule that would drop the image label. Honestly, I'm not even sure where to look. I assume I can use the Prometheus configuration screen, under the prometheus-kubelet job?

Nothing stands out:

- job_name: serviceMonitor/monitoring/prometheus-kubelet/1
  honor_labels: true
  honor_timestamps: true
  scrape_interval: 30s
  scrape_timeout: 10s
  metrics_path: /metrics/cadvisor
  scheme: https
  authorization:
    type: Bearer
    credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true
  follow_redirects: true
  enable_http2: true
  relabel_configs:
  - source_labels: [job]
    separator: ;
    regex: (.*)
    target_label: __tmp_prometheus_job_name
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_label_app_kubernetes_io_name, __meta_kubernetes_service_labelpresent_app_kubernetes_io_name]
    separator: ;
    regex: (kubelet);true
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_service_label_k8s_app, __meta_kubernetes_service_labelpresent_k8s_app]
    separator: ;
    regex: (kubelet);true
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_endpoint_port_name]
    separator: ;
    regex: https-metrics
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
    separator: ;
    regex: Node;(.*)
    target_label: node
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
    separator: ;
    regex: Pod;(.*)
    target_label: pod
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_namespace]
    separator: ;
    regex: (.*)
    target_label: namespace
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: service
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_name]
    separator: ;
    regex: (.*)
    target_label: pod
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_container_name]
    separator: ;
    regex: (.*)
    target_label: container
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: job
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_service_label_k8s_app]
    separator: ;
    regex: (.+)
    target_label: job
    replacement: ${1}
    action: replace
  - separator: ;
    regex: (.*)
    target_label: endpoint
    replacement: https-metrics
    action: replace
  - source_labels: [__metrics_path__]
    separator: ;
    regex: (.*)
    target_label: metrics_path
    replacement: $1
    action: replace
  - source_labels: [__address__]
    separator: ;
    regex: (.*)
    modulus: 1
    target_label: __tmp_hash
    replacement: $1
    action: hashmod
  - source_labels: [__tmp_hash]
    separator: ;
    regex: "0"
    replacement: $1
    action: keep
  metric_relabel_configs:
  - source_labels: [__name__]
    separator: ;
    regex: container_cpu_(cfs_throttled_seconds_total|load_average_10s|system_seconds_total|user_seconds_total)
    replacement: $1
    action: drop
  - source_labels: [__name__]
    separator: ;
    regex: container_fs_(io_current|io_time_seconds_total|io_time_weighted_seconds_total|reads_merged_total|sector_reads_total|sector_writes_total|writes_merged_total)
    replacement: $1
    action: drop
  - source_labels: [__name__]
    separator: ;
    regex: container_memory_(mapped_file|swap)
    replacement: $1
    action: drop
  - source_labels: [__name__]
    separator: ;
    regex: container_(file_descriptors|tasks_state|threads_max)
    replacement: $1
    action: drop
  - source_labels: [__name__]
    separator: ;
    regex: container_spec.*
    replacement: $1
    action: drop
  - source_labels: [node]
    separator: ;
    regex: (.*)
    target_label: instance
    replacement: $1
    action: replace
  kubernetes_sd_configs:
  - role: endpoints
    kubeconfig_file: ""
    follow_redirects: true
    enable_http2: true
    namespaces:
      own_namespace: false
      names:
      - kube-system

reefland avatar Aug 09 '22 15:08 reefland

OK, we'll use this issue to understand why your k3s setup is missing both the image and the container labels from cAdvisor.

Just installed k3s on my laptop and got everything working out of the box using the default configuration.

OS: archlinux

k3s version:

david@laptop ~ $ sudo k3s kubectl version
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.5+k3s1", GitCommit:"313aaca547f030752788dce696fdf8c9568bc035", GitTreeState:"clean", BuildDate:"2022-03-31T01:02:40Z", GoVersion:"go1.17.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.5+k3s1", GitCommit:"313aaca547f030752788dce696fdf8c9568bc035", GitTreeState:"clean", BuildDate:"2022-03-31T01:02:40Z", GoVersion:"go1.17.5", Compiler:"gc", Platform:"linux/amd64"}

kube-prometheus-stack version:

david@laptop ~ $ helm ls -n monitoring
NAME                    NAMESPACE       REVISION        UPDATED                                         STATUS          CHART                           APP VERSION
kube-prometheus-stack   monitoring      1               2022-08-17 15:13:29.783638314 +0200 CEST        deployed        kube-prometheus-stack-39.8.0    0.58.0     

Everything is running:

david@laptop ~ $ sudo k3s kubectl get pod -n monitoring
NAME                                                        READY   STATUS    RESTARTS   AGE
kube-prometheus-stack-prometheus-node-exporter-qgzwl        1/1     Running   0          19m
kube-prometheus-stack-operator-5995b5478d-vlrss             1/1     Running   0          19m
alertmanager-kube-prometheus-stack-alertmanager-0           2/2     Running   0          19m
kube-prometheus-stack-kube-state-metrics-5f6d6c64d5-tpnmt   1/1     Running   0          19m
kube-prometheus-stack-grafana-c9569b849-28zxf               3/3     Running   0          19m
prometheus-kube-prometheus-stack-prometheus-0               2/2     Running   0          19m

All dashboards are working:

[screenshot]

And I also have the image and the container labels:

[screenshot]

Can you try to deploy a copy of your setup with no customization at all (no values, no parameters, etc.)? Just raw k3s with the latest version of kube-prometheus-stack and the latest dashboards.

Also, can you please share again as much information as possible on your setup: hardware (and/or hypervisor with version), processor architecture, guest OS, k8s version, k3s version, kube-prometheus-stack version, etc.

I'll do my best to get as close as possible to your setup to figure this out.

Let me know,

David

dotdc avatar Aug 17 '22 14:08 dotdc

I think the key difference is that I'm unable to run the containerd built into the k3s binary, as I need to use the ZFS snapshotter. The containerd they bundle is a very lightweight version, and the overlayfs snapshotter it uses will not work on ZFS; the k3s service can't even start. So I can't use the default install.

You can try installing the standard containerd yourself:

$ sudo apt install containerd containernetworking-plugins iptables

$ containerd --version

containerd github.com/containerd/containerd 1.5.9-0ubuntu3

$ sudo crictl version

Version:  0.1.0
RuntimeName:  containerd
RuntimeVersion:  1.5.9-0ubuntu3
RuntimeApiVersion:  v1alpha2

Generate a default config file:

$ sudo -i
# containerd config default > /etc/containerd/config.toml

You should then be able to use the newly installed containerd:

$ sudo ctr --address /run/containerd/containerd.sock plugins ls

TYPE                            ID                       PLATFORMS      STATUS    
io.containerd.content.v1        content                  -              ok        
io.containerd.snapshotter.v1    aufs                     linux/amd64    skip      
io.containerd.snapshotter.v1    btrfs                    linux/amd64    skip      
io.containerd.snapshotter.v1    devmapper                linux/amd64    error     
io.containerd.snapshotter.v1    native                   linux/amd64    ok        
io.containerd.snapshotter.v1    overlayfs                linux/amd64    ok        
io.containerd.snapshotter.v1    zfs                      linux/amd64    ok        
io.containerd.metadata.v1       bolt                     -              ok        
io.containerd.differ.v1         walking                  linux/amd64    ok        
io.containerd.gc.v1             scheduler                -              ok        
io.containerd.service.v1        introspection-service    -              ok        
io.containerd.service.v1        containers-service       -              ok        
io.containerd.service.v1        content-service          -              ok        
io.containerd.service.v1        diff-service             -              ok        
io.containerd.service.v1        images-service           -              ok        
io.containerd.service.v1        leases-service           -              ok        
io.containerd.service.v1        namespaces-service       -              ok        
io.containerd.service.v1        snapshots-service        -              ok        
io.containerd.runtime.v1        linux                    linux/amd64    ok        
io.containerd.runtime.v2        task                     linux/amd64    ok        
io.containerd.monitor.v1        cgroups                  linux/amd64    ok        
io.containerd.service.v1        tasks-service            -              ok        
io.containerd.internal.v1       restart                  -              ok        
io.containerd.grpc.v1           containers               -              ok        
io.containerd.grpc.v1           content                  -              ok        
io.containerd.grpc.v1           diff                     -              ok        
io.containerd.grpc.v1           events                   -              ok        
io.containerd.grpc.v1           healthcheck              -              ok        
io.containerd.grpc.v1           images                   -              ok        
io.containerd.grpc.v1           leases                   -              ok        
io.containerd.grpc.v1           namespaces               -              ok        
io.containerd.internal.v1       opt                      -              ok        
io.containerd.grpc.v1           snapshots                -              ok        
io.containerd.grpc.v1           tasks                    -              ok        
io.containerd.grpc.v1           version                  -              ok        
io.containerd.grpc.v1           cri                      linux/amd64    ok   

Then point k3s at that containerd by updating the k3s service; just add this to the list of parameters:

--container-runtime-endpoint unix:///run/containerd/containerd.sock

Most of this you don't need, but for completeness:

ExecStart=/usr/local/bin/k3s \
    server \
        '--cluster-init' \
        '--token' \
        '[REDACTED]' \
        '--disable' \
        'traefik' \
        '--kube-apiserver-arg=feature-gates=MixedProtocolLBService=true' \
        '--disable' \
        'local-storage' \
        '--disable' \
        'servicelb' \
        '--container-runtime-endpoint' \
        'unix:///run/containerd/containerd.sock' \
        '--tls-san=192.168.10.239' \

Then restart the k3s service to start it using the new containerd. I don't think this is related to the snapshotter being used, so I highly doubt you need to make those changes.

I'm on a newer k3s version than you, but I've had this issue through many versions.

$ sudo kubectl version --short

Client Version: v1.24.3+k3s1
Kustomize Version: v4.5.4
Server Version: v1.24.3+k3s1
$ argocd app get kube-prometheus-stack-crds | grep Target
Target:             v0.58.0

$ argocd app get kube-prometheus-stack | grep Target
Target:             39.7.0
$ sudo kubectl get pod -n monitoring
NAME                                     READY   STATUS    RESTARTS        AGE
alertmanager-prometheus-alertmanager-0   2/2     Running   0               16h
grafana-66bd55698c-h7vsv                 3/3     Running   0               2d1h
kube-state-metrics-77c4c7558c-q7vsx      1/1     Running   0               16h
node-exporter-b9bpg                      1/1     Running   10 (14h ago)    14d
node-exporter-fkdmm                      1/1     Running   3 (2d17h ago)   14d
node-exporter-nxp9k                      1/1     Running   6 (14h ago)     14d
prometheus-operator-68fdccc5c6-d9hfh     1/1     Running   0               56m
prometheus-prometheus-prometheus-0       2/2     Running   0               14h
$ cat /etc/containerd/config.toml

disabled_plugins = []
imports = []
oom_score = 0
plugin_dir = ""
required_plugins = []
root = "/var/lib/containerd"
state = "/run/containerd"
version = 2

[cgroup]
  path = ""

[debug]
  address = ""
  format = ""
  gid = 0
  level = ""
  uid = 0

[grpc]
  address = "/run/containerd/containerd.sock"
  gid = 0
  max_recv_message_size = 16777216
  max_send_message_size = 16777216
  tcp_address = ""
  tcp_tls_cert = ""
  tcp_tls_key = ""
  uid = 0

[metrics]
  address = ""
  grpc_histogram = false

[plugins]

  [plugins."io.containerd.gc.v1.scheduler"]
    deletion_threshold = 0
    mutation_threshold = 100
    pause_threshold = 0.02
    schedule_delay = "0s"
    startup_delay = "100ms"

  [plugins."io.containerd.grpc.v1.cri"]
    disable_apparmor = false
    disable_cgroup = false
    disable_hugetlb_controller = true
    disable_proc_mount = false
    disable_tcp_service = true
    enable_selinux = false
    enable_tls_streaming = false
    ignore_image_defined_volumes = false
    max_concurrent_downloads = 3
    max_container_log_line_size = 16384
    netns_mounts_under_state_dir = false
    restrict_oom_score_adj = false
    sandbox_image = "k8s.gcr.io/pause:3.5"
    selinux_category_range = 1024
    stats_collect_period = 10
    stream_idle_timeout = "4h0m0s"
    stream_server_address = "127.0.0.1"
    stream_server_port = "0"
    systemd_cgroup = false
    tolerate_missing_hugetlb_controller = true
    unset_seccomp_profile = ""

    [plugins."io.containerd.grpc.v1.cri".cni]
      bin_dir = "/usr/lib/cni"
      conf_dir = "/etc/cni/net.d"
      conf_template = ""
      max_conf_num = 1

    [plugins."io.containerd.grpc.v1.cri".containerd]
      default_runtime_name = "runc"
      disable_snapshot_annotations = true
      discard_unpacked_layers = false
      no_pivot = false
      snapshotter = "zfs"

      [plugins."io.containerd.grpc.v1.cri".containerd.default_runtime]
        base_runtime_spec = ""
        container_annotations = []
        pod_annotations = []
        privileged_without_host_devices = false
        runtime_engine = ""
        runtime_root = ""
        runtime_type = ""

        [plugins."io.containerd.grpc.v1.cri".containerd.default_runtime.options]

      [plugins."io.containerd.grpc.v1.cri".containerd.runtimes]

        [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
          base_runtime_spec = ""
          container_annotations = []
          pod_annotations = []
          privileged_without_host_devices = false
          runtime_engine = ""
          runtime_root = ""
          runtime_type = "io.containerd.runc.v2"

          [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
            BinaryName = ""
            CriuImagePath = ""
            CriuPath = ""
            CriuWorkPath = ""
            IoGid = 0
            IoUid = 0
            NoNewKeyring = false
            NoPivotRoot = false
            Root = ""
            ShimCgroup = ""
            SystemdCgroup = false

      [plugins."io.containerd.grpc.v1.cri".containerd.untrusted_workload_runtime]
        base_runtime_spec = ""
        container_annotations = []
        pod_annotations = []
        privileged_without_host_devices = false
        runtime_engine = ""
        runtime_root = ""
        runtime_type = ""

        [plugins."io.containerd.grpc.v1.cri".containerd.untrusted_workload_runtime.options]

    [plugins."io.containerd.grpc.v1.cri".image_decryption]
      key_model = "node"

    [plugins."io.containerd.grpc.v1.cri".registry]
      config_path = ""

      [plugins."io.containerd.grpc.v1.cri".registry.auths]

      [plugins."io.containerd.grpc.v1.cri".registry.configs]

      [plugins."io.containerd.grpc.v1.cri".registry.headers]

      [plugins."io.containerd.grpc.v1.cri".registry.mirrors]

    [plugins."io.containerd.grpc.v1.cri".x509_key_pair_streaming]
      tls_cert_file = ""
      tls_key_file = ""

  [plugins."io.containerd.internal.v1.opt"]
    path = "/opt/containerd"

  [plugins."io.containerd.internal.v1.restart"]
    interval = "10s"

  [plugins."io.containerd.metadata.v1.bolt"]
    content_sharing_policy = "shared"

  [plugins."io.containerd.monitor.v1.cgroups"]
    no_prometheus = false

  [plugins."io.containerd.runtime.v1.linux"]
    no_shim = false
    runtime = "runc"
    runtime_root = ""
    shim = "containerd-shim"
    shim_debug = false

  [plugins."io.containerd.runtime.v2.task"]
    platforms = ["linux/amd64"]

  [plugins."io.containerd.service.v1.diff-service"]
    default = ["walking"]

  [plugins."io.containerd.snapshotter.v1.aufs"]
    root_path = ""

  [plugins."io.containerd.snapshotter.v1.btrfs"]
    root_path = ""

  [plugins."io.containerd.snapshotter.v1.devmapper"]
    async_remove = false
    base_image_size = ""
    pool_name = ""
    root_path = ""

  [plugins."io.containerd.snapshotter.v1.native"]
    root_path = ""

  [plugins."io.containerd.snapshotter.v1.overlayfs"]
    root_path = ""

  [plugins."io.containerd.snapshotter.v1.zfs"]
    root_path = ""

[proxy_plugins]

[stream_processors]

  [stream_processors."io.containerd.ocicrypt.decoder.v1.tar"]
    accepts = ["application/vnd.oci.image.layer.v1.tar+encrypted"]
    args = ["--decryption-keys-path", "/etc/containerd/ocicrypt/keys"]
    env = ["OCICRYPT_KEYPROVIDER_CONFIG=/etc/containerd/ocicrypt/ocicrypt_keyprovider.conf"]
    path = "ctd-decoder"
    returns = "application/vnd.oci.image.layer.v1.tar"

  [stream_processors."io.containerd.ocicrypt.decoder.v1.tar.gzip"]
    accepts = ["application/vnd.oci.image.layer.v1.tar+gzip+encrypted"]
    args = ["--decryption-keys-path", "/etc/containerd/ocicrypt/keys"]
    env = ["OCICRYPT_KEYPROVIDER_CONFIG=/etc/containerd/ocicrypt/ocicrypt_keyprovider.conf"]
    path = "ctd-decoder"
    returns = "application/vnd.oci.image.layer.v1.tar+gzip"

[timeouts]
  "io.containerd.timeout.shim.cleanup" = "5s"
  "io.containerd.timeout.shim.load" = "5s"
  "io.containerd.timeout.shim.shutdown" = "3s"
  "io.containerd.timeout.task.state" = "2s"

[ttrpc]
  address = ""
  gid = 0
  uid = 0

reefland avatar Aug 17 '22 17:08 reefland

Can you post what your default containerd config file looks like? /var/lib/rancher/k3s/agent/etc/containerd/config.toml

I don't have one to compare against.

reefland avatar Aug 17 '22 17:08 reefland

$ sudo cat /var/lib/rancher/k3s/agent/etc/containerd/config.toml

[plugins.opt]
  path = "/var/lib/rancher/k3s/agent/containerd"

[plugins.cri]
  stream_server_address = "127.0.0.1"
  stream_server_port = "10010"
  enable_selinux = false
  sandbox_image = "rancher/mirrored-pause:3.6"

[plugins.cri.containerd]
  snapshotter = "overlayfs"
  disable_snapshot_annotations = true


[plugins.cri.cni]
  bin_dir = "/var/lib/rancher/k3s/data/05cfd5aec8ddf622207749ef3eda0e0efa12d8900105fdac78815a8cd6c73685/bin"
  conf_dir = "/var/lib/rancher/k3s/agent/etc/cni/net.d"


[plugins.cri.containerd.runtimes.runc]
  runtime_type = "io.containerd.runc.v2"

dotdc avatar Aug 17 '22 19:08 dotdc

Could you try setting this in your kube-prometheus-stack values and see if it changes anything?

kubelet:
  enabled: true
  serviceMonitor:
    ## Enable scraping /metrics/resource from kubelet's service
    ## This is disabled by default because container metrics are already exposed by cAdvisor
    resource: true

I didn't have time to test further, but according to https://github.com/containerd/containerd/issues/4541#issuecomment-974709561, they tried to fix an issue between cAdvisor and containerd in k8s 1.23. It's a bit different, but maybe there's something to dig around here.

Also, cAdvisor has a dedicated containerd tag for their image in their repository:
https://console.cloud.google.com/gcr/images/cadvisor/global/cadvisor (v0.45.0-containerd-cri)

Maybe it's worth trying to disable cAdvisor in kube-prometheus-stack, deploy the v0.45.0-containerd-cri version as a DaemonSet, and create a dedicated ServiceMonitor to scrape the metrics.
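
A minimal sketch of such a ServiceMonitor; the name, namespace, selector labels, and port name are all assumptions:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: cadvisor
  namespace: cadvisor
spec:
  selector:
    matchLabels:
      app: cadvisor
  endpoints:
    - port: http
      interval: 30s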

Let me know if you try any of these.

dotdc avatar Aug 18 '22 08:08 dotdc

I tried...

       resource: true
       # From kubernetes 1.18, /metrics/resource/v1alpha1 renamed to /metrics/resource
       resourcePath: "/metrics/resource"

container_cpu_usage_seconds_total{image!=""} still returns an empty set.

reefland avatar Aug 18 '22 19:08 reefland

In an effort to reduce variables, I greatly reduced the size of the containerd config file. Everything still works great, but I don't see any difference in the metrics.

The main differences, besides the snapshotter, were stream_server_port changed from 0 to 10010, and sandbox_image changed to the same one k3s uses, which also bumps the pause version from 3.5 to 3.6.

$ cat /etc/containerd/config.toml

root = "/var/lib/containerd"
state = "/run/containerd"
version = 2

[plugins]
  [plugins."io.containerd.grpc.v1.cri"]
    stream_server_address = "127.0.0.1"
    stream_server_port = "10010"
    enable_selinux = false
    sandbox_image = "rancher/mirrored-pause:3.6"

  [plugins."io.containerd.grpc.v1.cri".cni]
    bin_dir = "/usr/lib/cni"
    conf_dir = "/etc/cni/net.d"

  [plugins."io.containerd.grpc.v1.cri".containerd]
    default_runtime_name = "runc"
    snapshotter = "zfs"

Was there anything interesting in this opt directory with your install?

[plugins.opt]
  path = "/var/lib/rancher/k3s/agent/containerd"

reefland avatar Aug 19 '22 21:08 reefland

I played around with your suggestion of a cAdvisor Kubernetes DaemonSet on a test node:

$ k get all -n cadvisor
NAME                 READY   STATUS    RESTARTS   AGE
pod/cadvisor-pm2lw   1/1     Running   0          13m

NAME                      DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/cadvisor   1         1         1       1            1           <none>          13m

When I curl for a metric, the output seems rather large; does this seem normal in your experience?

$ curl http://10.42.0.139:8080/metrics | grep container_cpu_usage_seconds_total

container_cpu_usage_seconds_total{container_label_alertmanager="",container_label_app="",container_label_app_kubernetes_io_component="",container_label_app_kubernetes_io_csi_role="",container_label_app_kubernetes_io_instance="",container_label_app_kubernetes_io_managed_by="",container_label_app_kubernetes_io_name="",container_label_app_kubernetes_io_part_of="",container_label_app_kubernetes_io_version="",container_label_chart="",container_label_com_suse_eula="",container_label_com_suse_image_type="",container_label_com_suse_lifecycle_url="",container_label_com_suse_release_stage="",container_label_com_suse_sle_base_created="",container_label_com_suse_sle_base_description="",container_label_com_suse_sle_base_disturl="",container_label_com_suse_sle_base_eula="",container_label_com_suse_sle_base_image_type="",container_label_com_suse_sle_base_lifecycle_url="",container_label_com_suse_sle_base_reference="",container_label_com_suse_sle_base_release_stage="",container_label_com_suse_sle_base_source="",container_label_com_suse_sle_base_supportlevel="",container_label_com_suse_sle_base_title="",container_label_com_suse_sle_base_url="",container_label_com_suse_sle_base_vendor="",container_label_com_suse_sle_base_version="",container_label_com_suse_supportlevel="",container_label_component="",container_label_controller_revision_hash="",container_label_description="",container_label_helm_sh_chart="",container_label_heritage="",container_label_io_cri_containerd_kind="container",container_label_io_kubernetes_container_name="grafana",container_label_io_kubernetes_pod_name="grafana-5c9dc97c-mqxr6",container_label_io_kubernetes_pod_namespace="monitoring",container_label_io_kubernetes_pod_uid="b32a26d7-b940-4dce-8172-9091c0b7f255",container_label_jobLabel="",container_label_k8s_app="",container_label_longhorn_io_component="",container_label_longhorn_io_engine_image="",container_label_longhorn_io_instance_manager_image="",container_label_longhorn_io_instance_manager_type="",container_label_longhorn_io_managed_by="",container_label_longhorn_io_node="",container_label_maintainer="",container_label_maintainers="",container_label_name="",container_label_operator_prometheus_io_name="",container_label_operator_prometheus_io_shard="",container_label_org_openbuildservice_disturl="",container_label_org_opencontainers_image_created="",container_label_org_opencontainers_image_description="",container_label_org_opencontainers_image_documentation="",container_label_org_opencontainers_image_licenses="",container_label_org_opencontainers_image_revision="",container_label_org_opencontainers_image_source="",container_label_org_opencontainers_image_title="",container_label_org_opencontainers_image_url="",container_label_org_opencontainers_image_vendor="",container_label_org_opencontainers_image_version="",container_label_org_opensuse_reference="",container_label_pod_template_generation="",container_label_pod_template_hash="",container_label_prometheus="",container_label_release="",container_label_revision="",container_label_statefulset_kubernetes_io_pod_name="",container_label_upgrade_cattle_io_controller="",cpu="total",id="/kubepods/besteffort/podb32a26d7-b940-4dce-8172-9091c0b7f255/274f72dd18fb5c829d8a63bb0941eda804c5d44a7cbd899a21b389ff3f87b7d6",image="docker.io/grafana/grafana:9.0.5",name="274f72dd18fb5c829d8a63bb0941eda804c5d44a7cbd899a21b389ff3f87b7d6"} 267.270606 1661360792400

At least it does include an image key now:

image="docker.io/grafana/grafana:9.0.5"

Sure does spit out a lot...

$ curl http://10.42.0.139:8080/metrics | wc -l
13734

I haven't tried to disable cAdvisor in kube-prometheus-stack, nor added this as a scrape target yet.

reefland avatar Aug 24 '22 17:08 reefland

Not sure if it's normal, but:

  • there are a lot of empty labels in your metric...
  • you have the image label but not the container one; instead, you have container_name

Still not good :disappointed:

dotdc avatar Aug 24 '22 18:08 dotdc

After a little more tweaking, it looks like this:

$ curl -s http://10.42.0.143:8080/metrics | grep container_cpu_usage_seconds_total | grep grafana

container_cpu_usage_seconds_total{container_label_io_kubernetes_container_name="grafana",container_label_io_kubernetes_pod_namespace="monitoring",cpu="total",id="/kubepods/besteffort/podb32a26d7-b940-4dce-8172-9091c0b7f255/274f72dd18fb5c829d8a63bb0941eda804c5d44a7cbd899a21b389ff3f87b7d6",image="docker.io/grafana/grafana:9.0.5",name="274f72dd18fb5c829d8a63bb0941eda804c5d44a7cbd899a21b389ff3f87b7d6"} 299.354727 1661369635910

I added a PodMonitor to scrape the metrics (I have not disabled the KPS cAdvisor yet); within Prometheus it looks like:

container_cpu_usage_seconds_total{container="cadvisor", container_label_io_kubernetes_container_name="grafana", container_label_io_kubernetes_pod_namespace="monitoring", cpu="total", endpoint="http", id="/kubepods/besteffort/podb32a26d7-b940-4dce-8172-9091c0b7f255/274f72dd18fb5c829d8a63bb0941eda804c5d44a7cbd899a21b389ff3f87b7d6", image="docker.io/grafana/grafana:9.0.5", instance="10.42.0.143:8080", job="monitoring/cadvisor-prometheus-podmonitor", name="274f72dd18fb5c829d8a63bb0941eda804c5d44a7cbd899a21b389ff3f87b7d6", namespace="cadvisor", pod="cadvisor-tqbj6"}
| 305.831161

Slightly better maybe?

reefland avatar Aug 24 '22 19:08 reefland

It seems to be marking everything as its container and namespace:

[screenshot]

It seems like some items could be dropped and other things relabeled to get what's needed?


This is the ONLY namespace that lights up your dashboard, and it does seem like everything is lumped together :)

[screenshot]

reefland avatar Aug 24 '22 20:08 reefland

At least, you can kinda test the dashboard now!

I just realized I missed the label name because it was on a new line! The container label equivalent is container_label_io_kubernetes_container_name in your case (not container_name).

Even if you make this work, you will still have a pretty uncommon setup and maybe uncommon problems in the future... Maybe it's not a big deal for you, but I feel this is far from an ideal solution...

Did you try posting in the #k3s channel of the Rancher Slack to:

  • Get k3s to support your ZFS setup (feature request or support)?
  • Explain the problem and see if anyone else has had a similar problem with cAdvisor?

dotdc avatar Aug 24 '22 20:08 dotdc

Left a question on the #k3s Slack; we'll see if I get a nibble.

reefland avatar Aug 25 '22 20:08 reefland

Found this: https://github.com/k3s-io/k3s/issues/66. It looks like ZFS support for k3s is a dead end for now...

Would this be something for you: https://github.com/k3s-io/k3s/issues/66#issuecomment-520183720 ?

dotdc avatar Aug 25 '22 21:08 dotdc

I'm retiring my old ZFS/Docker server to be ZFS/Kubernetes.

There are cleaner ways to get ZFS working with Docker. The issues tended to be a race condition: Docker wouldn't wait for the ZFS filesystems to be in place, and would create a file or directory that prevented ZFS from mounting the dataset, since the mountpoint was no longer empty. There are systemd tricks you can use to force Docker to wait until ZFS has completed; it's been running smoothly for years since.
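
The systemd trick is roughly an ordering override like the following; a sketch only, and the drop-in path and unit names may differ per distro:

# /etc/systemd/system/docker.service.d/wait-for-zfs.conf
# make Docker wait until ZFS datasets are mounted before it starts
[Unit]
Requires=zfs-mount.service
After=zfs-mount.service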

As for using an ext4-formatted zvol for that one directory so I can use the bundled containerd... it's a potential solution, but the standard containerd works great minus this odd Prometheus label issue. I think it's just a configuration tweak to fix, if we can figure out which project needs to make it.

My understanding is that Azure uses containerd by default as well, and the LENS project has similar issues with metrics.

reefland avatar Aug 25 '22 21:08 reefland

Hi @reefland,

Did you manage to get everything working together? It would be cool to share it in case someone with a similar setup ends up reading this thread :blush:

As it's not directly related to this project, maybe we can close this issue; what do you think?

dotdc avatar Sep 12 '22 20:09 dotdc

I was not able to make it work. I do think the right course of action is to use an ext4-formatted ZFS zvol for containerd and go back to the standard snapshotter, to reduce the complexity of the installation. I'm not sure I can do that level of swapping the engine out while the plane is in flight... and I doubt I will rebuild the cluster from scratch just for these metrics.

I'll close the issue; I don't expect this to be resolvable for a while. Thanks for your time on this, much appreciated.

reefland avatar Sep 16 '22 01:09 reefland

@dotdc - Just wanted to follow up: I was able to get cAdvisor deployed as a DaemonSet working. What I was missing in the last attempt was relabelings on the job doing the DaemonSet scrape. Also, it seems an extra space in the label whitelist in the kustomize example they provide caused an issue too.

Within my Kube-Prometheus-Stack Helm chart / values.yaml section, I needed to add:

           additionalScrapeConfigs:
              # CADVISOR SCRAPE JOB for externally installed cadvisor because of k8s with containerd problems
              - job_name: "kubernetes-cadvisor"
                kubernetes_sd_configs:
                  - role: pod  # we get needed info from the pods
                    namespaces:
                      names: 
                        - cadvisor
                    selectors:
                      - role: pod
                        label: "app=cadvisor"  # and only select the cadvisor pods with this label set as source
                metric_relabel_configs:  # we relabel some labels inside the scraped metrics
                  # this should look at the scraped metric and replace/add the label inside
                  - source_labels: [container_label_io_kubernetes_pod_namespace]
                    target_label: "namespace"
                  - source_labels: [container_label_io_kubernetes_pod_name]
                    target_label: "pod"
                  - source_labels: [container_label_io_kubernetes_container_name]
                    target_label: "container"

Now I have image=, container=, and pod= values:

[screenshot]

And your dashboard lights up as expected:

[screenshot]

This is only on my test cluster so far, but still a great sign. I still need to do some more testing and apply a bunch of metric drops to match how the kubelet cAdvisor job is handled.

reefland avatar Dec 22 '22 17:12 reefland

Hi @reefland,

Really happy to learn that you finally managed to fix this and are able to use the new version of the dashboards! :partying_face: Thanks for sharing your settings; it may help someone with a similar setup!

Wish you a happy holiday season!

dotdc avatar Dec 22 '22 17:12 dotdc

Thank you!

Another update... I yanked the external cAdvisor DaemonSet. Prometheus started constantly complaining about duplicate, out-of-order, and dropped labels, hundreds per second. It just felt like I was digging a deeper and deeper hole.

So instead, node by node, I yanked the external containerd / runc / ZFS snapshotter. I moved /var/lib/rancher aside, added a 30 GB zvol formatted with XFS and mounted it at /var/lib/rancher, removed the tweaks added to the k3s and k3s-agent services for the external containerd, and copied the directory structure back in.
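
Roughly, the per-node swap looked like this (a sketch of the steps described above; the pool and dataset names are assumptions):

# create and format a 30 GB zvol, then move /var/lib/rancher onto it
zfs create -V 30G rpool/k3s
mkfs.xfs /dev/zvol/rpool/k3s
mv /var/lib/rancher /var/lib/rancher.old
mkdir /var/lib/rancher
mount /dev/zvol/rpool/k3s /var/lib/rancher
cp -a /var/lib/rancher.old/. /var/lib/rancher/
# then remove the --container-runtime-endpoint override from the k3s service and restart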

I was able to swap all this out without a reinstall and with no downtime. I'm luv'n Kubernetes :)

So I still have all the pluses of ZFS on bare metal (mirrors, compression, encryption, snapshots, boot environments, rollbacks), and the one important directory for k3s is now on a ZFS-backed (mirrored / compressed / encrypted) XFS filesystem, which allows easy volume expansion in the future if I need it. And k3s can use its default overlayfs, so all the metrics just magically work.

Even dashboards from Mixin, which have never worked for me, started to light up within seconds of the first node being converted:

[screenshot]

I'm monitoring my system logs; everything looks great and Prometheus is happy. All my dashboards are working as expected with a standard-ish k3s "default" install.

reefland avatar Jan 08 '23 01:01 reefland