grafana-dashboards-kubernetes
Issues with node_cpu_seconds_total
I tested the latest changes, and it's still not right...
Panel CPU Utilization by Node "expr": "avg by (node) (1-rate(node_cpu_seconds_total{mode=\"idle\"}[$__rate_interval]))",
yields:
It seems to be the total of all nodes. It is not picking up the individual nodes; it should look like:
Panel CPU Utilization by namespace is still dark and using the old metric: "expr": "sum(rate(container_cpu_usage_seconds_total{image!=\"\"}[$__rate_interval])) by (namespace)",
I did try something like the above: "avg by (namespace) (1-rate(node_cpu_seconds_total{mode=\"idle\"}[$__rate_interval]))",
but that is not right; I only got one namespace listed:
Both Memory Utilization panels are still dark, based on container_memory_working_set_bytes, when I use your unmodified files.
I've been kicking around some other ideas and learned how to merge two metrics that share a common key.
"expr": "avg by (nodename) (instance:node_cpu:ratio * on(instance) group_left(nodename) node_uname_info)"
Hi @reefland,
Yes, I noticed that too... I will try to fix this today and let you know when everything works with 37.*.
Can you try with the latest version? Commit : https://github.com/dotdc/grafana-dashboards-kubernetes/commit/132a29652829acb9927db10fff979398d4ac56ee
I think the "REAL" RAM usage is now including all kinds of cached memory as "used". The previous method is reporting 12.9 GB RAM used; the new method is reporting 39.8 GB RAM used (I have both running side by side). It's not a measurement of what Kubernetes is using anymore.
According to Red Hat https://access.redhat.com/solutions/406773 (covers most of the node_memory_* counters):
Mem: used = MemTotal - MemFree - Buffers - Cached - Slab
So I tried:
sum(node_memory_MemTotal_bytes - (node_memory_Buffers_bytes + node_memory_Cached_bytes + node_memory_MemFree_bytes + node_memory_Slab_bytes))
This dropped it a bit to 34.3 GB.
For me, ~23.6 GB of that usage is ZFS ARC cache, which ZFS will return to the OS if any memory pressure happens. Temporarily used, but still used, I guess.
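If the ARC really shouldn't count as "used", one idea (untested) is to subtract it in the same formula, assuming the node-exporter ZFS collector is enabled and exposes node_zfs_arc_size:
sum(node_memory_MemTotal_bytes - (node_memory_Buffers_bytes + node_memory_Cached_bytes + node_memory_MemFree_bytes + node_memory_Slab_bytes + node_zfs_arc_size))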
- The CPU Utilization by node panel should probably be renamed to by instance now.
- The CPU Utilization by namespace panel is still dark, not updated yet.
I also noticed a difference with the previous method that didn't include everything. I think it makes sense to have "Real" match the system resource usage. Otherwise, you could see available resources on the dashboards but the scheduler could fail to start pods due to the lack of resources behind the scenes.
With the new method, the REAL value should match the system /proc. I compared metrics with free/top on my side and everything looked good. I'll try to test on a cluster with more load to see if I need to exclude cache.
For your ZFS ARC cache, maybe it is not seen as cache by the system, and I'm not sure how to get around this. Do you have a way to test the Memory Usage panel from Views / Nodes?
CPU Utilization by namespace should work; is it still a label issue with k3s?
I think you have the memory right. I checked other tools like LENS and it lines right up.
The CPU & Memory by namespace, Memory Utilization by node are dark as I do not have metrics with a image=
attribute. I wonder if I can set it with a relabel but not sure what the image value is supposed to be, is it a docker image reference? I manually change image=
to pod=
and the panel seems to work fine.
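For example, the CPU by namespace query ends up looking something like this, with just the selector swapped:
"expr": "sum(rate(container_cpu_usage_seconds_total{pod!=\"\"}[$__rate_interval])) by (namespace)",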
I also upgraded to kube-prometheus-stack 38.0.1, which has this:
# Drop cgroup metrics with no pod.
- sourceLabels: [id, pod]
  action: drop
  regex: '.+;'
This caused container_network_receive_bytes_total and container_network_transmit_bytes_total to get dropped. I noticed after a few hours that I had lost all my network stats since upgrading. I removed that drop and my network stats came back.
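If you want to keep that drop but spare the network series, an untested alternative could be to limit it to specific metric prefixes, something along these lines (hypothetical, not from the chart):
# Only drop pod-less CPU/memory cgroup series, leave container_network_* alone
- sourceLabels: [__name__, id, pod]
  action: drop
  regex: 'container_(cpu|memory)_[^;]*;.+;'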
Hi @reefland,
Yes, the image label contains the name of the image with the tag or sha256.
Example: k8s.gcr.io/kube-state-metrics/kube-state-metrics@sha256:0ccff0db0a342d264c8f4fe5a0841d727fbb8e6cc0c7e741441771f165262182
I checked before and this label is available by default on cAdvisor, but maybe it gets dropped at some point to reduce entropy on your setup.
Did you try a recursive grep or something else to find a rule that would drop this image label?
Sorry, I never took the time to try using k3s. I'll try, but I've been short on time lately 😅
Do you still have an issue with container_network_receive_bytes_total and container_network_transmit_bytes_total?
I think they did a rollback on this one; I just checked with 39.5.0 and it looked good on my side.
Let me know
I still can't use your dashboards unmodified:
My networking metrics are looking good again. But I think my overrides are still in place...
I've not been able to find a rule that would drop the image label. Honestly, I'm not even sure where to look. I assume I can use the Prometheus configuration screen, prometheus-kubelet job?
Nothing stands out:
- job_name: serviceMonitor/monitoring/prometheus-kubelet/1
honor_labels: true
honor_timestamps: true
scrape_interval: 30s
scrape_timeout: 10s
metrics_path: /metrics/cadvisor
scheme: https
authorization:
type: Bearer
credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
insecure_skip_verify: true
follow_redirects: true
enable_http2: true
relabel_configs:
- source_labels: [job]
separator: ;
regex: (.*)
target_label: __tmp_prometheus_job_name
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_service_label_app_kubernetes_io_name, __meta_kubernetes_service_labelpresent_app_kubernetes_io_name]
separator: ;
regex: (kubelet);true
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_service_label_k8s_app, __meta_kubernetes_service_labelpresent_k8s_app]
separator: ;
regex: (kubelet);true
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_endpoint_port_name]
separator: ;
regex: https-metrics
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
separator: ;
regex: Node;(.*)
target_label: node
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
separator: ;
regex: Pod;(.*)
target_label: pod
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_namespace]
separator: ;
regex: (.*)
target_label: namespace
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_service_name]
separator: ;
regex: (.*)
target_label: service
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_pod_name]
separator: ;
regex: (.*)
target_label: pod
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_pod_container_name]
separator: ;
regex: (.*)
target_label: container
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_service_name]
separator: ;
regex: (.*)
target_label: job
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_service_label_k8s_app]
separator: ;
regex: (.+)
target_label: job
replacement: ${1}
action: replace
- separator: ;
regex: (.*)
target_label: endpoint
replacement: https-metrics
action: replace
- source_labels: [__metrics_path__]
separator: ;
regex: (.*)
target_label: metrics_path
replacement: $1
action: replace
- source_labels: [__address__]
separator: ;
regex: (.*)
modulus: 1
target_label: __tmp_hash
replacement: $1
action: hashmod
- source_labels: [__tmp_hash]
separator: ;
regex: "0"
replacement: $1
action: keep
metric_relabel_configs:
- source_labels: [__name__]
separator: ;
regex: container_cpu_(cfs_throttled_seconds_total|load_average_10s|system_seconds_total|user_seconds_total)
replacement: $1
action: drop
- source_labels: [__name__]
separator: ;
regex: container_fs_(io_current|io_time_seconds_total|io_time_weighted_seconds_total|reads_merged_total|sector_reads_total|sector_writes_total|writes_merged_total)
replacement: $1
action: drop
- source_labels: [__name__]
separator: ;
regex: container_memory_(mapped_file|swap)
replacement: $1
action: drop
- source_labels: [__name__]
separator: ;
regex: container_(file_descriptors|tasks_state|threads_max)
replacement: $1
action: drop
- source_labels: [__name__]
separator: ;
regex: container_spec.*
replacement: $1
action: drop
- source_labels: [node]
separator: ;
regex: (.*)
target_label: instance
replacement: $1
action: replace
kubernetes_sd_configs:
- role: endpoints
kubeconfig_file: ""
follow_redirects: true
enable_http2: true
namespaces:
own_namespace: false
names:
- kube-system
Ok, we'll use this issue to understand why your k3s setup is missing both the image and the container labels from cAdvisor.
Just installed k3s on my laptop and got everything working out of the box using the default configuration.
OS: archlinux
k3s version:
david@laptop ~ $ sudo k3s kubectl version
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.5+k3s1", GitCommit:"313aaca547f030752788dce696fdf8c9568bc035", GitTreeState:"clean", BuildDate:"2022-03-31T01:02:40Z", GoVersion:"go1.17.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.5+k3s1", GitCommit:"313aaca547f030752788dce696fdf8c9568bc035", GitTreeState:"clean", BuildDate:"2022-03-31T01:02:40Z", GoVersion:"go1.17.5", Compiler:"gc", Platform:"linux/amd64"}
kube-prometheus-stack version:
david@laptop ~ $ helm ls -n monitoring
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
kube-prometheus-stack monitoring 1 2022-08-17 15:13:29.783638314 +0200 CEST deployed kube-prometheus-stack-39.8.0 0.58.0
Everything is running:
david@laptop ~ $ sudo k3s kubectl get pod -n monitoring
NAME READY STATUS RESTARTS AGE
kube-prometheus-stack-prometheus-node-exporter-qgzwl 1/1 Running 0 19m
kube-prometheus-stack-operator-5995b5478d-vlrss 1/1 Running 0 19m
alertmanager-kube-prometheus-stack-alertmanager-0 2/2 Running 0 19m
kube-prometheus-stack-kube-state-metrics-5f6d6c64d5-tpnmt 1/1 Running 0 19m
kube-prometheus-stack-grafana-c9569b849-28zxf 3/3 Running 0 19m
prometheus-kube-prometheus-stack-prometheus-0 2/2 Running 0 19m
All dashboards are working:
And I also have the image and the container labels:
Can you try to deploy a copy of your setup with no customization at all (no values, no parameters, etc.)? Just raw k3s with the latest version of kube-prometheus-stack and the latest dashboards.
Also, can you please share again as much information as possible on your setup: hardware (and/or hypervisor with version), processor architecture, guest OS, k8s version, k3s version, kube-prometheus-stack version, etc.
I'll do my best to get as close as possible to your setup to figure this out.
Let me know,
David
I think the key difference is that I'm unable to run the containerd that is built into the k3s binary, as I need to use the ZFS snapshotter. The containerd they bundle is a very lightweight version, and the overlay FS it uses will not work on ZFS; the k3s service can't even start. So I can't use the default install.
If you want to try it, install the standard containerd yourself:
$ sudo apt install containerd containernetworking-plugins iptables
$ containerd --version
containerd github.com/containerd/containerd 1.5.9-0ubuntu3
$ sudo crictl version
Version: 0.1.0
RuntimeName: containerd
RuntimeVersion: 1.5.9-0ubuntu3
RuntimeApiVersion: v1alpha2
Generate a default config file:
$ sudo -i
# containerd config default > /etc/containerd/config.toml
You should then be able to use the newly installed containerd:
$ sudo ctr --address /run/containerd/containerd.sock plugins ls
TYPE ID PLATFORMS STATUS
io.containerd.content.v1 content - ok
io.containerd.snapshotter.v1 aufs linux/amd64 skip
io.containerd.snapshotter.v1 btrfs linux/amd64 skip
io.containerd.snapshotter.v1 devmapper linux/amd64 error
io.containerd.snapshotter.v1 native linux/amd64 ok
io.containerd.snapshotter.v1 overlayfs linux/amd64 ok
io.containerd.snapshotter.v1 zfs linux/amd64 ok
io.containerd.metadata.v1 bolt - ok
io.containerd.differ.v1 walking linux/amd64 ok
io.containerd.gc.v1 scheduler - ok
io.containerd.service.v1 introspection-service - ok
io.containerd.service.v1 containers-service - ok
io.containerd.service.v1 content-service - ok
io.containerd.service.v1 diff-service - ok
io.containerd.service.v1 images-service - ok
io.containerd.service.v1 leases-service - ok
io.containerd.service.v1 namespaces-service - ok
io.containerd.service.v1 snapshots-service - ok
io.containerd.runtime.v1 linux linux/amd64 ok
io.containerd.runtime.v2 task linux/amd64 ok
io.containerd.monitor.v1 cgroups linux/amd64 ok
io.containerd.service.v1 tasks-service - ok
io.containerd.internal.v1 restart - ok
io.containerd.grpc.v1 containers - ok
io.containerd.grpc.v1 content - ok
io.containerd.grpc.v1 diff - ok
io.containerd.grpc.v1 events - ok
io.containerd.grpc.v1 healthcheck - ok
io.containerd.grpc.v1 images - ok
io.containerd.grpc.v1 leases - ok
io.containerd.grpc.v1 namespaces - ok
io.containerd.internal.v1 opt - ok
io.containerd.grpc.v1 snapshots - ok
io.containerd.grpc.v1 tasks - ok
io.containerd.grpc.v1 version - ok
io.containerd.grpc.v1 cri linux/amd64 ok
And then point to that containerd by updating the k3s service; just add this to the list of parameters:
--container-runtime-endpoint unix:///run/containerd/containerd.sock
Most of this you don't need, but for completeness:
ExecStart=/usr/local/bin/k3s \
server \
'--cluster-init' \
'--token' \
'[REDACTED]' \
'--disable' \
'traefik' \
'--kube-apiserver-arg=feature-gates=MixedProtocolLBService=true' \
'--disable' \
'local-storage' \
'--disable' \
'servicelb' \
'--container-runtime-endpoint' \
'unix:///run/containerd/containerd.sock' \
'--tls-san=192.168.10.239' \
Then restart the k3s service to start it using the new containerd. I don't think it's related to the snapshotter being used, so I highly doubt you need to make those changes.
I'm on a newer K3s version than you, but I've had this issue through many versions.
$ sudo kubectl version --short
Client Version: v1.24.3+k3s1
Kustomize Version: v4.5.4
Server Version: v1.24.3+k3s1
$ argocd app get kube-prometheus-stack-crds | grep Target
Target: v0.58.0
$ argocd app get kube-prometheus-stack | grep Target
Target: 39.7.0
$ sudo kubectl get pod -n monitoring
NAME READY STATUS RESTARTS AGE
alertmanager-prometheus-alertmanager-0 2/2 Running 0 16h
grafana-66bd55698c-h7vsv 3/3 Running 0 2d1h
kube-state-metrics-77c4c7558c-q7vsx 1/1 Running 0 16h
node-exporter-b9bpg 1/1 Running 10 (14h ago) 14d
node-exporter-fkdmm 1/1 Running 3 (2d17h ago) 14d
node-exporter-nxp9k 1/1 Running 6 (14h ago) 14d
prometheus-operator-68fdccc5c6-d9hfh 1/1 Running 0 56m
prometheus-prometheus-prometheus-0 2/2 Running 0 14h
$ cat /etc/containerd/config.toml
disabled_plugins = []
imports = []
oom_score = 0
plugin_dir = ""
required_plugins = []
root = "/var/lib/containerd"
state = "/run/containerd"
version = 2
[cgroup]
path = ""
[debug]
address = ""
format = ""
gid = 0
level = ""
uid = 0
[grpc]
address = "/run/containerd/containerd.sock"
gid = 0
max_recv_message_size = 16777216
max_send_message_size = 16777216
tcp_address = ""
tcp_tls_cert = ""
tcp_tls_key = ""
uid = 0
[metrics]
address = ""
grpc_histogram = false
[plugins]
[plugins."io.containerd.gc.v1.scheduler"]
deletion_threshold = 0
mutation_threshold = 100
pause_threshold = 0.02
schedule_delay = "0s"
startup_delay = "100ms"
[plugins."io.containerd.grpc.v1.cri"]
disable_apparmor = false
disable_cgroup = false
disable_hugetlb_controller = true
disable_proc_mount = false
disable_tcp_service = true
enable_selinux = false
enable_tls_streaming = false
ignore_image_defined_volumes = false
max_concurrent_downloads = 3
max_container_log_line_size = 16384
netns_mounts_under_state_dir = false
restrict_oom_score_adj = false
sandbox_image = "k8s.gcr.io/pause:3.5"
selinux_category_range = 1024
stats_collect_period = 10
stream_idle_timeout = "4h0m0s"
stream_server_address = "127.0.0.1"
stream_server_port = "0"
systemd_cgroup = false
tolerate_missing_hugetlb_controller = true
unset_seccomp_profile = ""
[plugins."io.containerd.grpc.v1.cri".cni]
bin_dir = "/usr/lib/cni"
conf_dir = "/etc/cni/net.d"
conf_template = ""
max_conf_num = 1
[plugins."io.containerd.grpc.v1.cri".containerd]
default_runtime_name = "runc"
disable_snapshot_annotations = true
discard_unpacked_layers = false
no_pivot = false
snapshotter = "zfs"
[plugins."io.containerd.grpc.v1.cri".containerd.default_runtime]
base_runtime_spec = ""
container_annotations = []
pod_annotations = []
privileged_without_host_devices = false
runtime_engine = ""
runtime_root = ""
runtime_type = ""
[plugins."io.containerd.grpc.v1.cri".containerd.default_runtime.options]
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
base_runtime_spec = ""
container_annotations = []
pod_annotations = []
privileged_without_host_devices = false
runtime_engine = ""
runtime_root = ""
runtime_type = "io.containerd.runc.v2"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
BinaryName = ""
CriuImagePath = ""
CriuPath = ""
CriuWorkPath = ""
IoGid = 0
IoUid = 0
NoNewKeyring = false
NoPivotRoot = false
Root = ""
ShimCgroup = ""
SystemdCgroup = false
[plugins."io.containerd.grpc.v1.cri".containerd.untrusted_workload_runtime]
base_runtime_spec = ""
container_annotations = []
pod_annotations = []
privileged_without_host_devices = false
runtime_engine = ""
runtime_root = ""
runtime_type = ""
[plugins."io.containerd.grpc.v1.cri".containerd.untrusted_workload_runtime.options]
[plugins."io.containerd.grpc.v1.cri".image_decryption]
key_model = "node"
[plugins."io.containerd.grpc.v1.cri".registry]
config_path = ""
[plugins."io.containerd.grpc.v1.cri".registry.auths]
[plugins."io.containerd.grpc.v1.cri".registry.configs]
[plugins."io.containerd.grpc.v1.cri".registry.headers]
[plugins."io.containerd.grpc.v1.cri".registry.mirrors]
[plugins."io.containerd.grpc.v1.cri".x509_key_pair_streaming]
tls_cert_file = ""
tls_key_file = ""
[plugins."io.containerd.internal.v1.opt"]
path = "/opt/containerd"
[plugins."io.containerd.internal.v1.restart"]
interval = "10s"
[plugins."io.containerd.metadata.v1.bolt"]
content_sharing_policy = "shared"
[plugins."io.containerd.monitor.v1.cgroups"]
no_prometheus = false
[plugins."io.containerd.runtime.v1.linux"]
no_shim = false
runtime = "runc"
runtime_root = ""
shim = "containerd-shim"
shim_debug = false
[plugins."io.containerd.runtime.v2.task"]
platforms = ["linux/amd64"]
[plugins."io.containerd.service.v1.diff-service"]
default = ["walking"]
[plugins."io.containerd.snapshotter.v1.aufs"]
root_path = ""
[plugins."io.containerd.snapshotter.v1.btrfs"]
root_path = ""
[plugins."io.containerd.snapshotter.v1.devmapper"]
async_remove = false
base_image_size = ""
pool_name = ""
root_path = ""
[plugins."io.containerd.snapshotter.v1.native"]
root_path = ""
[plugins."io.containerd.snapshotter.v1.overlayfs"]
root_path = ""
[plugins."io.containerd.snapshotter.v1.zfs"]
root_path = ""
[proxy_plugins]
[stream_processors]
[stream_processors."io.containerd.ocicrypt.decoder.v1.tar"]
accepts = ["application/vnd.oci.image.layer.v1.tar+encrypted"]
args = ["--decryption-keys-path", "/etc/containerd/ocicrypt/keys"]
env = ["OCICRYPT_KEYPROVIDER_CONFIG=/etc/containerd/ocicrypt/ocicrypt_keyprovider.conf"]
path = "ctd-decoder"
returns = "application/vnd.oci.image.layer.v1.tar"
[stream_processors."io.containerd.ocicrypt.decoder.v1.tar.gzip"]
accepts = ["application/vnd.oci.image.layer.v1.tar+gzip+encrypted"]
args = ["--decryption-keys-path", "/etc/containerd/ocicrypt/keys"]
env = ["OCICRYPT_KEYPROVIDER_CONFIG=/etc/containerd/ocicrypt/ocicrypt_keyprovider.conf"]
path = "ctd-decoder"
returns = "application/vnd.oci.image.layer.v1.tar+gzip"
[timeouts]
"io.containerd.timeout.shim.cleanup" = "5s"
"io.containerd.timeout.shim.load" = "5s"
"io.containerd.timeout.shim.shutdown" = "3s"
"io.containerd.timeout.task.state" = "2s"
[ttrpc]
address = ""
gid = 0
uid = 0
Can you post what your default containerd config file looks like? /var/lib/rancher/k3s/agent/etc/containerd/config.toml
I don't have one to compare against.
$ sudo cat /var/lib/rancher/k3s/agent/etc/containerd/config.toml
[plugins.opt]
path = "/var/lib/rancher/k3s/agent/containerd"
[plugins.cri]
stream_server_address = "127.0.0.1"
stream_server_port = "10010"
enable_selinux = false
sandbox_image = "rancher/mirrored-pause:3.6"
[plugins.cri.containerd]
snapshotter = "overlayfs"
disable_snapshot_annotations = true
[plugins.cri.cni]
bin_dir = "/var/lib/rancher/k3s/data/05cfd5aec8ddf622207749ef3eda0e0efa12d8900105fdac78815a8cd6c73685/bin"
conf_dir = "/var/lib/rancher/k3s/agent/etc/cni/net.d"
[plugins.cri.containerd.runtimes.runc]
runtime_type = "io.containerd.runc.v2"
Could you try setting this in your kube-prometheus-stack values and see if it changes anything?
kubelet:
  enabled: true
  serviceMonitor:
    ## Enable scraping /metrics/resource from kubelet's service
    ## This is disabled by default because container metrics are already exposed by cAdvisor
    resource: true
I didn't have time to test further, but according to https://github.com/containerd/containerd/issues/4541#issuecomment-974709561, they tried to fix an issue between cAdvisor and containerd in k8s 1.23. It's a bit different, but maybe there's something to dig around here.
Also, cAdvisor has a dedicated containerd tag for their image in their repository:
https://console.cloud.google.com/gcr/images/cadvisor/global/cadvisor (v0.45.0-containerd-cri)
Maybe it's worth trying to disable cAdvisor from kube-prometheus-stack, deploy the v0.45.0-containerd-cri version as a daemonset, and create a dedicated serviceMonitor to scrape the metrics.
Let me know if you try any of these.
I tried...
resource: true
# From kubernetes 1.18, /metrics/resource/v1alpha1 renamed to /metrics/resource
resourcePath: "/metrics/resource"
container_cpu_usage_seconds_total{image!=""} still returns an empty set.
In an effort to reduce variables, I greatly reduced the size of the containerd config file. Everything still works great, but I don't see any difference in the metrics.
The main differences, besides the snapshotter, were changing stream_server_port from 0 to 10010 and changing sandbox_image to use the same one as k3s, which bumps the version from 3.5 to 3.6 as well.
$ cat /etc/containerd/config.toml
root = "/var/lib/containerd"
state = "/run/containerd"
version = 2
[plugins]
[plugins."io.containerd.grpc.v1.cri"]
stream_server_address = "127.0.0.1"
stream_server_port = "10010"
enable_selinux = false
sandbox_image = "rancher/mirrored-pause:3.6"
[plugins."io.containerd.grpc.v1.cri".cni]
bin_dir = "/usr/lib/cni"
conf_dir = "/etc/cni/net.d"
[plugins."io.containerd.grpc.v1.cri".containerd]
default_runtime_name = "runc"
snapshotter = "zfs"
Was there anything interesting in this options directory with your install?
[plugins.opt]
path = "/var/lib/rancher/k3s/agent/containerd"
I played around with your suggestion of a cAdvisor Kubernetes DaemonSet on a test node:
$ k get all -n cadvisor
NAME READY STATUS RESTARTS AGE
pod/cadvisor-pm2lw 1/1 Running 0 13m
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/cadvisor 1 1 1 1 1 <none> 13m
When I curl for a metric, it seems rather large; does this seem normal in your experience?
$ curl http://10.42.0.139:8080/metrics | grep container_cpu_usage_seconds_total
container_cpu_usage_seconds_total{container_label_alertmanager="",container_label_app="",container_label_app_kubernetes_io_component="",
container_label_app_kubernetes_io_csi_role="",container_label_app_kubernetes_io_instance="",container_label_app_kubernetes_io_managed_by="",
container_label_app_kubernetes_io_name="",container_label_app_kubernetes_io_part_of="",container_label_app_kubernetes_io_version="",
container_label_chart="",container_label_com_suse_eula="",container_label_com_suse_image_type="",container_label_com_suse_lifecycle_url="",
container_label_com_suse_release_stage="",container_label_com_suse_sle_base_created="",container_label_com_suse_sle_base_description="",
container_label_com_suse_sle_base_disturl="",container_label_com_suse_sle_base_eula="",container_label_com_suse_sle_base_image_type="",
container_label_com_suse_sle_base_lifecycle_url="",container_label_com_suse_sle_base_reference="",container_label_com_suse_sle_base_release_stage="",
container_label_com_suse_sle_base_source="",container_label_com_suse_sle_base_supportlevel="",container_label_com_suse_sle_base_title="",
container_label_com_suse_sle_base_url="",container_label_com_suse_sle_base_vendor="",container_label_com_suse_sle_base_version="",
container_label_com_suse_supportlevel="",container_label_component="",container_label_controller_revision_hash="",container_label_description="",
container_label_helm_sh_chart="",container_label_heritage="",container_label_io_cri_containerd_kind="container",
container_label_io_kubernetes_container_name="grafana",container_label_io_kubernetes_pod_name="grafana-5c9dc97c-mqxr6",
container_label_io_kubernetes_pod_namespace="monitoring",container_label_io_kubernetes_pod_uid="b32a26d7-b940-4dce-8172-9091c0b7f255",
container_label_jobLabel="",container_label_k8s_app="",container_label_longhorn_io_component="",container_label_longhorn_io_engine_image="",
container_label_longhorn_io_instance_manager_image="",container_label_longhorn_io_instance_manager_type="",container_label_longhorn_io_managed_by="",
container_label_longhorn_io_node="",container_label_maintainer="",container_label_maintainers="",container_label_name="",
container_label_operator_prometheus_io_name="",container_label_operator_prometheus_io_shard="",container_label_org_openbuildservice_disturl="",
container_label_org_opencontainers_image_created="",container_label_org_opencontainers_image_description="",container_label_org_opencontainers_image_documentation="",
container_label_org_opencontainers_image_licenses="",container_label_org_opencontainers_image_revision="",container_label_org_opencontainers_image_source="",
container_label_org_opencontainers_image_title="",container_label_org_opencontainers_image_url="",container_label_org_opencontainers_image_vendor="",
container_label_org_opencontainers_image_version="",container_label_org_opensuse_reference="",container_label_pod_template_generation="",
container_label_pod_template_hash="",container_label_prometheus="",container_label_release="",container_label_revision="",
container_label_statefulset_kubernetes_io_pod_name="",container_label_upgrade_cattle_io_controller="",cpu="total",
id="/kubepods/besteffort/podb32a26d7-b940-4dce-8172-9091c0b7f255/274f72dd18fb5c829d8a63bb0941eda804c5d44a7cbd899a21b389ff3f87b7d6",
image="docker.io/grafana/grafana:9.0.5",name="274f72dd18fb5c829d8a63bb0941eda804c5d44a7cbd899a21b389ff3f87b7d6"} 267.270606 1661360792400
At least it does include an image key now:
image="docker.io/grafana/grafana:9.0.5"
Sure does spit out a lot...
$ curl http://10.42.0.139:8080/metrics | wc -l
13734
I haven't tried to disable cAdvisor from kube-prometheus-stack, nor have I added this as a scrape yet.
Not sure if it's normal, but:
- there are a lot of empty labels in your metric...
- you have the image label but not the container one; instead, you have container_name
Still not good :disappointed:
After a little more tweaking, it looks like this:
$ curl -s http://10.42.0.143:8080/metrics | grep container_cpu_usage_seconds_total | grep grafana
container_cpu_usage_seconds_total{container_label_io_kubernetes_container_name="grafana",container_label_io_kubernetes_pod_namespace="monitoring",cpu="total",id="/kubepods/besteffort/podb32a26d7-b940-4dce-8172-9091c0b7f255/274f72dd18fb5c829d8a63bb0941eda804c5d44a7cbd899a21b389ff3f87b7d6",image="docker.io/grafana/grafana:9.0.5",name="274f72dd18fb5c829d8a63bb0941eda804c5d44a7cbd899a21b389ff3f87b7d6"} 299.354727 1661369635910
I added a PodMonitor to scrape metrics (I have not disabled the KPS cAdvisor yet). Within Prometheus it looks like:
container_cpu_usage_seconds_total{container="cadvisor", container_label_io_kubernetes_container_name="grafana", container_label_io_kubernetes_pod_namespace="monitoring", cpu="total", endpoint="http", id="/kubepods/besteffort/podb32a26d7-b940-4dce-8172-9091c0b7f255/274f72dd18fb5c829d8a63bb0941eda804c5d44a7cbd899a21b389ff3f87b7d6", image="docker.io/grafana/grafana:9.0.5", instance="10.42.0.143:8080", job="monitoring/cadvisor-prometheus-podmonitor", name="274f72dd18fb5c829d8a63bb0941eda804c5d44a7cbd899a21b389ff3f87b7d6", namespace="cadvisor", pod="cadvisor-tqbj6"} 305.831161
Slightly better maybe?
It seems to be marking everything as its container and namespace:
It seems like some items could be dropped and other stuff relabeled to get what is needed?
This is the ONLY namespace that lights up your dashboard, and it does seem like everything is lumped together :)
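For anyone trying this, a minimal PodMonitor for the above would look roughly like the following (untested sketch: the name/namespace are taken from the job label above, the app=cadvisor selector and http port name are assumed, and your Prometheus may also require whatever labels its podMonitorSelector expects):
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: cadvisor-prometheus-podmonitor
  namespace: monitoring
spec:
  namespaceSelector:
    matchNames:
      - cadvisor
  selector:
    matchLabels:
      app: cadvisor
  podMetricsEndpoints:
    - port: http
      path: /metrics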
At least, you can kinda test the dashboard now!
Just realized I missed the label name because it was a new line!
The container label equivalent is container_label_io_kubernetes_container_name in your case (not container_name).
Even if you make this work, you will still have a pretty uncommon setup and maybe uncommon problems in the future... Maybe it's not a big deal for you, but I feel this is far from an ideal solution...
Did you try posting in the #k3s channel of the Rancher Slack to:
- Get k3s to support your ZFS setup (feature request or support)
- Explain the problem to see if anyone else had a similar problem with cAdvisor?
Left a question on the #k3s Slack to see if I get a nibble.
Found this: https://github.com/k3s-io/k3s/issues/66, looks like ZFS support for k3s is a dead end for now...
Will this be something for you https://github.com/k3s-io/k3s/issues/66#issuecomment-520183720 ?
I'm retiring my old ZFS/Docker server to be ZFS/Kubernetes.
There are cleaner ways to get ZFS working with Docker. The issues tended to be a race condition where Docker didn't wait for ZFS filesystems to be in place; Docker would then create a file/directory which prevented ZFS from mounting the dataset because the mountpoint was no longer empty. There are systemd tricks you can use to force Docker to wait until ZFS has completed. It's been running smoothly for years since.
As for using an ext4-formatted zvol for that one directory so the bundled containerd can be used... a potential solution, but the standard containerd works great minus this odd Prometheus label issue. I think it's just a configuration tweak to fix if we can figure out which project needs to make it.
My understanding is that Azure uses containerd by default as well, and the LENS project has similar issues with metrics.
Hi @reefland,
Did you manage to get everything working together? It would be cool to share it in case someone with a similar setup ends up reading this thread :blush:
As it's not directly related to this project, maybe we can close this issue, what do you think?
I was not able to make it work. I do think the right course of action is to use an EXT4-formatted ZFS ZVOL for containerd and go back to the standard snapshotter to reduce the complexity of the installation. I'm not sure if I can do that level of swapping the engine out while the plane is in flight... and I doubt I will rebuild the cluster from scratch just for these metrics.
I'll close the issue; I don't expect this to be resolvable for a while. Thanks for your time on this. Much appreciated.
@dotdc - Just wanted to follow up that I was able to get cAdvisor deployed as a DaemonSet working. What I was missing with the last attempt was relabelings on the job doing the DaemonSet scrape. Plus, it seems an extra space in the whitelisting of labels in the kustomize example they provide caused an issue too.
Within my Kube-Prometheus-Stack Helm chart / values.yaml section, I needed to add:
additionalScrapeConfigs:
  # CADVISOR SCRAPE JOB for externally installed cadvisor because of k8s with containerd problems
  - job_name: "kubernetes-cadvisor"
    kubernetes_sd_configs:
      - role: pod # we get needed info from the pods
        namespaces:
          names:
            - cadvisor
        selectors:
          - role: pod
            label: "app=cadvisor" # and only select the cadvisor pods with this label set as source
    metric_relabel_configs: # we relabel some labels inside the scraped metrics
      # this should look at the scraped metric and replace/add the label inside
      - source_labels: [container_label_io_kubernetes_pod_namespace]
        target_label: "namespace"
      - source_labels: [container_label_io_kubernetes_pod_name]
        target_label: "pod"
      - source_labels: [container_label_io_kubernetes_container_name]
        target_label: "container"
Now I have image=, container=, and pod= values:
And your dashboard lights up as expected:
This is only on my test cluster so far, but still a great sign. I still need to do some more testing and apply a bunch of label drops to match how the kubelet cAdvisor is handled.
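Something like this in the same metric_relabel_configs should take care of the label noise once the useful values have been copied into namespace/pod/container (a sketch, not the final list):
      # Drop the noisy cadvisor container_label_* labels entirely
      - action: labeldrop
        regex: "container_label_.*"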
Hi @reefland,
Really happy to learn that you finally managed to fix this and are able to use the new version of the dashboard! :partying_face: Thanks for sharing your settings, it may help someone using a similar setup!
Wish you a happy holiday season!
Thank you!
Another update... I yanked the external cAdvisor DaemonSet. Prometheus started to constantly complain about duplicate, out-of-order, and dropped labels. Hundreds per second. It just felt like I was digging a deeper and deeper hole.
So instead, node by node, I yanked the external containerd / runc / ZFS snapshotter. I moved /var/lib/rancher to the side, added a 30GB ZVOL formatted with XFS, and mounted it at /var/lib/rancher. I removed the tweaks added to the k3s and k3s-agent services for an external containerd, and copied back in the directory structure.
I was able to swap all this out without a reinstall and no downtime. I'm luv'n Kubernetes :)
So I still have all the pluses of ZFS on bare metal (mirrors, compression, encryption, snapshots, boot environments, rollbacks) and the one important directory for K3s is now on a ZFS backed (mirrored / compression / encrypted) XFS filesystem (which allows easy volume expansion in the future if I need it). And K3S can use its default overlayfs and all the metrics just magically work.
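For anyone wanting to reproduce the ZVOL part, it's roughly this (pool and dataset names are just examples):
# create a 30G zvol, format it as XFS, and mount it where k3s keeps its state
sudo zfs create -V 30G rpool/k3s
sudo mkfs.xfs /dev/zvol/rpool/k3s
sudo mount /dev/zvol/rpool/k3s /var/lib/rancher
# plus an /etc/fstab (or systemd mount unit) entry so it survives reboots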
Even dashboards from Mixin, which have never worked for me, started to light up within seconds of the first node being converted:
I'm monitoring my system logs, everything looks great, Prometheus is happy. All my dashboards are working as expected with a standardish K3S "default" install.