cAdvisor metrics are missing metadata information
Image I'm using: bottlerocket-aws-k8s-1.15-x86_64-0.3.1-8a0c0b3 with a v1.15 EKS control plane
What I expected to happen:
cAdvisor metrics to have associated metadata available (e.g., container_name, pod, namespace)
For example, container_fs_usage_bytes for an AL2-based EKS node:
container_fs_usage_bytes{container="kube-proxy",container_name="kube-proxy",device="/dev/nvme0n1p1",id="/kubepods/burstable/pod69bed0f7-1b10-4da6-bf06-2193e6e6f2aa/955dc369c1eec8f5f00e74198f76d0b70983ed3649b9055aac5dd4e9ed9c2c66",image="602401143452.dkr.ecr.us-west-2.amazonaws.com/eks/kube-proxy@sha256:d3a6122f63202665aa50f3c08644ef504dbe56c76a1e0ab05f8e296328f3a6b4",name="k8s_kube-proxy_kube-proxy-p978c_kube-system_69bed0f7-1b10-4da6-bf06-2193e6e6f2aa_0",namespace="kube-system",pod="kube-proxy-p978c",pod_name="kube-proxy-p978c"} 12288 1584990960301
What actually happened: cAdvisor metrics are all missing metadata information on a Bottlerocket node.
container_fs_usage_bytes{container="",container_name="",device="/dev/nvme0n1p10",id="/",image="",name="",namespace="",pod="",pod_name=""} 438272 1584990076136
...
container_cpu_usage_seconds_total{container="",container_name="",cpu="total",id="/",image="",name="",namespace="",pod="",pod_name=""} 4199.250219446 1584990076136
How to reproduce the problem:
- Launch Bottlerocket nodes in your cluster.
- Start an HTTP proxy to the Kubernetes API server and access the metrics under http://localhost:8001/api/v1/nodes/$NODE_NAME/proxy/metrics/cadvisor, substituting $NODE_NAME with the name of a Bottlerocket node (see the example below).
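For example (a minimal sketch, assuming kubectl is already configured against the cluster; the node name below is a placeholder):

# List nodes so you can pick out a Bottlerocket node name.
kubectl get nodes -o wide

# Start a local proxy to the Kubernetes API server (listens on localhost:8001 by default).
kubectl proxy &

# Fetch the cAdvisor metrics for that node; the value of NODE_NAME is a placeholder, not a real node.
NODE_NAME="<bottlerocket-node-name>"
curl -s "http://localhost:8001/api/v1/nodes/${NODE_NAME}/proxy/metrics/cadvisor" | head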
With #868, the majority of the metrics have regained metadata information. However, a lot of filesystem- and I/O-related metrics are still missing metadata (e.g., container_fs_limit_bytes, container_fs_io_time_seconds_total).
Just expanding on the above comment, our team noticed all of these metrics:
"container_fs_inodes_free"
"container_fs_inodes_total"
"container_fs_io_current"
"container_fs_io_time_seconds_total"
"container_fs_io_time_weighted_seconds_total"
"container_fs_limit_bytes"
"container_fs_read_seconds_total"
"container_fs_reads_merged_total"
"container_fs_sector_reads_total"
"container_fs_sector_writes_total"
"container_fs_usage_bytes"
"container_fs_write_seconds_total"
"container_fs_writes_merged_total"
are lacking metadata, and some of them are used for autoscaling and other behavior in some of our workloads.
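As a rough check (reusing the kubectl proxy and NODE_NAME placeholder from the reproduction steps above), the affected series can be spotted by looking for an empty pod label:

# Print container_fs_* series that are emitted without pod metadata.
curl -s "http://localhost:8001/api/v1/nodes/${NODE_NAME}/proxy/metrics/cadvisor" | grep '^container_fs_' | grep 'pod=""'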
@baasumo Thanks for following up on this!
Support for these metrics is still not in cAdvisor upstream, per https://github.com/google/cadvisor/issues/2785#issuecomment-1020088355.
Is there a way we can get labels and annotations from the pod running these containers?
It appears we were able to improve this slightly with the referenced issue above, but from what I can tell the full fix for this is actually something outside of Bottlerocket. I'm going to close this since we don't have anything concrete identified in Bottlerocket, but feel free to reopen if there is anything else that can be done from this end.