
cAdvisor metrics are missing metadata information

Open etungsten opened this issue 4 years ago • 5 comments

Image I'm using: bottlerocket-aws-k8s-1.15-x86_64-0.3.1-8a0c0b3 with a v1.15 EKS control plane

What I expected to happen: cAdvisor metrics to have associated metadata available (e.g. container_name, pod, namespace). For example, container_fs_usage_bytes for an AL2-based EKS node:

container_fs_usage_bytes{container="kube-proxy",container_name="kube-proxy",device="/dev/nvme0n1p1",id="/kubepods/burstable/pod69bed0f7-1b10-4da6-bf06-2193e6e6f2aa/955dc369c1eec8f5f00e74198f76d0b70983ed3649b9055aac5dd4e9ed9c2c66",image="602401143452.dkr.ecr.us-west-2.amazonaws.com/eks/kube-proxy@sha256:d3a6122f63202665aa50f3c08644ef504dbe56c76a1e0ab05f8e296328f3a6b4",name="k8s_kube-proxy_kube-proxy-p978c_kube-system_69bed0f7-1b10-4da6-bf06-2193e6e6f2aa_0",namespace="kube-system",pod="kube-proxy-p978c",pod_name="kube-proxy-p978c"} 12288 1584990960301

What actually happened: cAdvisor metrics are all missing metadata information on a Bottlerocket node.

container_fs_usage_bytes{container="",container_name="",device="/dev/nvme0n1p10",id="/",image="",name="",namespace="",pod="",pod_name=""} 438272 1584990076136
...
container_cpu_usage_seconds_total{container="",container_name="",cpu="total",id="/",image="",name="",namespace="",pod="",pod_name=""} 4199.250219446 1584990076136

How to reproduce the problem:

  1. Launch Bottlerocket nodes in your cluster.
  2. Start an HTTP proxy to the Kubernetes API server and access the metrics at http://localhost:8001/api/v1/nodes/$NODE_NAME/proxy/metrics/cadvisor, substituting $NODE_NAME with the name of a Bottlerocket node (see the command sketch below).
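
For reference, a minimal command-line sketch of these steps, assuming kubectl is configured against the affected cluster (the node selection and the final grep are just illustrative):

# Start a local proxy to the Kubernetes API server (serves on localhost:8001 by default).
kubectl proxy --port=8001 &

# Pick a node to inspect; substitute the name of a Bottlerocket node here.
NODE_NAME=$(kubectl get nodes -o name | head -n 1 | cut -d/ -f2)

# Fetch the cAdvisor metrics through the API server proxy and look at one affected metric.
curl -s "http://localhost:8001/api/v1/nodes/${NODE_NAME}/proxy/metrics/cadvisor" | grep '^container_fs_usage_bytes'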

etungsten avatar Mar 23 '20 21:03 etungsten

With #868, the majority of the metrics have regained metadata information. However, a lot of filesystem- and IO-related metrics are still missing metadata (e.g. container_fs_limit_bytes, container_fs_io_time_seconds_total).

etungsten avatar Mar 24 '20 16:03 etungsten

Just to expand on the above comment, our team noticed that all of these metrics:

"container_fs_inodes_free"
"container_fs_inodes_total"
"container_fs_io_current"
"container_fs_io_time_seconds_total"
"container_fs_io_time_weighted_seconds_total"
"container_fs_limit_bytes"
"container_fs_read_seconds_total"
"container_fs_reads_merged_total"
"container_fs_sector_reads_total"
"container_fs_sector_writes_total"
"container_fs_usage_bytes"
"container_fs_write_seconds_total"
"container_fs_writes_merged_total"

are lacking metadata, and some of them are used for autoscaling and other behavior in some of our workloads.
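
To spot-check which series are affected on a given node, here is a rough sketch, assuming the kubectl proxy from the reproduction steps above is still running and NODE_NAME is set to a Bottlerocket node:

# List the container_fs_* metric names whose series carry an empty pod label.
curl -s "http://localhost:8001/api/v1/nodes/${NODE_NAME}/proxy/metrics/cadvisor" | grep -E '^container_fs_' | grep 'pod=""' | cut -d'{' -f1 | sort -u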

baasumo avatar Apr 14 '22 19:04 baasumo

@baasumo Thanks for following up on this!

zmrow avatar Apr 14 '22 22:04 zmrow

Support for these metrics is still not in cAdvisor upstream, per https://github.com/google/cadvisor/issues/2785#issuecomment-1020088355.

bcressey avatar Apr 29 '22 22:04 bcressey

Is there a way we can get labels and annotations from the pod running these containers?

ghost avatar May 23 '22 12:05 ghost

It appears we were able to improve this slightly with the referenced issue above, but from what I can tell the full fix for this is actually something outside of Bottlerocket. I'm going to close this since we don't have any concrete work identified in Bottlerocket, but feel free to reopen if there is anything else that can be done from this end.

stmcginnis avatar Dec 19 '22 17:12 stmcginnis