Missing runtime metrics from cAdvisor

Open stevehipwell opened this issue 1 year ago • 2 comments

Platform I'm building on:

EKS

What I expected to happen:

I'd expect to see the cAdvisor runtime values as part of the kubelet metrics.

What actually happened:

The kubelet cAdvisor metrics are missing the runtime values, which means we can't see the node usage.

How to reproduce the problem:

Run kubectl get --raw "/api/v1/nodes/${NODE_NAME}/proxy/stats/summary" | jq -r '.node.systemContainers[] | .name' and you'll see the following response.

kubelet
pods

This is the same as https://github.com/awslabs/amazon-eks-ami/issues/1667; the solution I added there would also seem to be relevant in the context of Bottlerocket.

stevehipwell avatar Feb 14 '24 18:02 stevehipwell
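
The check from the reproduction step can also be scripted. The following is a minimal sketch, not part of the original report: it takes the JSON returned by the /stats/summary endpoint and lists which system containers are absent. The function name and the EXPECTED tuple are illustrative assumptions; "runtime" is the entry this issue reports as missing.

```python
import json

# Assumption: the system container names we want to see in the summary.
# "runtime" is the one reported missing in this issue.
EXPECTED = ("kubelet", "runtime", "pods")


def missing_system_containers(summary_json, expected=EXPECTED):
    """Return the expected system container names absent from a stats summary."""
    summary = json.loads(summary_json)
    # Tolerate a summary with no "node" or "systemContainers" key.
    containers = summary.get("node", {}).get("systemContainers") or []
    present = {c.get("name") for c in containers}
    return [name for name in expected if name not in present]
```

To use it, save the output of the kubectl get --raw command above to a file and pass the file contents to missing_system_containers; an empty list would mean all expected entries are present.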

Hi @stevehipwell, thanks for letting us know! Let me give your solution a try and confirm. Just to keep a record, at some point we moved both containerd.service and kubelet.service to be under the runtime.slice cgroup. It looks like cAdvisor may need to know which cgroups it should track in order to provide the metrics:

https://github.com/kubernetes/kubernetes/blob/ad6477e342c8ce0f9b1997d5345322c930f6911d/cmd/kubelet/app/server.go#L722

arnaldo2792 avatar Feb 15 '24 23:02 arnaldo2792
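
To confirm on a node which cgroup a process such as containerd actually sits in, the contents of /proc/<pid>/cgroup can be parsed. This is a rough sketch under assumptions: the helper names are hypothetical, and the path /runtime.slice/containerd.service in the usage note is inferred from the comment above rather than verified on a Bottlerocket host.

```python
def cgroup_paths(proc_cgroup_text):
    """Map controller list -> cgroup path from /proc/<pid>/cgroup contents."""
    paths = {}
    for line in proc_cgroup_text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each line has the form "hierarchy-ID:controller-list:cgroup-path".
        # On cgroup v2 the controller list is empty, e.g. "0::/some.slice/x.service".
        _hierarchy, controllers, path = line.split(":", 2)
        paths[controllers] = path
    return paths


def in_slice(proc_cgroup_text, slice_name="runtime.slice"):
    """Return True if any cgroup path has slice_name as a path component."""
    return any(
        slice_name in path.split("/")
        for path in cgroup_paths(proc_cgroup_text).values()
    )
```

For example, given the text "0::/runtime.slice/containerd.service" (an assumed value), in_slice returns True, which would suggest the runtime lives outside the default location the kubelet reports on.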

@arnaldo2792 thanks. I've not had a chance to check out the actual code in detail but I'm surprised this doesn't just work. The kubelet should have all of the information required to automate this.

stevehipwell avatar Feb 16 '24 08:02 stevehipwell

Hi @stevehipwell, thanks for the issue. I had a chance this afternoon to repro as well as attempt your suggested fix, and it worked :D I'm going to test some more, but I'll get a PR up when ready.

ginglis13 avatar Mar 01 '24 00:03 ginglis13

@arnaldo2792 and @ginglis13 is this related somehow to #2743?

webern avatar Mar 18 '24 19:03 webern

Yes it is

arnaldo2792 avatar Mar 18 '24 20:03 arnaldo2792