opentelemetry-collector-contrib [receiver/kubeletstats] Add `k8s.container.cpu.node_limit

trafficstars

Description: <Describe what has changed.>

At the moment. We calculate the k8s.container.cpu_limit_utilization as a ratio of the container's limits at https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/867d6700c31446172e6998e602c55fbf7351831f/receiver/kubeletstatsreceiver/internal/kubelet/cpu.go#L30.

Similarly we can calculate the cpu utilization as ratio of the whole node's allocatable cpu, if we divide by the total number of node's cores.

We can retrieve this information from the Node's Status.Capacity, for example:

$ k get nodes kind-control-plane -ojsonpath='{.status.capacity}'
{"cpu":"8","ephemeral-storage":"485961008Ki","hugepages-1Gi":"0","hugepages-2Mi":"0","memory":"32564732Ki","pods":"110"}

Performance concerns

In order to get the Node's capacity we need an API call to the k8s API in order to get the Node object. Something to consider here is the performance impact that this extra API call would bring. We can always choose to have this metric as disabled by default and clearly specify in the docs that this metric comes with an extra API call to get the Node of the Pods.

The good thing is that kubeletstats receiver target's only one node so I believe it's a safe assumption to only fetch the current node because all the observed Pods will belong to the one single local node. Correct me if I miss anything here.

In addition, instead of performing the API call explicitly on every single scrape we can use an informer instead and leverage its cache. I can change this patch to this direction if we agree on this.

Would love to hear other's opinions on this.

Todos

✅ 1) Apply this change behind a feature gate as it was indicated at https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/27885#issuecomment-2037784116 ✅ 2) Use an Informer instead of direct API calls.

Link to tracking Issue: <Issue number if applicable> ref: https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/27885

Testing: <Describe what testing was performed and which tests were added.>

I experimented with this approach and the results look correct. In order to verify this I deployed a stress Pod on my machine to consume a target cpu of 4 cores:

apiVersion: v1
kind: Pod
metadata:
  name: cpu-stress
spec:
  containers:
  - name: cpu-stress
    image: polinux/stress
    command: ["stress"]
    args: ["-c", "4"]

And then the collected container.cpu.utilization for that Pod's container was at 0,5 as exepcted, based that my machine-node comes with 8 cores in total:

cpu-stress

Unit test is also included.

Documentation: <Describe the documentation added.>

Added: https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/32295/files#diff-8ad3b506fb1132c961e8da99b677abd31f0108e3f9ed6999dd96ad3297b51e08

Apr 10 '24 11:04 ChrsMark

@TylerHelmuth @jinja2 this one should be ready for review now. Feel free to give it a shot when you find the time.

Apr 17 '24 13:04 ChrsMark

Thank's @TylerHelmuth , I'm a huge fun of standalone+smaller PRs however this one does not really apply any actual refactoring other than the move of the NewNodeSharedInformer to a common place 🤔 . Then it only extends the current kubeletstats receiver's internal functions in order to make the node's limit information available. But still, I would like to hear if you have any preference/suggestion on how to break it and I will split accordingly.

Apr 18 '24 07:04 ChrsMark

@ChrsMark ya I'd prefer that the NewNodeSharedInformer stuff be moved to its own PR. Doing that will keep us aligned with our contributing guidelines.

Apr 18 '24 15:04 TylerHelmuth

I have updated the PR after the standalone refactoring. Note that for now the feature gate only applies on container.cpu.utilization but I suggest we make it more generic to apply on all cpu.utilization metrics as part of https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/27885. This will allow us to change pod.cpu.utilization and node.cpu.utilization accordingly, using the same feature flag, in follow-up PRs. Let me know what you think @TylerHelmuth.

Apr 29 '24 09:04 ChrsMark

After the discussions at https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/27885 I have updated the PR to introduce a new metric instead of keeping the old one. In this, the container.cpu.utilization will be deprecated as it is decided already and this PR adds the k8s.container.cpu.node_limit_utilization to provide the cpu percentage against the total Node's capacity.

@TylerHelmuth @dmitryax could you take another look please?

Note: for some reason the top level test files of the receiver's are ignored and hence I haven't updated the golden file yet. Actually the tests should fail since the referenced golden file does not exist at all right now. Update: This is because of https://github.com/open-telemetry/opentelemetry-collector/pull/10126.

May 09 '24 13:05 ChrsMark

Thank's @TylerHelmuth . I have added the setting validation in the config.go. Please take another look.

May 14 '24 13:05 ChrsMark

@TylerHelmuth @dmitryax thank's folks. Your comments should be addressed now.

May 15 '24 09:05 ChrsMark

@dmitryax @povilasv please give this another look

May 22 '24 15:05 TylerHelmuth

It looks like we'd be introducing more combinations of utilization metric for different resource types cpu/memory/ephemeral_storage for both k8s.container and k8s.pod. Are pre-computed utilization metric for different limits like we have in this receiver for container, pod, and now node useful to users?

I understand one of the arguments for doing the utilization calculation in this receiver has been due to users wanting all resource usage metrics from the same receiver, and currently users need to run both the kubeletstats (reports usage for container) and the k8scluster (reports node's capacity/allocatable) receiver to be able to calculate the utilization. Would we instead consider reporting the node's capacity metric from the kubeletstats receiver instead of doing utilization calculations in the receiver? With the node's capacity/allocatable and pod's/container's resource usage (have to make sure the node id is available for filtering) being reported from kubeletstats receiver, the user has the option to calculate the utilization in their backend observability platform. Imo, this gives user more flexibility in deciding what they consider as limit, or whether they want to track it per container or per pod

May 29 '24 16:05 jinja2

Are pre-computed utilization metric for different limits like we have in this receiver for container, pod, and now node useful to users?

From my point of view, yes :)

Would we instead consider reporting the node's capacity metric from the kubeletstats receiver instead of doing utilization calculations in the receiver?

Wouldn't that mean that we will have the same metric potentially emitted from 2 different receivers at the same time?

In general, that's mostly covered already back in https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/24905#issuecomment-1697734148 when the k8s.container.cpu_limit_utilization and the rest were proposed and added.

From my perspective agreeing on unambiguous metrics (even as optional) that we can provide ootb is the goal here. If we pass the responsibility to users/backends to do such calculations/namings then we kind of start losing the vendor/backend agnostic idea.

Note: the k8s.pod.cpu_limit/request_utilization metrics are not provided at all if there are containers with no limit/request set. That means that the utilization metric as ratio against the node's "limit" is the one that will be always there.

May 29 '24 17:05 ChrsMark

Based on the discussions at https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/32295#discussion_r1618343734 I have pushed a change to rename the metric to k8s.container.cpu.node.utilization. That's better aligned with the SemConv guidelines, and also preserves the k8s.container.cpu.node.* as a namespace in case we want to add an additional metric in the future to cover the allocatable related ratio.

@andrzej-stencel @povilasv @TylerHelmuth @dmitryax @jinja2 please take another look.

May 30 '24 07:05 ChrsMark

Wouldn't that mean that we will have the same metric potentially emitted from 2 different receivers at the same time?

I think it is acceptable to have some metrics be reported by both k8scluster and kubeletstats receiver. k8scluster receiver is run as a deployment so usually only one instance of the receiver is doing the watching/reporting for all nodes/pods in cluster. We have seen missed updates with massive clusters or clusters with higher churn rate, etc. kubeletstats receiver reporting the limit/request and/or node capacity/allocatable for the local node and pods would be a beneficial addition for users who do not want to run both receivers if they are only looking for metrics related to resource usage, and also a more scalable alternative for k8scluster receiver.

From my perspective agreeing on unambiguous metrics (even as optional) that we can provide ootb is the goal here. If we pass the responsibility to users/backends to do such calculations/namings then we kind of start losing the vendor/backend agnostic idea.

Tbh, I don't think this pre-computed metric adds much value for k8s users. If the goal is to make sure that users have the option of comparing usage against node's capacity then reporting just the capacity instead of the pre-computed utilization is a more flexible option, because it allows k8s admins and users to calculate "utilization" as they see fit for their k8s setup. I can think of a few scenarios where a k8s admin would calculate usage against node's cap/allo but these are not in-line with the metric being added here. For e.g., when looking at a node's utilization to determine whether an admin needs to add more capacity, I am more likely to sum the resource requests of all pods on the node and compare that against the node's allocatable instead of looking at the actual usage of the containers, which kind of does not matter because even if the container is using only 1 Gi out of a 10Gi request, the node has still reserved that 10Gi and is not usable for any other container on the node. As an application developer deploying to a k8s cluster, I am more likely to see the utilization wrt my pod's request/limit instead of the node's capacity. And if I have not set a limit on my container, the limit of resource such a container is technically allowed to use is the nodes allocatable - sum(request of all other pods on this node). These are just some examples for how a k8s user/admin might think of a limit as imposed by the node's availability of resource. I agree that there is probably a good usecase to measure usage of a container's cpu wrt to node's capacity, but there are plenty of such combinations when we try to compute utilization wrt k8s node's available resource which might be equally or more useful imo, and a good way to accommodate these different scenarios is to keep the metrics more flexible instead of pre-computing.

May 30 '24 19:05 jinja2

I agree that there is probably a good usecase to measure usage of a container's cpu wrt to node's capacity, but there are plenty of such combinations when we try to compute utilization wrt k8s node's available resource which might be equally or more useful imo, and a good way to accommodate these different scenarios is to keep the metrics more flexible instead of pre-computing.

Thank's for the additional information!

I guess users are always able to apply such calculations in their back-ends if they want to, even with the metrics that are provided today. In this, I don't see all these ideas really being blockers for this optional metric.

For reference, this one has been used in Kibana for years now so being able to retrieve this ootb through the Collector would only be of benefit. Having the utilization calculated against a hard limit is a valuable, non relative, indicator to observe over time, build alerts/reports on top of etc. It can also come handy in order to quickly drill down to specific workloads while investigating infra resource utilization issues.

Something to not underestimate is that query aggregations like those mentioned can become "expensive" sometimes, so I see value in having the option to do such pre-calculations on the edge. I would be more than happy to see any additional one, that we see as appropriate, in the future.

Ultimately, providing this kind of metrics as optional is flexible enough since users that want to calculate them on their back-ends they can do so, while those that want them pre-calculated on the edge will still have the option for that.

May 31 '24 07:05 ChrsMark

I agree with @ChrsMark that this metric is valuable.

The discussion around naming and exact meaning (computing vs. node's capacity or allocatable cpu) would have been much easier if the semantic conventions were already specified for k8s. However, I don't think we should block adding new useful metrics to the collector until there are semantic conventions for them first. This is also not what we have been doing historically.

Ideally, I would have each metric created by this receiver (and other receivers) document its semantic conventions status, or whether a semantic convention even exists for the metric. That would make it easier for users to reason about a specific metric's stability. Given that the receiver as a whole is in beta stability, we would like users to be aware of the fact that not all metrics in the component might be equally stable. If we'd want this to be done, that would be a separate PR that should not block this one or others.

May 31 '24 10:05 andrzej-stencel

opentelemetry-collector-contrib
opentelemetry-collector-contrib copied to clipboard

[receiver/kubeletstats] Add `k8s.container.cpu.node_limit_utilization` metric

Performance concerns

Todos

opentelemetry-collector-contrib opentelemetry-collector-contrib copied to clipboard

[receiver/kubeletstats] Add `k8s.container.cpu.node_limit_utilization` metric

Performance concerns

Todos

opentelemetry-collector-contrib
opentelemetry-collector-contrib copied to clipboard