k8s.pod.network.io gives data only from eth0
Component(s)
receiver/kubeletstats
What happened?
Description
I have been trying to get network I/O details per namespace. Comparing the k8s.pod.network.io metrics with cAdvisor's container_network_receive_bytes_total, I found that the kubeletstatsreceiver does not return data for interfaces other than eth0. This gives an incomplete picture of the network bandwidth being utilized.
Steps to Reproduce
sum(rate(k8s_pod_network_io{k8s_namespace_name="$k8s_namespace_name", direction="receive"}[5m])) by (interface)
vs
sum(rate(container_network_receive_bytes_total{namespace="$k8s_namespace_name"}[5m])) by (interface)
Expected Result
I should see data from all interfaces in the k8s_pod_network_io stream.
Actual Result
Got results only for eth0.
Collector version
v0.90.1
Environment information
Environment
Amazon EKS
OpenTelemetry Collector configuration
receivers:
  kubeletstats:
    collection_interval: 15s
    auth_type: "serviceAccount"
    endpoint: "https://${env:K8S_NODE_NAME}:10250"
    insecure_skip_verify: true
    extra_metadata_labels:
      - container.id
      - k8s.volume.type
    metric_groups:
      - node
      - pod
      - container
      - volume
    metrics:
      k8s.pod.cpu_limit_utilization:
        enabled: true
      k8s.pod.cpu_request_utilization:
        enabled: true
      k8s.pod.memory_limit_utilization:
        enabled: true
      k8s.pod.memory_request_utilization:
        enabled: true
Log output
No response
Additional context
No response
Pinging code owners:
- receiver/kubeletstats: @dmitryax @TylerHelmuth
See Adding Labels via Comments if you do not have permissions to add labels yourself.
@prabhatsharma I believe you're right, thank you for bringing this to our attention.
In https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/receiver/kubeletstatsreceiver/internal/kubelet/network.go we use the bytes provided at the root of https://pkg.go.dev/k8s.io/kubelet/pkg/apis/stats/v1alpha1#NetworkStats. If we wanted to record all the interface stats, I believe we'd need to loop through the Interfaces slice.
If we did this, I believe it would add an extra dimension to the datapoints we produce for the interface name. I believe it would also be a breaking change.
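Roughly, that loop would look something like the sketch below (untested; recordDataPoint is a stand-in for whatever the receiver's generated metrics builder actually exposes, not a real function):

// Hypothetical sketch, assuming it lives alongside network.go in the
// receiver's internal/kubelet package.
package kubelet

import (
	stats "k8s.io/kubelet/pkg/apis/stats/v1alpha1"
)

// recordPerInterfaceNetworkStats is a hypothetical helper: recordDataPoint
// stands in for the receiver's real recording callback, with the interface
// name passed through as an extra attribute.
func recordPerInterfaceNetworkStats(s *stats.NetworkStats, recordDataPoint func(iface string, rxBytes, txBytes uint64)) {
	if s == nil {
		return
	}
	// Iterate over every interface reported by the kubelet, not just the
	// inlined default-interface stats at the root of NetworkStats.
	for _, iface := range s.Interfaces {
		var rx, tx uint64
		if iface.RxBytes != nil {
			rx = *iface.RxBytes
		}
		if iface.TxBytes != nil {
			tx = *iface.TxBytes
		}
		recordDataPoint(iface.Name, rx, tx)
	}
}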
@dmitryax @povilasv @jinja2 what are your thoughts?
This issue makes sense; it looks like we are collecting only the default interface's network stats:
// Stats for the default interface, if found
InterfaceStats `json:",inline"`  // https://pkg.go.dev/k8s.io/kubelet/pkg/apis/stats/v1alpha1#InterfaceStats
It would make sense to add the interface name as an extra dimension.
Regarding the breaking change, could we put it behind a feature flag?
Definitely a feature flag.
Also, I think we're in luck: the existing metric already defines interface as an attribute, so I think we could do this without breaking the existing metric. The breaking change would only come from new default metrics.
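For reference, registering a gate with the collector's featuregate package would look roughly like this; the gate ID, stage, and description below are placeholders I made up, not decided names:

package kubelet

import "go.opentelemetry.io/collector/featuregate"

// emitAllInterfacesGate is a hypothetical gate; the real ID and stage would
// be settled in the PR.
var emitAllInterfacesGate = featuregate.GlobalRegistry().MustRegister(
	"receiver.kubeletstats.emitInterfacesNetworkMetrics",
	featuregate.StageAlpha,
	featuregate.WithRegisterDescription("When enabled, network metrics are emitted for every interface reported by the kubelet, with the interface name as an attribute."),
)

The recording code could then branch on emitAllInterfacesGate.IsEnabled() to keep the current default behavior until the gate graduates.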
I believe this issue is also impacting the k8s.node.network.* metrics. We are only reporting the default interface for node NetworkStats. The default interface seems to be hardcoded to eth0 in kubelet, so setups not using eth0 as the interface name might not be seeing any network metrics from the receiver at all. I can look into this more if nobody has started work on it.
I am also wondering whether we need additional logic for pods that run with hostNetwork, since those would have all of the host's network interfaces show up, which could blow up cardinality, and the values might not even make sense since they reflect the entire host's traffic.
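To illustrate the kind of guard I have in mind (a hypothetical check, assuming the pod spec from the kubelet's /pods endpoint is available where the stats are recorded):

package kubelet

import corev1 "k8s.io/api/core/v1"

// shouldEmitPerInterfaceStats is a hypothetical filter: skip per-interface
// datapoints for host-network pods, since they would surface every host
// interface and inflate cardinality with node-level traffic.
func shouldEmitPerInterfaceStats(pod *corev1.Pod) bool {
	return pod != nil && !pod.Spec.HostNetwork
}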
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.
Pinging code owners:
- receiver/kubeletstats: @dmitryax @TylerHelmuth
See Adding Labels via Comments if you do not have permissions to add labels yourself.
https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/33993 reports the issue for the k8s.node.network.* metrics. Is there any specific reason https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/30626 didn't make it in? Was it just left to go stale, or was there another blocking reason?
I don't believe there was any blocking reason. We still want this, but it has been hard to prioritize.
Revived that at https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/34287. PTAL
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.
Pinging code owners:
- receiver/kubeletstats: @dmitryax @TylerHelmuth @ChrsMark
See Adding Labels via Comments if you do not have permissions to add labels yourself.
What's the status here? I see the original ticket (https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/33993) was closed.