Networking dashboard accounts host traffic as pods running with "hostNetwork: true" traffic

Open surajssd opened this issue 3 years ago • 5 comments

A pod that runs with hostNetwork: true reports wrong traffic (network usage) metrics: it reports all the traffic that is passing through the node. One could argue that this is the right behaviour, but as a per-pod metric the data is wrong.

So we should either figure out a way to differentiate node traffic from the traffic of individual pods running on the host network, OR stop reporting traffic of such pods on dashboards. The dashboard that shows aggregate traffic of all pods in a namespace, Kubernetes / Networking / Namespaces (Pods), shows a wrong aggregate because of this issue.

surajssd avatar Oct 07 '20 08:10 surajssd

Hm, that doesn't seem like a Lokomotive issue, but rather a monitoring mixin issue.

Does it mean every pod running on the host network reports the same network usage (as host network usage)?

invidian avatar Oct 07 '20 09:10 invidian

Hm, that doesn't seem like a Lokomotive issue, but rather a monitoring mixin issue.

It is a kubelet issue, since it is the one generating these metrics. And then it is a mixin issue to filter that out for a hostnetwork pod. Also we need a place to track this in lokomotive since this config is shipped as part of lokomotive.

Does it mean every pod running on the host network reports the same network usage (as host network usage)?

Yep.

surajssd avatar Oct 07 '20 11:10 surajssd

I think we can at least identify, at the metric level, which metrics are coming from a hostnetwork pod. Hostnetwork pods have all of the host's interfaces in their metrics, including interfaces like bond0:

container_network_receive_bytes_total{container="POD",endpoint="https-metrics",id="/kubepods/besteffort/poded104e4d-6005-4821-9c21-8515da74ea12/4190f6acdef6c1a4276079bc6e0bf181473ad574ac2f3015e0d74a27e94227d3",
image="k8s.gcr.io/pause:3.2",instance="10.88.72.131:10250",interface="bond0",job="kubelet",
metrics_path="/metrics/cadvisor",name="k8s_POD_csi-rbdplugin-fdfvb_rook_ed104e4d-6005-4821-9c21-8515da74ea12_0",
namespace="rook",node="suraj-lk-cluster-general-worker-0",pod="csi-rbdplugin-fdfvb",
service="prometheus-operator-kubelet"}

But pods that don't use hostnetwork have only two interfaces, eth0 and tunl0:

container_network_receive_bytes_total{container="POD",endpoint="https-metrics",id="/kubepods/besteffort/pod407469ee-c455-4cc8-9094-402dfc736732/58246f6880847be208b4d718cc48666b108d32611f418f7a6693e9d32520e5c7",
image="k8s.gcr.io/pause:3.2",instance="10.88.72.129:10250",interface="eth0",job="kubelet",
metrics_path="/metrics/cadvisor",name="k8s_POD_kube-controller-manager-5f6bc6b89b-pzdvv_kube-system_407469ee-c455-4cc8-9094-402dfc736732_2",
namespace="kube-system",node="suraj-lk-cluster-controller-0",pod="kube-controller-manager-5f6bc6b89b-pzdvv",
service="prometheus-operator-kubelet"}	
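
The interface-based heuristic described above could be sketched roughly as follows. This is only an illustration, not a tested fix: it assumes that pods on the pod network expose only eth0 and tunl0 (true for the samples here, but dependent on the CNI), and the function name, the reduced label dictionaries and the second rook sample are hypothetical.

```python
from collections import defaultdict

# Assumption taken from the samples above: pods on the pod network
# expose only these interfaces. Other CNI setups may differ.
POD_NETWORK_INTERFACES = {"eth0", "tunl0"}


def find_host_network_pods(series):
    """Return the (namespace, pod) pairs that report any interface
    outside POD_NETWORK_INTERFACES.

    `series` is an iterable of label dictionaries, one per time series
    of container_network_receive_bytes_total.
    """
    interfaces = defaultdict(set)
    for labels in series:
        key = (labels["namespace"], labels["pod"])
        interfaces[key].add(labels["interface"])
    # A pod counts as host-network if it shows at least one extra interface.
    return {pod for pod, seen in interfaces.items() if seen - POD_NETWORK_INTERFACES}


# Label sets reduced to the relevant labels. The rook/eth0 entry is
# illustrative only and does not come from the metrics above.
samples = [
    {"namespace": "rook", "pod": "csi-rbdplugin-fdfvb", "interface": "bond0"},
    {"namespace": "rook", "pod": "csi-rbdplugin-fdfvb", "interface": "eth0"},
    {"namespace": "kube-system",
     "pod": "kube-controller-manager-5f6bc6b89b-pzdvv", "interface": "eth0"},
    {"namespace": "kube-system",
     "pod": "kube-controller-manager-5f6bc6b89b-pzdvv", "interface": "tunl0"},
]

host_network_pods = find_host_network_pods(samples)
print(host_network_pods)
```

Pods flagged this way could then be left out of the per-namespace aggregate. Note that this only identifies such pods; it does not separate their own traffic from the rest of the node's traffic.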

surajssd avatar Oct 08 '20 09:10 surajssd

Have a similar issue. @surajssd did you manage to apply a mixin to filter the traffic? How did you achieve it?

eugenesiow avatar Jun 13 '22 01:06 eugenesiow

@eugenesiow nope, I could not figure out how to fix this issue. The inherent problem is that pods running on the host network share the network namespace with everything else running on that host, so I'm not sure how to differentiate the traffic coming from a specific pod!

surajssd avatar Jun 20 '22 10:06 surajssd