Pods fail to start with annotation "kubernetes.io/egress-bandwidth"
Pods with the annotation kubernetes.io/egress-bandwidth: 10M fail to start when Network Observability Operator 1.6.2 is installed. Pod events show:
...failed to create pod network sandbox k8s_php-sample-6cfff549d-7fvw5_mywebapp_88fa15ea-5251-4931-99f0-9c021f2f34a9_0(ebbdf6643f2ad7cf4b6cd0c82f7008db13219987206fb54d46355865b6e7aeda): error adding pod mywebapp_php-sample-6cfff549d-7fvw5 to CNI network "multus-cni-network"...
This raises the question: are there OS requirements for nodes?
The failure occurs on OpenShift 4.14.34 with AMD64 nodes running:
sh-4.4# cat /etc/redhat-release
Red Hat Enterprise Linux release 8.10 (Ootpa)
sh-4.4# uname -a
Linux kube-cotssgfw0jdq7e85d7sg-lsprototype-default-000002a3 4.18.0-513.24.1.el8_9.x86_64 #1 SMP Thu Mar 14 14:20:09 EDT 2024 x86_64 x86_64 x86_64 GNU/Linux
sh-4.4#
The failure does not occur on OpenShift 4.14.27 with nodes running:
sh-4.4# cat /etc/redhat-release
Red Hat Enterprise Linux release 8.6 (Ootpa)
sh-4.4# uname -a
Linux worker0.paul-network-metrics.cp.fyre.ibm.com 5.14.0-284.66.1.el9_2.x86_64 #1 SMP PREEMPT_DYNAMIC Mon May 6 14:51:27 EDT 2024 x86_64 x86_64 x86_64 GNU/Linux
sh-4.4#
Hi @paulben ,
Thanks for reporting this issue. Do you know which CNI implements this rate-limiting annotation? Is it Calico? I'm asking because we are already aware of a limitation when a similar annotation is used with Calico while netobserv is running: there is a conflict between the eBPF programs. As far as I can tell, the program loaded by netobserv supports chaining with other BPF programs, but that might not be true of the other program that gets loaded. If this is what I suspect, we might also need to ask the folks maintaining it upstream to collaborate.
cc @msherif1234 - we need to see if we must create an issue upstream in containernetworking.
@paulben do the two clusters that you mention have a similar network configuration regarding CNIs / Multus?
@jotak On the failing cluster:
$ oc get network.config/cluster -o jsonpath='{.status.networkType}{"\n"}'
Calico
On the "working" cluster:
$ oc get network.config/cluster -o jsonpath='{.status.networkType}{"\n"}'
OVNKubernetes
I'm not sure how to get further CNI/Multus config. Can you advise?
I don't see anything we can do on our side; it's up to Calico / the container networking plugins to allow other probes to run. In OpenShift 4.16, however, the problem should be solved, because it provides the new TCX hooks, which handle this sort of conflict better. We haven't tested with Calico and bandwidth annotations, though, so it would be good to check.
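For anyone comparing the two clusters, here is a minimal sketch for listing the traffic-control (tc) qdiscs, filters, and BPF programs attached to a single interface on a node. The function name inspect_tc is hypothetical, and it assumes tc (and optionally bpftool) is available in the node shell; it only shows what is attached and does not diagnose the conflict by itself.

```shell
#!/bin/sh
# Sketch only: inspect_tc is a hypothetical helper, not part of netobserv or Calico.
# It assumes the iproute2 "tc" tool is installed; bpftool is used only if present.
inspect_tc() {
    iface="${1:-}"
    if [ -z "$iface" ]; then
        echo "usage: inspect_tc <interface>" >&2
        return 2
    fi
    # Queueing disciplines on the interface (rate limiting is typically set up here).
    tc qdisc show dev "$iface"
    # Filters, including BPF classifiers, on the ingress and egress hooks.
    tc filter show dev "$iface" ingress
    tc filter show dev "$iface" egress
    # BPF programs attached to the interface through tc or XDP, if bpftool exists.
    if command -v bpftool >/dev/null 2>&1; then
        bpftool net show dev "$iface"
    fi
}
```

From a node shell (for example after `oc debug node/<node-name>` and `chroot /host`), calling `inspect_tc <interface>` with the host-side interface of an affected pod on each cluster could help show which programs share the same hook.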