converged-edge-experience-kits
kube-OVN/OVS Networking issue
We created a single-node cluster with the latest release, 20.12.02, and it was running fine. We deployed a workload successfully. However, after a few days, when we tried to deploy more workloads, the new pods got stuck in the "ContainerCreating" state with the following CNI-related failure:
Warning FailedCreatePodSandBox 86s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "8b0ce8e6589f5fbfab6b911896fc8e1b30a418ddf1b765ca8e699c9e10e405d4" network for pod "nginx-app-deployment-6cdc9c97d4-slztv": networkPlugin cni failed to set up pod "nginx-app-deployment-6cdc9c97d4-slztv_default" network: request ip return 500 configure nic failed add nic to ovs failed failed to run 'ovs-vsctl --timeout=30 --may-exist add-port br-int 8b0ce8e6589f_h -- set interface 8b0ce8e6589f_h external_ids:iface-id=nginx-app-deployment-6cdc9c97d4-slztv.default external_ids:pod_name=nginx-app-deployment-6cdc9c97d4-slztv external_ids:pod_namespace=default external_ids:ip=10.16.13.172': exit status 1
"ovs-vsctl: unix:/var/run/openvswitch/db.sock: database connection failed (Protocol error)\n": "", failed to clean up sandbox container "8b0ce8e6589f5fbfab6b911896fc8e1b30a418ddf1b765ca8e699c9e10e405d4" network for pod "nginx-app-deployment-6cdc9c97d4-slztv": networkPlugin cni failed to teardown pod "nginx-app-deployment-6cdc9c97d4-slztv_default" network: delete ip return 500 {
"protocol": "",
"address": "",
"mac_address": "",
"cidr": "",
"gateway": "",
"mtu": 0,
"error": "del nic failed failed to delete ovs port failed to run 'ovs-vsctl --timeout=30 --if-exists --with-iface del-port br-int 8b0ce8e6589f_h': exit status 1\n \"ovs-vsctl: unix:/var/run/openvswitch/db.sock: database connection failed (Protocol error)\\n\", \"\""
}]
Normal SandboxChanged 9s (x7 over 85s) kubelet Pod sandbox changed, it will be killed and re-created.
This looks like an OVN/OVS networking issue. I am attaching the openvswitch logs (ovs_service_logs.zip). Let me know what other information is required to root-cause the issue.
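For triage, the event text above can be reduced to the failing commands and the underlying OVSDB error. The following is only a rough sketch: the function name `summarize_cni_failure` and the regular expressions are my own assumptions based on the message format shown in this one event, not anything provided by kube-ovn or Open vSwitch.

```python
import re


def summarize_cni_failure(event_text: str) -> dict:
    """Extract failing ovs-vsctl commands and OVSDB errors from a kubelet event.

    Hypothetical helper: the patterns are inferred from the single event
    pasted in this issue and may not match other message formats.
    """
    # Commands appear as: failed to run 'ovs-vsctl ...'
    commands = re.findall(r"failed to run '(ovs-vsctl [^']*)'", event_text)
    # Errors appear as: ovs-vsctl: <target>: <message>, and the message ends
    # at a backslash (escaped newline), a real newline, or a quote.
    errors = re.findall(r"ovs-vsctl: (\S+): ([^\\\n\"]+)", event_text)
    return {
        "commands": commands,
        # De-duplicate, since the same error is reported for add and delete.
        "errors": sorted({(target, msg.strip()) for target, msg in errors}),
    }


# Fragment copied from the event above (the del-port failure).
sample = (
    "failed to run 'ovs-vsctl --timeout=30 --if-exists --with-iface "
    "del-port br-int 8b0ce8e6589f_h': exit status 1\n"
    ' "ovs-vsctl: unix:/var/run/openvswitch/db.sock: '
    'database connection failed (Protocol error)\\n"'
)

summary = summarize_cni_failure(sample)
for command in summary["commands"]:
    print("command:", command)
for target, message in summary["errors"]:
    print("error:", target, "->", message)
```

In this event both the add-port and the del-port calls fail with the same error on `unix:/var/run/openvswitch/db.sock`, which suggests the problem is with the local OVSDB connection on the node rather than with this particular pod.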
Could you please try after rebooting the node?