
Race issue after node reboot

Open SchSeba opened this issue 6 months ago • 12 comments

Hi, it looks like there is a race in multus after a node reboot that can prevent the multus pod from starting:

kubectl -n kube-system logs -f kube-multus-ds-ml62q -c install-multus-binary
cp: cannot create regular file '/host/opt/cni/bin/multus-shim': Text file busy

The problem shows up mainly after a reboot: crio calls multus-shim to start pods, but the multus pod itself cannot start because its init container fails to cp the shim. The copy fails because the shim that crio invoked is still running, stuck waiting to communicate with the multus pod, so the binary on disk is in use:

[root@virtual-worker-0 centos]# lsof /opt/cni/bin/multus-shim
COMMAND    PID USER  FD   TYPE DEVICE SIZE/OFF     NODE NAME
multus-sh 8682 root txt    REG  252,1 46760102 46241656 /opt/cni/bin/multus-shim
[root@virtual-worker-0 centos]# ps -ef | grep mult
root        8682     936  0 16:27 ?        00:00:00 /opt/cni/bin/multus-shim
root        9082    7247  0 16:28 pts/0    00:00:00 grep --color=auto mult
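
For illustration only, here is a minimal sketch of one common way to sidestep "Text file busy" when replacing a binary that may be executing: copy to a temporary file in the same directory, then rename it over the destination. A plain cp opens the existing file for writing, which the kernel refuses while it is being executed; a rename only swaps the directory entry, and the running process keeps its old inode. The function name install_atomically and the temporary-file prefix are hypothetical, and this is not necessarily how the multus install step does or should work.

```python
import os
import shutil
import tempfile


def install_atomically(src: str, dst: str) -> None:
    """Replace dst with a copy of src without writing into dst's inode.

    Hypothetical helper: the name and temp-file prefix are assumptions
    made for this sketch, not part of multus.
    """
    dst_dir = os.path.dirname(os.path.abspath(dst))
    # The temp file must live on the same filesystem as dst so that the
    # final rename is atomic.
    fd, tmp_path = tempfile.mkstemp(dir=dst_dir, prefix=".shim-tmp.")
    os.close(fd)
    try:
        shutil.copyfile(src, tmp_path)   # copy the contents
        shutil.copymode(src, tmp_path)   # keep the executable bits
        # Swap the directory entry; a process executing the old file
        # keeps running from the old inode.
        os.replace(tmp_path, dst)
    except BaseException:
        # Do not leave a stray temp file behind on failure.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

With the paths from the log above, a call would look like install_atomically("/path/to/new/multus-shim", "/host/opt/cni/bin/multus-shim"), where the source path is a placeholder.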

SchSeba, Feb 01 '24 17:02