Failed To Setup Network For Sandbox After Deploying Multus

Open dcplaya opened this issue 1 year ago • 6 comments

I have a cluster set up with Cilium as my main CNI. I am trying to add Multus to allow specific pods to have multiple interfaces (to hopefully fix some multicasting issues).

I deployed Multus using the thick-plugin daemonset. Even before I configure any pod to actually use Multus, killing and rescheduling any pod that has a network interface produces the following error, and the pod refuses to start. A similar error happens on ALL pods, not just ones that are configured to use Multus.

Sorry, I don't have this in text format. I quickly grabbed a screenshot and then started recovering my cluster by removing Multus and the conf file it creates.

[Screenshot of the error message; not available as text]

Is this an issue with Multus & Cilium running together? Or do I have some type of configuration error somewhere?
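
One way to narrow down the configuration question is to look at which CNI config files are present on the node after Multus is deployed. The following is a minimal sketch, not a confirmed diagnosis: it assumes the default config directory /etc/cni/net.d and that the container runtime picks the lexicographically first config file, which is typical for containerd and CRI-O but was not verified on this cluster. `list_cni_configs` is a hypothetical helper name.

```python
import os

# File extensions treated as CNI config files (an assumption for this sketch).
CNI_EXTENSIONS = (".conf", ".conflist", ".json")


def list_cni_configs(conf_dir="/etc/cni/net.d"):
    """Return the CNI config file names in conf_dir, sorted lexicographically."""
    names = [
        name
        for name in os.listdir(conf_dir)
        if name.endswith(CNI_EXTENSIONS)
        and os.path.isfile(os.path.join(conf_dir, name))
    ]
    return sorted(names)


if __name__ == "__main__":
    default_dir = "/etc/cni/net.d"  # assumed default location
    if os.path.isdir(default_dir):
        for position, name in enumerate(list_cni_configs(default_dir)):
            note = "  <- sorts first" if position == 0 else ""
            print(f"{name}{note}")
    else:
        print(f"{default_dir} does not exist on this host")
```

If the file that sorts first is the one Multus generated, every new pod sandbox goes through Multus, which would be consistent with all pods failing rather than only the ones configured to use it.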

dcplaya, Sep 23 '22 08:09

Could you please share the following info:

  • which container image is used (i.e. container image URL) and
  • pod yaml file (of multus pod, in kube-system namespace)

s1061123, Sep 28 '22 16:09

I used the thick daemonset from here

  • Container used: ghcr.io/k8snetworkplumbingwg/multus-cni:snapshot-thick
  • yaml, see above. I made no changes to the yaml supplied.

dcplaya, Sep 28 '22 17:09

Hmm... it looks like multus-daemon failed to get the pod's network namespace inode. Could you please check whether '/var/run/netns/cni-....' is accessible from the multus-daemon pod? (You can find the path in the last message of the error above.)
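
As a rough illustration of that check, the sketch below lists each entry under a netns directory with its permission bits, inode, and whether the current process can read it. It is only a sketch: it assumes Python is available wherever it runs and that it runs in a context that sees the same /var/run/netns as multus-daemon; `describe_netns` is a hypothetical helper, not part of Multus.

```python
import os
import stat


def describe_netns(netns_dir="/var/run/netns"):
    """Describe every entry in netns_dir: mode, inode, and readability."""
    entries = []
    for name in sorted(os.listdir(netns_dir)):
        path = os.path.join(netns_dir, name)
        try:
            info = os.stat(path)
        except OSError as exc:
            # The entry exists but cannot be stat'ed from this context.
            entries.append({"name": name, "error": str(exc)})
            continue
        entries.append({
            "name": name,
            "mode": oct(stat.S_IMODE(info.st_mode)),  # e.g. '0o444'
            "inode": info.st_ino,
            "readable": os.access(path, os.R_OK),
        })
    return entries


if __name__ == "__main__":
    default_dir = "/var/run/netns"
    if os.path.isdir(default_dir):
        for entry in describe_netns(default_dir):
            print(entry)
    else:
        print(f"{default_dir} is not visible from this context")
```

An entry that reports an error, or a directory that is not visible at all, would point at a mount or permission problem rather than at the pod's own configuration.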

s1061123, Sep 29 '22 11:09

I've noticed similar behaviour with the current master: no pods were coming online. Multus seems to be up and running, but pod creation just hangs. I'm using Calico instead of Cilium.

The 3.9.1 release is also broken, but for other reasons. The YAML seems to use the wrong image tag, so Multus doesn't even come online itself with this release: it's set to use thick, which, for one thing, does not contain the generate-kubeconfig binary. This looks to be related to #918.
Using the 3.9.1 YAML and updating the tag to either stable or v3.9.1 creates a Multus deployment that runs, and other pods start coming online fine.

File permissions on /var/run/netns on my end are 750. There are no underlying cni-* files/directories, just files (UUID names) with mode 444.

Omar007, Sep 30 '22 08:09

If by thick you mean the container image tag, such as :thick, please update to ghcr.io/k8snetworkplumbingwg/multus-cni:v3.9.1-thick-amd64. As you mentioned above, the issue is the same as #918.
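
To make the tag change concrete, here is a small sketch that splits an image reference into repository and tag and swaps in a pinned tag. The helper names `split_image` and `with_tag` are hypothetical, and the parsing is deliberately simple: digest references (repo@sha256:...) are not handled.

```python
def split_image(image):
    """Split 'registry/repo:tag' into (repository, tag); tag is None if absent."""
    last_slash = image.rfind("/")
    colon = image.rfind(":")
    # A colon before the last slash belongs to a registry port, not a tag.
    if colon > last_slash:
        return image[:colon], image[colon + 1:]
    return image, None


def with_tag(image, new_tag):
    """Return the same repository with new_tag in place of any existing tag."""
    repository, _ = split_image(image)
    return f"{repository}:{new_tag}"


if __name__ == "__main__":
    current = "ghcr.io/k8snetworkplumbingwg/multus-cni:snapshot-thick"
    print(split_image(current))
    print(with_tag(current, "v3.9.1-thick-amd64"))
```

Applied to the image reported earlier in this thread, `with_tag` yields the pinned reference suggested above; the daemonset YAML would still need to be edited and re-applied by hand.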

s1061123, Sep 30 '22 11:09

I'll need to set up a test cluster to get those logs for you. Give me a few days.

dcplaya, Sep 30 '22 12:09

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot], Dec 30 '22 02:12