Unable to get easy-nmos running on Mellanox NIC

Open. GeorgeGeorgakisMediaproxy opened this issue 1 year ago • 1 comment

I've been struggling to get easy-nmos to run on the Mellanox interface of a VM running inside VMware Workstation.

For whatever reason, if the docker container is configured to point at the Mellanox interface, then no connections can be made to it from either the VM or other systems on the isolated Mellanox subnet (10.0.0.0/8 with no gateway). Some things already tried:

  • Start VM with only an Intel (LAN) NIC present and with the container IPs set in the LAN subnet - this always works fine.
  • Start VM with Intel and Mellanox NICs present and with the container IPs set in the Mellanox subnet - container NICs cannot be pinged.
  • Start VM with only Mellanox NIC present and with container IPs set within Mellanox subnet - container NICs cannot be pinged.

The VM's IP on the Mellanox interface is 10.90.0.2, and when the VM is up this IP can be pinged from anywhere in the Mellanox subnet. So it's not a generic networking problem with the Mellanox interface. It's only a problem with getting the container to work with that interface.

There are no errors during container start-up, so the presumption is that this is a subtle configuration issue.

As per comments in another case, I tried setting promisc mode on the Mellanox interface within the VM, and adding a pseudo-gateway of 10.90.0.2 in docker-compose.yaml, but again without success.

Not sure where to go from here. Assistance would be greatly appreciated, please let me know if further details are needed.

Unfortunately this is really beyond the scope of this work. This appears to be a networking or configuration issue on the host side.

How are you presenting the Mellanox interface to the Virtual Machine? Are you using a para-virt driver or something more complicated such as SR-IOV? Are you using the right parent adapter and matching subnetting in the docker-compose.yaml? Have you configured promiscuous mode at the hypervisor layer (e.g. something similar to the following for ESXi: https://kb.vmware.com/s/article/1004099)?
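On the "matching subnetting" question, a quick sanity check along these lines may help. This is only a minimal sketch: the subnet (10.0.0.0/8) and the VM address (10.90.0.2) are taken from the description above, while the two container addresses and the `check_addresses` helper are hypothetical placeholders to be replaced with whatever is actually set in docker-compose.yaml.

```python
import ipaddress


def check_addresses(subnet, addresses):
    """Return the addresses that do NOT fall inside the given subnet."""
    # strict=True rejects a subnet string that has host bits set (e.g. 10.90.0.2/8)
    net = ipaddress.ip_network(subnet, strict=True)
    return [a for a in addresses if ipaddress.ip_address(a) not in net]


# Subnet and VM address as reported in this issue.
subnet = "10.0.0.0/8"
# The last two entries are hypothetical container IPs, not values from the issue.
addresses = ["10.90.0.2", "10.90.0.10", "10.90.0.11"]

outside = check_addresses(subnet, addresses)
print(outside or "all addresses are inside the subnet")
```

If this reports any address as outside the subnet, the compose file and the interface configuration disagree and that is worth fixing first. If everything matches, the problem is more likely at the parent adapter or hypervisor layer.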

Finally, have you tried running a simple ubuntu container or something else on the Mellanox interface? Does that work? If so, please repeat it on the network bridge. This will help you diagnose where the problem lies.

I appreciate the above is just a list of questions but there are so many areas that could cause this failure - this is actually quite difficult to diagnose.

rhastie, Dec 18 '23 09:12