Configurable Infiniband port_guid/node_guid

Open stgraber opened this issue 1 year ago • 12 comments

To make using Infiniband devices more consistent, especially when SR-IOV is in the mix, we should add two new properties:

  • port_guid
  • node_guid

When set, Incus will then record the current values into volatile keys (like we do with MAC addresses), set the new values in place and then when the instance is stopped, it will restore those values.
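The record, set and restore cycle described above can be sketched roughly as follows. This is only an illustration, not Incus code: the device is faked with an in-memory map, and the helper names (`applyGUIDs`, `restoreGUIDs`) and the `last_state.` key prefix are assumptions made for the example.

```go
package main

import "fmt"

// guidKeys lists the two properties proposed in this issue.
var guidKeys = []string{"port_guid", "node_guid"}

// applyGUIDs records the device's current GUIDs in the volatile map and then
// applies the configured ones. Properties that are not set are left untouched.
// dev is an in-memory stand-in for the real device (an assumption of this sketch).
func applyGUIDs(dev, config, volatile map[string]string) {
	for _, key := range guidKeys {
		want := config[key]
		if want == "" {
			continue
		}

		volatile["last_state."+key] = dev[key]
		dev[key] = want
	}
}

// restoreGUIDs puts back any value recorded by applyGUIDs and clears the record.
func restoreGUIDs(dev, volatile map[string]string) {
	for _, key := range guidKeys {
		prev := volatile["last_state."+key]
		if prev == "" {
			continue
		}

		dev[key] = prev
		volatile["last_state."+key] = ""
	}
}

func main() {
	dev := map[string]string{
		"port_guid": "00:00:00:00:00:00:00:01",
		"node_guid": "00:00:00:00:00:00:00:02",
	}
	config := map[string]string{"port_guid": "00:11:22:33:44:55:66:77"}
	volatile := map[string]string{}

	applyGUIDs(dev, config, volatile)
	fmt.Println("running:", dev["port_guid"], dev["node_guid"])

	restoreGUIDs(dev, volatile)
	fmt.Println("stopped:", dev["port_guid"], dev["node_guid"])
}
```

Because only configured properties are recorded, an instance that sets neither property never has its GUIDs touched.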

stgraber avatar Jun 27 '24 18:06 stgraber

Forum thread where this was brought up: https://discuss.linuxcontainers.org/t/sriov-infiniband-device-persistent-or-at-least-consistent-port-guid-and-node-guid/20258/7

stgraber avatar Jun 27 '24 18:06 stgraber

Good afternoon, My friend @arojas2003 and I are taking Professor Chidambaram's Virtualization Class at UT Austin and were interested in tackling this issue for our class project. Would we be able to get this issue assigned to us?

tonyn10 avatar Apr 03 '25 19:04 tonyn10

Hi! I am Tony's partner

arojas2003 avatar Apr 03 '25 19:04 arojas2003

Done!

The bulk of the work on this one will happen within the internal/server/device/ package, combined with a matching documentation entry in doc/, unless #1787 lands first, at which point the documentation will also all be in internal/server/device.

stgraber avatar Apr 03 '25 20:04 stgraber

Because you're unlikely to have access to Infiniband hardware, I'll be handling testing on this one as I have a lab machine here with an Infiniband controller that I can pretty easily poke at.

But the initial boilerplate work to add the new properties shouldn't need hardware.

stgraber avatar Apr 03 '25 20:04 stgraber

Hi @stgraber, So we have taken a look at this issue, and have come up with a plan to tackle it. In infiniband_sriov.go, we plan to do the following:

Step 1 -- In validateConfig():

  • add port_guid and node_guid as optional fields
  • add rules for port_guid and node_guid similar to how it's done for hwaddr

Step 2 -- In startContainer() (should I also do this in startVM()?):

  • take a snapshot of port_guid and node_guid as volatile variables in saveData, as is done for hwaddr and mtu
  • set the port_guid and node_guid from config as is done for the hwaddr (I'll create an ip link function in link.go for setting port_guid and node_guid, as seen in the forum linked above)

Step 3 -- In postStop():

  • in the defer function, reset last_state.port_guid and last_state.node_guid to empty strings
  • restore port_guid and node_guid from the saved volatile variables as is done for hwaddr

Let us know what you think of our plan.
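As a rough illustration of the ip link helper mentioned in Step 2, the sketch below builds the `ip link set dev <parent> vf <n> node_guid|port_guid <guid>` command line. The function names (`vfGUIDArgs`, `setVFGUID`) and the parent name `ib0` are hypothetical; this is not the existing link.go API, and `main` only prints the command rather than running it.

```go
package main

import (
	"fmt"
	"os/exec"
	"strconv"
	"strings"
)

// vfGUIDArgs builds the iproute2 arguments for setting a GUID on a VF.
// kind must be "node_guid" or "port_guid". (Hypothetical helper.)
func vfGUIDArgs(parent string, vfID int, kind string, guid string) ([]string, error) {
	if kind != "node_guid" && kind != "port_guid" {
		return nil, fmt.Errorf("unknown GUID kind %q", kind)
	}

	return []string{"link", "set", "dev", parent, "vf", strconv.Itoa(vfID), kind, guid}, nil
}

// setVFGUID runs the command built by vfGUIDArgs. It needs root privileges
// and an SR-IOV capable parent device to succeed.
func setVFGUID(parent string, vfID int, kind string, guid string) error {
	args, err := vfGUIDArgs(parent, vfID, kind, guid)
	if err != nil {
		return err
	}

	out, err := exec.Command("ip", args...).CombinedOutput()
	if err != nil {
		return fmt.Errorf("ip %s: %w (%s)", strings.Join(args, " "), err, strings.TrimSpace(string(out)))
	}

	return nil
}

func main() {
	// "ib0" and the GUID are placeholder values for illustration only.
	args, err := vfGUIDArgs("ib0", 0, "port_guid", "00:11:22:33:44:55:66:77")
	if err != nil {
		fmt.Println("error:", err)
		return
	}

	fmt.Println("ip", strings.Join(args, " "))
}
```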

Also, just a few questions to go along with this:

  1. I'm assuming we only need to do this in infiniband_sriov.go, right? From what I gather in the forum, that's where the issue is occurring.
  2. How should we do validation for port_guid and node_guid? I'm wondering how similar it would be to the existing infinibandValidMAC function.
  3. Should I do step 2 in startVM() as well?
  4. Do we need to add all the gendoc comments in infiniband_sriov.go as well? If so, do we also need to restructure the documentation in devices_infiniband.md to include two tables (one for infiniband_sriov and one for infiniband_physical) as is done in the NIC Documentation?

arojas2003 avatar May 03 '25 22:05 arojas2003

So after further research, I think validation for port_guid and node_guid will be very similar to infinibandValidMAC, except we will only allow GUIDs to be 8 bytes long, instead of allowing both 8 or 20 byte values.
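A minimal sketch of such a validator, assuming GUIDs are written as eight colon-separated hex bytes. The name `validGUID` is hypothetical, and this does not reproduce the actual infinibandValidMAC implementation.

```go
package main

import (
	"fmt"
	"regexp"
)

// guidPattern matches exactly eight colon-separated hex bytes,
// for example 00:11:22:33:44:55:66:77.
var guidPattern = regexp.MustCompile(`^([0-9a-fA-F]{2}:){7}[0-9a-fA-F]{2}$`)

// validGUID is a hypothetical validator for port_guid and node_guid.
// Unlike a MAC validator that also accepts 20-byte values, it only
// accepts the 8-byte form.
func validGUID(value string) error {
	if !guidPattern.MatchString(value) {
		return fmt.Errorf("invalid GUID %q: expected 8 colon-separated hex bytes", value)
	}

	return nil
}

func main() {
	for _, v := range []string{"00:11:22:33:44:55:66:77", "00:11:22:33:44:55"} {
		fmt.Printf("%s -> %v\n", v, validGUID(v))
	}
}
```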

I also wanted to address Step 2 in our gameplan.

  • For setting guids, we will rely on the ip link function discussed earlier
  • For getting guids, I read some documentation that suggested using /sys/class/infiniband/<device name>/device/sriov/<vfID>/node and /sys/class/infiniband/<device name>/device/sriov/<vfID>/port to read node_guids and port_guids, respectively. However, this might relate to MLNX_OFED which I know you wanted to avoid. Is this the way to go, or would I have to use something else exposed through /sys/bus/pci/devices/ to access these guids?
  • When initially getting and setting guids in startContainer(), we need both the device name and vfID. I'm having trouble figuring out where we would get vfID from. I see that startVM() finds a free VF and stores it in the vfID variable, so I was wondering if we would have to do Step 2 in startVM() instead.
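For reference, reading a GUID from the sysfs paths mentioned above could be sketched as below. Whether these files exist depends on the driver in use, so the read may fail; the helper names (`vfGUIDPath`, `readVFGUID`) and the device name `mlx5_0` are assumptions for illustration.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// vfGUIDPath returns <root>/class/infiniband/<ibDev>/device/sriov/<vfID>/<kind>,
// where kind is "node" or "port". root is normally "/sys"; it is a parameter
// so the sketch can be pointed at a test directory.
func vfGUIDPath(root string, ibDev string, vfID int, kind string) string {
	return filepath.Join(root, "class", "infiniband", ibDev, "device", "sriov", strconv.Itoa(vfID), kind)
}

// readVFGUID reads the GUID file for a VF and trims surrounding whitespace.
// It returns an error if the driver does not expose the file.
func readVFGUID(root string, ibDev string, vfID int, kind string) (string, error) {
	data, err := os.ReadFile(vfGUIDPath(root, ibDev, vfID, kind))
	if err != nil {
		return "", err
	}

	return strings.TrimSpace(string(data)), nil
}

func main() {
	// "mlx5_0" is a placeholder device name.
	fmt.Println(vfGUIDPath("/sys", "mlx5_0", 0, "node"))

	guid, err := readVFGUID("/sys", "mlx5_0", 0, "node")
	if err != nil {
		fmt.Println("read failed (file may not be provided by this driver):", err)
		return
	}

	fmt.Println("node GUID:", guid)
}
```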

I would appreciate any guidance you may have.

arojas2003 avatar May 05 '25 03:05 arojas2003

Hey, sorry for the delay, I've been going through notifications mostly in the order they've arrived these past few days, but there's a lot :)

  1. GUID validation sounds good and yeah, nothing too fancy here, we mostly want to make sure it's the right length.

  2. I'm starting a system with some mlx4 NICs now to check what it's like outside of OFED. This feature has been around long enough that I expect it to be there in the standard mlx4_ib driver.

  3. I'm going to check on that one as far as exactly what we get in the resource API and what we have in sysfs to try to track down the VF number.

stgraber avatar May 08 '25 11:05 stgraber

Sorry, I got pretty distracted today... I did manage to confirm that the open source driver sadly doesn't provide those files, at least not with the IB card that I have around.

I'm going to try to get an OFED build that supports that card installed on this box and see if I get those control files then.

stgraber avatar May 09 '25 02:05 stgraber

Gah, really not having too much luck as my card is old and needs an older OFED but that then needs an older Ubuntu version too which doesn't want to install...

stgraber avatar May 09 '25 04:05 stgraber

I'd probably recommend opening a PR with the assumption that those files exist, making sure to only attempt to write to the files if we have the new property set. Then I can at least do regression testing on that until I find a way to run OFED.

Basically I can deploy 22.04 or 24.04 on the test system, but neither is supported by the old OFED which supports my Infiniband card... So I either need to find a way to run Ubuntu 20.04 on there or I need to get a newer card.

stgraber avatar May 12 '25 02:05 stgraber

Hi @stgraber, Sounds good! I'll go ahead and leave my implementation of retrieving guids to use those sysfs files (/sys/class/infiniband/<device name>/device/sriov/<vfID>/node and /sys/class/infiniband/<device name>/device/sriov/<vfID>/port).

Also, do you have any new insights into how to track down the VF Number in startContainer? The VF Number is necessary to both set and get guids. Also, let me know if startContainer is indeed the right place where we should be taking the snapshots of the guids.

arojas2003 avatar May 12 '25 03:05 arojas2003