Configurable Infiniband port_guid/node_guid
To make using Infiniband devices more consistent, especially when SR-IOV is in the mix, we should add two new properties:
- port_guid
- node_guid
When set, Incus will then record the current values into volatile keys (like we do with MAC addresses), set the new values in place and then when the instance is stopped, it will restore those values.
Forum thread where this was brought up: https://discuss.linuxcontainers.org/t/sriov-infiniband-device-persistent-or-at-least-consistent-port-guid-and-node-guid/20258/7
Good afternoon, My friend @arojas2003 and I are taking Professor Chidambaram's Virtualization Class at UT Austin and were interested in tackling this issue for our class project. Would we be able to get this issue assigned to us?
Hi! I am Tony's partner
Done!
The bulk of the work on this one will happen within the internal/server/device/ package, combined with a matching documentation entry in doc/, unless #1787 lands first, at which point the documentation will also all be in internal/server/device.
Because you're unlikely to have access to Infiniband hardware, I'll be handling testing on this one as I have a lab machine here with an Infiniband controller that I can pretty easily poke at.
But the initial boilerplate work to add the new properties shouldn't need hardware.
Hi @stgraber,
So we have taken a look at this issue, and have come up with a plan to tackle it:
In infiniband_sriov.go, we plan to do the following:
Step 1 -- In validateConfig():
* add port_guid and node_guid as optional fields
* add rules for port_guid and node_guid similar to how it's done for hwaddr
Step 2 -- In startContainer() (should I also do this in startVM()?)
* take a snapshot of port_guid and node_guid as volatile variables in saveData, as is done for hwaddr and mtu
* set the port_guid and node_guid from config as is done for the hwaddr (I'll create an ip link function in link.go for setting port_guid and node_guid, as seen in the forum linked above)
Step 3 -- In postStop()
* in the defer function, reset last_state.port_guid and last_state.node_guid to empty strings
* restore port_guid and node_guid from the saved volatile variables as is done for hwaddr
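The three steps above can be sketched roughly as follows. This is only an illustration of the save/set/restore flow, not the actual Incus code: the guidDevice interface, applyGUIDs, restoreGUIDs and fakeDevice are hypothetical names, and the config and volatile maps stand in for the real device config and volatile storage. Only the last_state.port_guid and last_state.node_guid key names come from the plan itself.

```go
package main

import "fmt"

// guidDevice is a hypothetical abstraction over however the GUIDs end up
// being read and written (sysfs files, ip link, ...).
type guidDevice interface {
	GetGUID(kind string) (string, error)
	SetGUID(kind string, guid string) error
}

var guidKinds = []string{"port_guid", "node_guid"}

// applyGUIDs records the current value of each configured GUID in the
// volatile map, then applies the configured value (step 2 of the plan).
func applyGUIDs(dev guidDevice, config, volatile map[string]string) error {
	for _, kind := range guidKinds {
		want := config[kind]
		if want == "" {
			continue // Property not set: leave the device untouched.
		}

		current, err := dev.GetGUID(kind)
		if err != nil {
			return fmt.Errorf("failed to read %s: %w", kind, err)
		}

		volatile["last_state."+kind] = current

		err = dev.SetGUID(kind, want)
		if err != nil {
			return fmt.Errorf("failed to set %s: %w", kind, err)
		}
	}

	return nil
}

// restoreGUIDs puts back any saved value and clears the volatile key
// (step 3 of the plan).
func restoreGUIDs(dev guidDevice, volatile map[string]string) error {
	for _, kind := range guidKinds {
		key := "last_state." + kind
		saved := volatile[key]
		if saved == "" {
			continue
		}

		err := dev.SetGUID(kind, saved)
		if err != nil {
			return fmt.Errorf("failed to restore %s: %w", kind, err)
		}

		volatile[key] = ""
	}

	return nil
}

// fakeDevice is an in-memory stand-in used only for this example.
type fakeDevice map[string]string

func (d fakeDevice) GetGUID(kind string) (string, error) { return d[kind], nil }
func (d fakeDevice) SetGUID(kind, guid string) error     { d[kind] = guid; return nil }

func main() {
	dev := fakeDevice{"port_guid": "aa:aa:aa:aa:aa:aa:aa:01", "node_guid": "aa:aa:aa:aa:aa:aa:aa:02"}
	volatile := map[string]string{}
	config := map[string]string{"port_guid": "00:11:22:33:44:55:66:77"}

	fmt.Println(applyGUIDs(dev, config, volatile), dev, volatile)
	fmt.Println(restoreGUIDs(dev, volatile), dev, volatile)
}
```

Skipping unset properties means a device without port_guid or node_guid configured is never touched, which keeps the existing behaviour unchanged.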
Let us know what you think of our plan.
Also, just a few questions to go along with this:
- I'm assuming we only need to do this in infiniband_sriov.go, right? From what I gather in the forum, that's where the issue is occurring.
- How should we do validation for port_guid and node_guid? I'm wondering how similar it would be to the existing infinibandValidMAC function.
- Should I do step 2 in startVM() as well?
- Do we need to add all the gendoc comments in infiniband_sriov.go as well? If so, do we also need to restructure the documentation in devices_infiniband.md to include two tables (one for infiniband_sriov and one for infiniband_physical) as is done in the NIC Documentation?
So after further research, I think validation for port_guid and node_guid will be very similar to infinibandValidMAC, except we will only allow GUIDs to be 8 bytes long, instead of allowing both 8 or 20 byte values.
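A minimal sketch of what such an 8-byte check could look like is below. It does not reuse infinibandValidMAC; validateGUID is a hypothetical name, and the colon-separated hex format is an assumption based on how GUIDs are usually written.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"strings"
)

// validateGUID is a hypothetical validator: it accepts exactly 8 bytes
// written as colon-separated two-digit hex values.
func validateGUID(value string) error {
	parts := strings.Split(value, ":")
	if len(parts) != 8 {
		return fmt.Errorf("invalid GUID %q: expected 8 colon-separated bytes, got %d", value, len(parts))
	}

	for _, part := range parts {
		if len(part) != 2 {
			return fmt.Errorf("invalid GUID %q: byte %q is not two hex digits", value, part)
		}

		_, err := hex.DecodeString(part)
		if err != nil {
			return fmt.Errorf("invalid GUID %q: byte %q is not hexadecimal", value, part)
		}
	}

	return nil
}

func main() {
	fmt.Println(validateGUID("00:11:22:33:44:55:66:77"))
	fmt.Println(validateGUID("00:11:22:33:44:55"))
}
```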
I also wanted to address Step 2 in our gameplan.
- For setting guids, we will rely on the ip link function discussed earlier.
- For getting guids, I read some documentation that suggested using /sys/class/infiniband/<device name>/device/sriov/<vfID>/node and /sys/class/infiniband/<device name>/device/sriov/<vfID>/port to read node_guids and port_guids, respectively. However, this might relate to MLNX_OFED which I know you wanted to avoid. Is this the way to go, or would I have to use something else exposed through /sys/bus/pci/devices/ to access these guids?
- When initially getting and setting guids in startContainer(), we need both the device name and vfID. I'm having trouble figuring out where we would get vfID from. I see that startVM() finds a free VF and stores it in the vfID variable, so I was wondering if we would have to do Step 2 in startVM() instead.
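To show why both the device name and the VF number are needed for the setting side, here is a rough sketch of an ip link wrapper. The function names vfGUIDArgs and setVFGUID are hypothetical, as is the ib0 interface name; the command shape follows the iproute2 "ip link set dev <pf> vf <n> node_guid|port_guid <guid>" syntax, and whether a given driver honours it is not something this sketch can confirm.

```go
package main

import (
	"fmt"
	"os/exec"
	"strconv"
)

// vfGUIDArgs builds the arguments for
// "ip link set dev <pfName> vf <vfID> <kind> <guid>".
// kind must be "node_guid" or "port_guid".
func vfGUIDArgs(pfName string, vfID int, kind string, guid string) ([]string, error) {
	if kind != "node_guid" && kind != "port_guid" {
		return nil, fmt.Errorf("unsupported GUID kind %q", kind)
	}

	return []string{"link", "set", "dev", pfName, "vf", strconv.Itoa(vfID), kind, guid}, nil
}

// setVFGUID runs the command built by vfGUIDArgs (needs root and real hardware).
func setVFGUID(pfName string, vfID int, kind string, guid string) error {
	args, err := vfGUIDArgs(pfName, vfID, kind, guid)
	if err != nil {
		return err
	}

	out, err := exec.Command("ip", args...).CombinedOutput()
	if err != nil {
		return fmt.Errorf("ip %v failed: %w (%s)", args, err, out)
	}

	return nil
}

func main() {
	// Only print the command here; "ib0" is a made-up parent interface name.
	args, err := vfGUIDArgs("ib0", 0, "node_guid", "00:11:22:33:44:55:66:77")
	fmt.Println("ip", args, err)
}
```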
I would appreciate any guidance you may have.
Hey, sorry for the delay, I've been going through notifications mostly in the order they've arrived these past few days, but there's a lot :)
- GUID validation sounds good and yeah, nothing too fancy here, we mostly want to make sure it's the right length.
- I'm starting a system with some mlx4 NICs now to check what it's like outside of OFED. This feature has been around long enough that I expect it to be there in the standard mlx4_ib driver.
- I'm going to check on that one as far as exactly what we get in the resource API and what we have in sysfs to try to track down the VF number.
Sorry, I got pretty distracted today... I did manage to confirm that the open source driver sadly doesn't provide those files, at least not with the IB card that I have around.
I'm going to try to get an OFED build that supports that card installed on this box and see if I get those control files then.
Gah, really not having too much luck as my card is old and needs an older OFED but that then needs an older Ubuntu version too which doesn't want to install...
I'd probably recommend opening a PR with the assumption that those files exist, making sure to only attempt to write to the files if we have the new property set. Then I can at least do regression testing on that until I find a way to run OFED.
Basically I can deploy 22.04 or 24.04 on the test system, neither are supported with the old OFED which supports my Infiniband card... So I either need to find a way to run Ubuntu 20.04 on there or I need to get a newer card.
Hi @stgraber,
Sounds good! I'll go ahead and leave my implementation of retrieving guids to use those sysfs files (/sys/class/infiniband/<device name>/device/sriov/<vfID>/node and /sys/class/infiniband/<device name>/device/sriov/<vfID>/port).
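For reference, reading those files could look roughly like the sketch below. It assumes the files exist (which, per the discussion above, may only be true with OFED) and that each contains the GUID as text. vfGUIDPath and readVFGUID are hypothetical names, mlx4_0 is a made-up device name, and the sysRoot parameter exists only so the helper can be pointed somewhere other than /sys.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// vfGUIDPath builds <sysRoot>/class/infiniband/<device>/device/sriov/<vfID>/<kind>,
// where kind is "node" or "port".
func vfGUIDPath(sysRoot string, device string, vfID int, kind string) string {
	return filepath.Join(sysRoot, "class", "infiniband", device, "device", "sriov", strconv.Itoa(vfID), kind)
}

// readVFGUID returns the trimmed content of the control file, or an error
// if the file is missing (for example when the driver does not provide it).
func readVFGUID(sysRoot string, device string, vfID int, kind string) (string, error) {
	path := vfGUIDPath(sysRoot, device, vfID, kind)

	data, err := os.ReadFile(path)
	if err != nil {
		return "", fmt.Errorf("failed to read %s: %w", path, err)
	}

	return strings.TrimSpace(string(data)), nil
}

func main() {
	fmt.Println(vfGUIDPath("/sys", "mlx4_0", 0, "node"))

	guid, err := readVFGUID("/sys", "mlx4_0", 0, "node")
	fmt.Println(guid, err)
}
```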
Also, do you have any new insights into how to track down the VF Number in startContainer? The VF Number is necessary to both set and get guids. Also, let me know if startContainer is indeed the right place where we should be taking the snapshots of the guids.