IOMMU-related issues
I have built v1.2.0 RC2 from source. After running
sudo build/gatekeeper
the following error is shown:
cannot add vfio group to container, error 22 (invalid argument)
and I'm unable to start Gatekeeper. While troubleshooting, I found the message below in the kernel log:
sudo dmesg | grep -i vfio
Firmware has requested this device have a 1:1 IOMMU mapping, rejecting configuring the device without a 1:1 mapping. Contact your platform vendor.
From my preliminary study, I think the issue is related to the hardware / BIOS. However, I can't find an exact solution. I'm trying my luck here to see if anyone with a deeper understanding of Intel VT-d, the IOMMU, and vfio-pci can offer any ideas.
The same error occurred on both GT and GK. Below is the specification of the testbed (bare-metal deployment, isolated lab environment):
GK:
- OS: Ubuntu 24.04 LTS
- Server: HPE ProLiant
- RAM: 256GB
- CPU: Intel Xeon E5-2665 2.4GHz, 32 cores
- NUMA: 2 NUMA nodes
- NIC: Intel I350 1G, both front and back (confirmed that DPDK is supported). The server also has an Intel 82599ES 10G interface that supports DPDK, but we don't have a 10G uplink router available at the moment, so we didn't use it for the testbed.
GT:
- OS: Ubuntu 24.04 LTS
- Server: HPE ProLiant
- RAM: 256GB
- CPU: Intel Xeon E5-2640 2.6GHz, 32 cores
- NUMA: 2 NUMA nodes
- NIC: Intel I350 1G, front
Solutions tried on GT (which didn't work):
- Adding vfio_iommu_type1.allow_unsafe_interrupts=1 to GRUB_CMDLINE_LINUX_DEFAULT.
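For reference, the change was roughly the following (the other parameters on the line are omitted and shown as "..."), applied with sudo update-grub and a reboot:
# /etc/default/grub on the GT server
GRUB_CMDLINE_LINUX_DEFAULT="... vfio_iommu_type1.allow_unsafe_interrupts=1"
sudo update-grub
sudo reboot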
Current thoughts:
- In https://www.kernel.org/doc/Documentation/devicetree/bindings/iommu/iommu.txt , it is mentioned that "The device tree node of the IOMMU device's parent bus must contain a valid "dma-ranges" property that describes how the physical address space of the IOMMU maps to memory. An empty "dma-ranges" property means that there is a 1:1 mapping from IOMMU to memory." Is there a way to verify the "dma-ranges" property? If yes, at least I would know what is causing the non-1:1 mapping, and could probably trace down the root cause from there (see the commands after this list).
- The "Contact your platform vendor" mentioned in the error message raised my suspicion of BIOS incompatibility. If these machines can't do the job, machines of which specification / brand can?
Some links that are probably relevant, but whose content I can't fully understand due to a lack of relevant experience:
- https://github.com/kiler129/relax-intel-rmrr/blob/master/deep-dive.md#what-vendors-did-wrong
- https://lore.kernel.org/linux-iommu/BN9PR11MB5276E84229B5BD952D78E9598C639@BN9PR11MB5276.namprd11.prod.outlook.com/
- https://lore.kernel.org/linux-iommu/BN9PR11MB52768ACA721898D5C43CBE9B8C27A@BN9PR11MB5276.namprd11.prod.outlook.com/t/
- https://lore.kernel.org/linux-iommu/[email protected]/
- https://community.hpe.com/t5/proliant-servers-ml-dl-sl/proliant-dl360-gen9-getting-error-quot-rejecting-configuring-the/td-p/7220298
- https://www.reddit.com/r/VFIO/comments/1gi95zf/rejecting_configuring_the_device_without_a_11/?rdt=37975
- https://forum.proxmox.com/threads/qemu-exited-with-code-1-pcie-passthrough-not-working.146297/
The Intel I350 NIC has four ports, doesn't it? Is it built into the mainboard or a discrete NIC? If it's discrete, does your server have an onboard NIC? Have you configured any of the ports to be used by the kernel?
The Intel I350 NIC has four ports, doesn't it?
The I350 on GK has 2 ports, while the one on GT has 4.
Is it built into the mainboard or a discrete NIC? If it's discrete, does your server have an onboard NIC?
They're discrete; the onboard NICs on both machines are Broadcom NetXtreme, which don't support DPDK.
Have you configured any of the ports to be used by the kernel?
I'm not quite sure about this, but here's all I have done to the ports:
- Prior to setting up Gatekeeper, I first ensured network connectivity by assigning an IP address and default gateway normally using netplan, just like setting up any Ubuntu server. Then I brought the interfaces up and pinged between devices, including the routers.
- Referring to Tips for Deployment, I changed their names to "front" and "back" respectively using systemd.link (sketched after this list).
- Entered the command sudo dependencies/dpdk/usertools/dpdk-devbind.py --bind=vfio-pci front and verified that the bind was successful using sudo dependencies/dpdk/usertools/dpdk-devbind.py --status. The same goes for the back port.
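For completeness, here is a sketch of how the renaming mentioned above was done, with one .link file per port and a reboot afterwards (the MAC address below is illustrative, not the real one):
sudo tee /etc/systemd/network/10-front.link >/dev/null <<'EOF'
[Match]
# Match the physical port by its MAC address (taken from `ip link`)
MACAddress=aa:bb:cc:dd:ee:01

[Link]
Name=front
EOF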
I'm not sure whether it is necessary to bind all the ports on the same NIC to vfio-pci. However, this is already the case on the GK server, which has only 2 ports on its NIC.
While troubleshooting on the GT server, I also checked /sys/kernel/iommu_groups/ and confirmed that the PCI address of the front port is in a group, and that it's the only address in that group.
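Concretely, the check amounted to something like this (0000:81:00.0 is the front port; <group> is whatever number the first command prints):
readlink /sys/bus/pci/devices/0000:81:00.0/iommu_group   # shows which IOMMU group the port belongs to
ls /sys/kernel/iommu_groups/<group>/devices/             # lists every device in that group; only 0000:81:00.0 shows up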
lspci -nnk -s 81:00.0 shows that the kernel driver in use is vfio-pci and the kernel module is igb. I'm not sure if this is the expected result.
The machine was rebooted whenever necessary, and I actually did it a few extra times, as the old saying goes: "if something is not working, always try rebooting it".
Btw, after verifying that the interfaces worked fine, I removed their netplan configuration and set their state to DOWN before modifying them to be used by DPDK.
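Concretely, that was just (using the renamed interface names):
sudo ip link set front down
sudo ip link set back down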
I encountered a similar error while testing a NIC some time ago. I solved the problem then by binding all ports of the NIC with vfio-pci, as you did by calling the script dpdk-devbind.py. Since you already did that for your Gatekeeper server, and it still doesn't work, your problem is different from mine.
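For reference, in my case that amounted to something like the following (the PCI addresses are illustrative; use the ones that dpdk-devbind.py --status reports for your NIC):
sudo dpdk-devbind.py --bind=vfio-pci 0000:81:00.0 0000:81:00.1 0000:81:00.2 0000:81:00.3
sudo dpdk-devbind.py --status   # all ports should now be listed under "Network devices using DPDK-compatible driver"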
Hardware issues are time-consuming to address. I recommend getting another NIC model and moving forward.
Section NIC of our wiki page "Hardware Requirements" lists NICs that Gatekeeper deployers have been using in production.
Thanks for the recommendation; I will see if I can find NICs of the same models. Meanwhile, can you share the brand, model, and BIOS version of some bare-metal servers that Gatekeeper has been successfully deployed on? While this error is not necessarily caused by the BIOS, some proven working examples would be a useful piece of information while trying to resolve it.
All shared notes on hardware are centralized on the wiki page Hardware Requirements. That said, my personal experience is with Dell servers.
Update:
GT server model used: HP DL380 Gen9
GK server model used: HP DL380p Gen8
After a bunch of study plus trial & error, I have managed to bring the GT server up. The issue is related to how the kernel treats the firmware's reserved (RMRR) memory regions. The problem is fixed by using the System Configuration utility on Gen9+ servers to disable the "HP Shared Memory features"; more details at https://github.com/kiler129/relax-intel-rmrr/blob/master/deep-dive.md
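For anyone else hitting this, a quick way to confirm the BIOS change took effect is to re-check the kernel log after rebooting; the "1:1 IOMMU mapping" rejection should no longer appear for the NIC:
sudo dmesg | grep -iE 'vfio|rmrr'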
On the other hand, since the GK server is a Gen8, the same solution doesn't apply to that machine. It seems like I would have to patch the kernel manually using https://github.com/kiler129/relax-intel-rmrr . Alternatively, this looks promising as well. However, both solutions seem time-consuming and troublesome.
The good news is that the I350 is not the culprit.
Do you ever have to deal with these kinds of workarounds on Dell servers? If Dell servers don't have this kind of problem, they will be my primary option, and I would advise anyone deploying Gatekeeper in the future to avoid old HP servers.
Another link with good reference value: https://forum.proxmox.com/threads/qemu-exited-with-code-1-pcie-passthrough-not-working.146297/
Based on my experience, I recommend Dell. Nevertheless, I don't want to suggest Dell servers are trouble-free; see issue #703 for an example. My recommendation is based on the fact that we've been able to overcome the problems with a clean solution.