
Mellanox SR-IOV broken on 8.2.1

Open Oleszkiewicz opened this issue 3 years ago • 27 comments

On 8.2.1, SR-IOV is broken with Mellanox cards.

Creating an SR-IOV network results in these errors in the xensource log:

https://gist.github.com/Oleszkiewicz/ef77405840f928e81ddbcdd7ccf302fe

The network is created, but it is unusable.

When I try to start a VM on this network, I get a NETWORK_SRIOV_INSUFFICIENT_CAPACITY error, even though 15 VFs of capacity are available:

```
Mar 26 07:27:01 yggdrasil xapi: [error||1031 ||backtrace] Async.VM.start R:d0d65e426987 failed with exception Server_error(NETWORK_SRIOV_INSUFFICIENT_CAPACITY, [ OpaqueRef:cb14d368-7a22-438e-bffc-433a5bc7b3cf ])
Mar 26 07:27:01 yggdrasil xapi: [error||1031 ||backtrace] Raised Server_error(NETWORK_SRIOV_INSUFFICIENT_CAPACITY, [ OpaqueRef:cb14d368-7a22-438e-bffc-433a5bc7b3cf ])
```

Oleszkiewicz avatar Mar 26 '22 06:03 Oleszkiewicz

This happens with both the inbox drivers and the Mellanox OFED drivers, so I suspect the problem is on the XAPI side.

Oleszkiewicz avatar Mar 26 '22 06:03 Oleszkiewicz

Can you reproduce on a fresh 8.2.0 install without any updates?

olivierlambert avatar Mar 26 '22 11:03 olivierlambert

Problem "kind of" solved, at least on 8.2.0; I will check this on 8.2.1 too.

I did some stracing of the daemon and found that it first reads `/sys/class/net/eth0/device/sriov_totalvfs`. This file holds the maximum number of VFs configured in the card's EEPROM (configurable with the manufacturer's tool).

Next, it writes the same value to `/sys/class/net/eth0/device/sriov_numvfs`.

This is where the magic starts. This virtual file holds the number of activated virtual functions, as configured when the kernel module was loaded. It is read-write, but writing to it fails with "No such file or directory" UNLESS you write the value that is already there.

So the workaround is to enable exactly the maximum number of VFs in the kernel module configuration. Then xcp-networkd gets through and lets you create the SR-IOV network without an error; in any other case it fails.

I believe there should be somewhat better error handling here, at least mentioning what I have found out.

(If total_vfs != num_vfs, the error message could be something like "activate {total_vfs} virtual functions in the NIC driver".) The current "No such file or directory" is kind of misleading, even though it is actually forwarded from the driver/sysfs.
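A minimal bash sketch of the pre-check suggested above, assuming the sysfs layout from the strace notes. `sriov_precheck` and the `SYSFS_NET` override are hypothetical names for illustration, not part of xcp-networkd. As far as I can tell, the misleading error comes from the generic Linux PCI sysfs code: writing the value already in `sriov_numvfs` returns success immediately, while any other value returns ENOENT ("No such file or directory") when the PF driver has no `sriov_configure` callback, which appears to be the case for mlx4 (it takes its VF count from the `num_vfs` module parameter instead).

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of xcp-networkd): compare the
# maximum VF count with the currently enabled VF count before attempting
# to write sriov_numvfs, and print an actionable hint instead of ENOENT.

# Root of the network sysfs tree; overridable so the check can be tried
# against a scratch directory instead of the real /sys.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

# Usage: sriov_precheck <netdev>
# Returns 0 if all VFs are already enabled, 1 (with a hint on stderr) if the
# counts differ, and 2 if the sysfs files cannot be read.
sriov_precheck() {
    local dev="$1"
    local dir="$SYSFS_NET/$dev/device"
    local total num

    total=$(<"$dir/sriov_totalvfs") || return 2
    num=$(<"$dir/sriov_numvfs") || return 2

    if [ "$total" -ne "$num" ]; then
        echo "$dev: activate $total virtual functions in the NIC driver (currently $num enabled)" >&2
        return 1
    fi
    return 0
}
```

On a host where `sriov_totalvfs` reads 8 and `sriov_numvfs` reads 0, `sriov_precheck eth2` prints the hint and returns 1; once the driver has enabled all 8 VFs it returns 0 silently.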

Best Piotr


Oleszkiewicz avatar Mar 30 '22 16:03 Oleszkiewicz

That's… interesting indeed. We should at least get that feedback to XAPI devs. Thoughts @stormi ?

olivierlambert avatar Mar 30 '22 19:03 olivierlambert

So, as I understand it, it's not a regression from the 8.2 to 8.2.1 update. I think it would indeed be good to write a detailed bug report at https://github.com/xapi-project/xen-api/issues

stormi avatar Apr 04 '22 09:04 stormi

I am experiencing similar behaviour and hope you don't consider this issue hijacking: I am also trying to get a ConnectX-3 40 Gbit dual-port NIC working as an SR-IOV-enabled NIC, and it fails with:

```
Apr 15 23:01:52 hypervisor01 xcp-networkd: [ warn||153 |Async.network_sriov.create R:8b88ed3ead97|network_server] Failed to enable SR-IOV on eth2 with error: Error: set SR-IOV numvfs error with exception (Sys_error "No such file or directory") on eth2
Apr 15 23:01:52 hypervisor01 xapi: [error||1583 ||backtrace] Async.network_sriov.create R:8b88ed3ead97 failed with exception Server_error(NETWORK_SRIOV_ENABLE_FAILED, [ OpaqueRef:05bf6088-0646-4f1d-bac9-9a29f24d4263; Error: set SR-IOV numvfs error with exception (Sys_error "No such file or directory") on eth2 ])
```

The full log entry for the creation of an SR-IOV network on eth2 can be found here: https://gist.github.com/Alphaprot/c327aaca1f10342adb32ad8872ebfc35

@Oleszkiewicz What parameters did you set in /etc/modprobe.d/mlx4_core.conf? Or are you using a different driver? I found a Mellanox KB article about configuring SR-IOV for ConnectX-3 on KVM, from which I "borrowed" the driver VF parameters. Sadly, it does not work due to the errors above.

EDIT: This error is quite funny, because there "is" a file like the one xcp-networkd is looking for:

```
# cat /sys/class/net/eth2/device/sriov_totalvfs
8
# cat /sys/class/net/eth2/device/sriov_numvfs
0
```

The second value makes me assume that SR-IOV is not configured properly in the driver/kernel config.

Alphaprot avatar Apr 15 '22 21:04 Alphaprot

The driver settings that will work depend heavily on how you have configured your card with the Mellanox tools, and the correct configuration depends heavily on your use case. Basically, you need to set the maximum number of virtual functions to the same value in the card config and in the driver; the behaviour is the same with the inbox and OFED drivers. Then decide whether you pass single-port or dual-port virtual functions to the VM.

Hint: single-port VFs do not confuse XCP-ng, while a dual-port VF adds both ports to the VM, whereas in XCP-ng you pass just one of the ports.

Hint 2: InfiniBand is not properly supported. It would take a few days of work to make XCP-ng understand how to configure a virtual function properly when starting the VM (currently it tries to set an Ethernet MAC on the IB interface). If you configure both ports as ETH, however, this is not an issue.

I have succeeded in configuring my test cluster with SR-IOV, and if you need further help, contact me directly and I'll assist you.
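A small bash sketch of the consistency check described above: it sums the `num_vfs` value from an `options mlx4_core ...` line and compares it with the card's `NUM_OF_VFS` as reported by `mlxconfig query`. It assumes the mlx4_core three-field form `num_vfs=port1,port2,port1+2` (single-port VFs on port 1, single-port VFs on port 2, dual-port VFs), so the total is the sum of the fields; a single integer is also accepted. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: check that the VF count given to mlx4_core matches
# the NUM_OF_VFS value burned into the card with mlxconfig.

# Usage: mlx4_total_vfs "4,4,0"   (prints 8)
# Sums the comma-separated fields of a num_vfs value (port1,port2,port1+2).
mlx4_total_vfs() {
    local IFS=,          # split on commas only inside this function
    local sum=0 n
    for n in $1; do
        sum=$((sum + n))
    done
    echo "$sum"
}

# Usage: mlx4_num_vfs_from_options "options mlx4_core num_vfs=4,4,0 ..."
# Prints the raw num_vfs value, or returns 1 if the line has none.
mlx4_num_vfs_from_options() {
    local word
    for word in $1; do
        case "$word" in
            num_vfs=*) echo "${word#num_vfs=}"; return 0 ;;
        esac
    done
    return 1
}

# Usage: vf_counts_match <NUM_OF_VFS from mlxconfig> "<options line>"
# Returns 0 if the driver total equals the card setting, 1 if not,
# and 2 if no num_vfs option was found.
vf_counts_match() {
    local card_vfs="$1" driver_vfs
    driver_vfs=$(mlx4_num_vfs_from_options "$2") || return 2
    [ "$(mlx4_total_vfs "$driver_vfs")" -eq "$card_vfs" ]
}
```

With the values posted in the next comment of this thread (`NUM_OF_VFS 8` and `num_vfs=4,4,0`), `vf_counts_match` returns 0.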


Oleszkiewicz avatar Apr 16 '22 00:04 Oleszkiewicz

Now if I just knew how to contact you outside this issue, since both our github profiles appear to have pretty restricting privacy settings.

Regarding the problems I am experiencing (I have no real experience with SR-IOV and am not a developer myself): after a reboot I see no VFs, only the physical card.

```
# lspci | grep Mellanox
08:00.0 Ethernet controller: Mellanox Technologies MT27500 Family [ConnectX-3]
```

SR-IOV is enabled both in the BIOS (the Dell R320 calls this "SR-IOV Global Enable") and in the ConnectX-3 firmware, and Intel VT-d is activated too. I've added the intel_iommu=on kernel boot parameter in /boot/efi/EFI/xenserver/grub.cfg and checked that it persists across reboots.

The output of the card firmware configuration is:

```
# mlxconfig query

Device #1:
----------

Device type:    ConnectX3
Device:         /dev/mst/mt4099_pciconf0

Configurations:                              Next Boot
         SRIOV_EN                            True(1)
         NUM_OF_VFS                          8
         LOG_BAR_SIZE                        3
         BOOT_OPTION_ROM_EN_P1               True(1)
         BOOT_VLAN_EN_P1                     False(0)
         BOOT_RETRY_CNT_P1                   0
         LEGACY_BOOT_PROTOCOL_P1             None(0)
         BOOT_VLAN_P1                        1
         BOOT_OPTION_ROM_EN_P2               True(1)
         BOOT_VLAN_EN_P2                     False(0)
         BOOT_RETRY_CNT_P2                   0
         LEGACY_BOOT_PROTOCOL_P2             None(0)
         BOOT_VLAN_P2                        1
```

My /etc/modprobe.d/mlx4_core.conf file looks like this:

```
options mlx4_core num_vfs=4,4,0 port_type_array=2,2 probe_vf=4,4,0
```

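I then reloaded the core module and its companion modules mlx4_ib and mlx4_en:

```
modprobe -r mlx4_ib mlx4_en
modprobe -r mlx4_core
# -a loads every listed module; without it, modprobe treats the extra
# names as parameters for mlx4_core instead of loading them.
modprobe -a mlx4_core mlx4_ib mlx4_en
```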

Now all eight probed VFs show up as devices:

```
# lspci | grep Mellanox
08:00.0 Ethernet controller: Mellanox Technologies MT27500 Family [ConnectX-3]
08:00.1 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
08:00.2 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
08:00.3 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
08:00.4 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
08:00.5 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
08:00.6 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
08:00.7 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
08:01.0 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
```
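After reloading the driver, one quick way to confirm that the VF count matches the total from `num_vfs` (4 + 4 + 0 = 8 here) is to count the "Virtual Function" lines in the `lspci` output. A minimal bash sketch; `count_vfs` and `expect_vfs` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking the VF count after a driver reload.

# Usage: lspci | grep Mellanox | count_vfs
# Prints how many "Virtual Function" entries appear on stdin.
count_vfs() {
    # grep -c prints 0 and exits 1 when nothing matches; keep the count,
    # drop the failure status.
    grep -c 'Virtual Function' || true
}

# Usage: lspci | grep Mellanox | expect_vfs 8
# Returns 0 if exactly the expected number of VFs is listed, else 1.
expect_vfs() {
    local found
    found=$(count_vfs)
    if [ "$found" -ne "$1" ]; then
        echo "expected $1 VFs, found $found" >&2
        return 1
    fi
}
```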

They can also be found as network interfaces:

```
# ip link show
31: side-5519-eth2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether ec:0d:9a:0d:b9:a0 brd ff:ff:ff:ff:ff:ff
    vf 0 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
    vf 1 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
    vf 2 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
    vf 3 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
    vf 4 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
    vf 5 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
    vf 6 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
    vf 7 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
32: side-47-eth3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether ec:0d:9a:0d:b9:a1 brd ff:ff:ff:ff:ff:ff
    vf 0 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
    vf 1 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
    vf 2 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
    vf 3 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
    vf 4 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
    vf 5 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
    vf 6 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
    vf 7 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
```

However, these kernel module options do not survive a reboot. My wild guess: are the kernel drivers for e.g. networking loaded at boot time without reading the configuration options in /etc/modprobe.d/? The file remains there intact, but I have to reapply it each time by removing the mlx4 driver components and re-adding them. I know far too little about this, but I am trying my best. After inspecting the initrd, my assumption might hold true: I do not see any reference to a config in /etc/modprobe.d for the mlx4-related drivers, while other modules do reference it:

```
# lsinitrd /boot/initrd-4.19-xen.img | grep mlx
drwxr-xr-x   2 root     root            0 Mar 24 12:38 usr/lib/modules/4.19.0+1/kernel/drivers/net/ethernet/mellanox/mlx4
-rwxr--r--   1 root     root       667528 Mar 24 12:38 usr/lib/modules/4.19.0+1/kernel/drivers/net/ethernet/mellanox/mlx4/mlx4_core.ko
-rwxr--r--   1 root     root       260536 Mar 24 12:38 usr/lib/modules/4.19.0+1/kernel/drivers/net/ethernet/mellanox/mlx4/mlx4_en.ko
```

And here I am out of luck (I used to create the SR-IOV networks via XCP-ng Center, which always fails, but am currently forced to use the CLI). There is a plethora (58?!) of VF/side interfaces listed by `xe pif-list`, and in XCP-ng Center only 2 of them show as SR-IOV capable. Note that I could not reboot (I rescanned the PIFs using `xe pif-scan host-uuid=<my-host-id>`), because I would lose the driver settings again, as mentioned earlier.

I am kindly asking for your assistance here. EDIT: And I always forget to preview my GitHub entry; sorry for messing up the initial version with forward ticks instead of back-ticks in one code block. :(

Alphaprot avatar Apr 16 '22 11:04 Alphaprot

Send me some contact details and I'll contact you, or find me on Facebook :) I have a sailing yacht in my background picture.

As for no VFs after a reboot: you need to recreate the initrd/initramfs (use dracut) so the driver is initialized properly at boot. Also, do not probe VFs unless you need them in dom0; you don't need to probe them if you just want passthrough.
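A hedged bash sketch of the initrd step, assuming `dracut` and `lsinitrd` are available on the host. It checks whether the mlx4_core options file is already inside the image and otherwise prints (rather than runs) a dracut command that adds it with `--install`. The helper names are hypothetical, and the image path and kernel version in the example are assumptions taken from the `lsinitrd` output earlier in this thread (`/boot/initrd-4.19-xen.img`, modules under `4.19.0+1`). Check which image your bootloader actually uses (if that path is a symlink, rebuild the file it points to) and keep a backup before running the printed command.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for making mlx4_core options persistent across
# reboots by putting the modprobe.d file into the initrd.

# Usage: lsinitrd <image> | initrd_has_conf /etc/modprobe.d/mlx4_core.conf
# Succeeds if the listing on stdin contains the given file.
initrd_has_conf() {
    local path="${1#/}"   # lsinitrd prints paths without the leading slash
    grep -qF -- "$path"
}

# Usage: dracut_rebuild_cmd <image> <kernel-version> <conf-file>
# Prints a dracut command line; it does NOT execute it.
dracut_rebuild_cmd() {
    printf 'dracut --force --install %s %s %s\n' "$3" "$1" "$2"
}

# Example (prints the command only if the file is missing from the image):
#   lsinitrd /boot/initrd-4.19-xen.img | initrd_has_conf /etc/modprobe.d/mlx4_core.conf \
#     || dracut_rebuild_cmd /boot/initrd-4.19-xen.img 4.19.0+1 /etc/modprobe.d/mlx4_core.conf
```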


Oleszkiewicz avatar Apr 16 '22 11:04 Oleszkiewicz

Long time no reply from my side: I had little time for my small lab and was struggling to get it working by myself. I was able to remove the 68 VIFs that should not be there by simply disabling the PCIe slot of my NIC, purging/forgetting the networks and associated PIFs, and then rescanning them.

I was able to pass them through, but FreeBSD's mlx4 driver fails to establish a comm channel to the PIF, which triggers a VIF/PIF reset/recovery loop over and over.

Just hit me up with a quick/empty reply to advance.07woofers(at)icloud.com so that I can describe what I did in a bit more detail.

Alphaprot avatar May 01 '22 08:05 Alphaprot

This post probably isn't of use to ConnectX-3 Pro users, since the OFED version I mention apparently doesn't support it. It did occur to me, though, that I didn't see any mention of the Mellanox firmware version earlier in this thread. Maybe the firmware that ships in the LTS 4.9-5.1.0.0 package could be installed on the host from another OS and have some positive effect, but possibly such options have already been considered. ¯\_(ツ)_/¯

I recently had a positive Mellanox (MT27800 Family [ConnectX-5]) SR-IOV experience involving XCP-ng 8.2.1 and FreeBSD 13.1-based guests (OPNsense 22.7), so I thought I'd share it here.

My reason for looking at SR-IOV at all was that Xen / XCP-ng private networking exhibited poor throughput with FreeBSD-based guest VMs. In case the back story is useful, see: https://xcp-ng.org/forum/topic/5668/what-are-realistic-experienced-throughput-outcomes-for-internal-networks/42?_=1660177147391 (I will also update that post at some point).

As presented to me, these network cards were configured with 16 VFs. Using SR-IOV network interfaces with the OPNsense guest VM resulted in the expected network throughput across 2 interfaces on that guest VM: approximately 22 Gbit/s with these cards. The issue here was a lack of remaining VFs once the OPNsense VM was configured with the desired network interfaces/topology.

On my first pass using the NVIDIA/Mellanox download site, I was unable to get the installer package to do anything useful. After a detour via Ubuntu, I eventually ended up with:

https://network.nvidia.com/products/infiniband-drivers/linux/mlnx_ofed/ (there doesn't seem to be an LTS version available for XenServer): MLNX_OFED_LINUX-5.7-1.0.2.0-xenserver8.2-x86_64.tgz

installed using:

```
./mlnxofedinstall -vvv --without-32bit --distro xenserver --force --skip-distro-check --without-depcheck --without-fw-update
```

Actually, I've been successful without the --without-fw-update flag, meaning that I get the latest firmware installed and the above-mentioned throughput across the SR-IOV networks connected to OPNsense, but I've left it in the command above for the benefit of rapid copy-pasters.

I don't have any entries under /etc/modprobe.d/.

One other point I noted on my servers under test: there is an option to configure the number of VFs in the UEFI/BIOS. If I change it from the value set by default in the firmware, or via e.g. `mlxconfig -d /dev/mst/mt4119_pciconf0 s NUM_OF_VFS=64`, then I also end up in the situation where I'm able to configure networks, but they do not pass traffic.

enidice avatar Aug 31 '22 21:08 enidice

This is my next thing to check


Oleszkiewicz avatar Oct 11 '22 07:10 Oleszkiewicz