TrueNAS guest iSCSI connectivity issue
First of all, I would like to say that I've spent several days debugging this issue and trying many different things without finding the cause. If the problem turns out to be in TrueNAS rather than XCP-NG, I apologize in advance and will file a ticket in their issue tracker.
The issue is that any time an initiator attempts to mount an iSCSI target, the TrueNAS VM loses all network connectivity. There are two ways to restore it: running xe vif-unplug followed by xe vif-plug, or rebooting the guest. There are no anomalies to report prior to mounting the iSCSI target: network works, speeds are good, iSCSI discovery works fine. But as soon as the target is mounted, the network interface on the TrueNAS VM seems to have all of its routes cleared. Here are the things I tried:
- Setting MTU to 1500 and 9000
- Setting various vif other-config parameters (my default is ethtool-tx: off as recommended in the docs, but I tried clearing the value and I tried with all the different other offloads off)
- Setting hardware offload to off in the TrueNAS VM
- Connecting using iSCSI initiator from an Ubuntu VM on the same internal network
- Connecting using Windows iSCSI initiator from the outside by routing traffic to the TrueNAS VM using a pfsense guest
- Creating a second vif in TrueNAS VM. In this case, only the interface on which iSCSI connection is made loses connectivity
- Mounting using NFS - works perfectly fine
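For anyone repeating the offload test from the list above, here is a minimal sketch of how the vif other-config values can be set from dom0. The function name and the key list (ethtool-rx, -tx, -sg, -tso, -ufo, -gso) are assumptions for illustration, not the exact commands used in this report. As far as I know, the values only take effect the next time the vif is plugged.

```shell
#!/bin/sh
# Sketch: turn off the common offloads on one VIF through other-config.
# OFFLOAD_KEYS is an assumed list of ethtool-* suffixes; trim it as needed.
OFFLOAD_KEYS="rx tx sg tso ufo gso"

disable_vif_offloads() {
    vif_uuid="$1"
    for key in $OFFLOAD_KEYS; do
        # Results in e.g. other-config:ethtool-tx=off
        xe vif-param-set uuid="$vif_uuid" other-config:ethtool-"$key"=off
    done
    # Print what is now stored on the VIF so it can be checked by eye.
    xe vif-param-get uuid="$vif_uuid" param-name=other-config
}

# Usage on the XCP-ng host (the UUID is a placeholder):
#   disable_vif_offloads <vif-uuid>
```

The settings are stored per vif, so a VM with two interfaces needs one call per vif UUID.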
I also tried to replicate the exact same setup of TrueNAS and an Ubuntu VM on an internal network in VirtualBox, where everything works fine. This is what led me to report the issue against the hypervisor rather than TrueNAS.
Finally, I found the following two threads on the net of users having the exact same issue but without any resolution:
https://forums.lawrencesystems.com/t/dropping-connection-xcp-ng-with-free-truenas-as-guest/6720/6
https://www.truenas.com/community/threads/network-loses-connectivity-when-iscsi-target-connected.87251/
There seems to be one workaround, which is to bridge the virtual network to a physical adapter, but ideally I would like to keep this traffic internal so that: 1. it stays private, 2. it is not limited by the physical port speed.
My XCP-NG is version 8.2 with all the updates applied as of the time of this writing. I can provide additional details as needed, just not sure what else might help here.
Hi! You might reach more people that could potentially help on the forum: https://xcp-ng.org/forum/. Maybe there already exist old threads about it? I haven't checked.
Hi, thank you for the suggestion. I can (and will) ask, but I'm also fairly certain it is a bug. I've scoured all the potential resources on the net for the past week, including the XCP-NG forum. The only two threads on the subject I could find are the ones I listed in the original post, and their problem description is exactly the same. So I would really like a dev to take a look at the problem if possible, because at least for me it is 100% reproducible. What I do not know is whether it's my hardware, Xen, XCP-NG or TrueNAS.
High-level steps to reproduce:
- Start off with a fresh XCP-NG 8.2, apply latest updates (this is for simplicity, but an existing XCP-NG will also do just fine)
- Create a new private network
- Create a new VM from TrueNAS 12.0-U7 ISO. Connect it only to the private network created in step 2. Need to make sure that the VM has at least two virtual disks, one for OS and another for storage.
- Once installed, from the VM console, configure a static IP on the TrueNAS VM
- Create a new VM from the latest Ubuntu Desktop ISO. Connect it to an external network for internet access and to the same private network from step 2.
- Once installed, set a static IP in the Ubuntu VM for the private network. Also install the package open-iscsi.
- From the Ubuntu VM, use the web browser to connect to TrueNAS. Configure a ZFS volume on the secondary virtual disk.
- Still in TrueNAS UI, activate the iSCSI service and configure the iSCSI target exposing the created ZFS volume as a block device.
- Open a terminal in Ubuntu VM and try to discover the iSCSI target:
  iscsiadm --mode discoverydb --type sendtargets --portal [TrueNAS IP]:3260 --discover
  At this step it should show the targets from TrueNAS - so far so good.
- Now try to connect to the target:
  iscsiadm --mode node --targetname <target IQN> --portal [TrueNAS IP]:3260 --login
  In dmesg, you see the connection is successful and a new block device is created in /dev. But after 5 seconds, the TrueNAS console says
  no ping reply (NOP-Out) after 5 seconds: dropping connections
  and the network on TrueNAS becomes inaccessible. I have to do xe vif-unplug, xe vif-plug on the TrueNAS vif to get the connectivity back. The exact same set of steps in VirtualBox works fine.
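The unplug/plug recovery from the last step can be scripted in dom0. This is only a sketch: the function name replug_vifs and the VM name label "TrueNAS" are placeholders, and it replugs every vif of the VM rather than only the affected one.

```shell
#!/bin/sh
# Sketch: replug every VIF of one VM from dom0 to restore connectivity.
# The VM name label passed in is an assumption; use the real label.

replug_vifs() {
    vm_name="$1"
    vm_uuid=$(xe vm-list name-label="$vm_name" params=uuid --minimal)
    if [ -z "$vm_uuid" ]; then
        echo "no VM named $vm_name" >&2
        return 1
    fi
    # --minimal prints the VIF UUIDs as one comma-separated line.
    for vif in $(xe vif-list vm-uuid="$vm_uuid" params=uuid --minimal | tr ',' ' '); do
        echo "replugging VIF $vif"
        xe vif-unplug uuid="$vif"
        xe vif-plug uuid="$vif"
    done
}

# Usage on the XCP-ng host:
#   replug_vifs "TrueNAS"
```

This only restores the link after the fact; the next iSCSI login drops it again.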
I'm not a hard-core expert in networking or in low-level OS stuff, but I do work in the industry and do my fair share of software/hardware troubleshooting. So I can perform technical steps to help debug this if necessary. I just do not know how to approach it myself. But if somebody more knowledgeable gives me a hint on where to start digging, I'm ready to do it to get to the bottom of this.
The community on the forum can help debug the issue, which would raise the likelihood of a fix if there's a bug.
I came here since I have the exact same problem, while testing TrueNAS for future use. (I won't have it as a VM then, but during testing it is of course an easy route). I can add that the problem seems to be due to page crossing in xenvif_count_requests, in xen-netback/netback.c:
[318644.585621] vif vif-14-2 vif14.2: Cross page boundary, txp->offset: 0, size: 8900
[318644.585635] vif vif-14-2 vif14.2: fatal error; disabling device
Edit: the problem is in FreeBSD 13's netfront. It is fixed in 14, but TrueNAS core does not seem to go there. https://github.com/freebsd/freebsd-src/commit/dabb3db7a817f003af3f89c965ba369c67fc4910
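To make the log line above concrete: as I understand it, netback rejects a tx request whose offset plus size runs past the end of a single page, and then disables the vif. The sketch below redoes that arithmetic on a dom0 log line. The helper names are made up for illustration, and PAGE_SIZE=4096 is an assumption (the usual x86 Xen page size).

```shell
#!/bin/sh
# Sketch: redo the "Cross page boundary" arithmetic for a netback log line.
# PAGE_SIZE=4096 is an assumption; adjust it if the host differs.
PAGE_SIZE=4096

# Prints "crosses" when SIZE bytes starting at OFFSET do not fit in one page.
crosses_page() {
    offset="$1"
    size="$2"
    if [ $((offset + size)) -gt "$PAGE_SIZE" ]; then
        echo "crosses"
    else
        echo "fits"
    fi
}

# Extracts "offset size" from a netback error line; prints nothing on no match.
parse_netback_line() {
    echo "$1" | sed -n 's/.*txp->offset: \([0-9][0-9]*\), size: \([0-9][0-9]*\).*/\1 \2/p'
}

# Combines the two helpers for one log line.
check_line() {
    set -- $(parse_netback_line "$1")
    [ $# -eq 2 ] || { echo "no match"; return 1; }
    crosses_page "$1" "$2"
}

line='[318644.585621] vif vif-14-2 vif14.2: Cross page boundary, txp->offset: 0, size: 8900'
check_line "$line"   # prints: crosses
```

With offset 0 and size 8900, the request covers more than two 4096-byte pages, so the check fails regardless of where in the page it starts.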