After upgrade to 8.0: iSCSI connection fails after reboot

Open flipsa opened this issue 5 years ago • 11 comments

I just upgraded a test pool of 2 nodes running XenServer 7.1 to xcp-ng-8.0. The upgrade itself did not produce any errors, but after booting up, the iSCSI connection is not working.

I do use CHAP on the iSCSI target, and until the upgrade all nodes were able to log in to the target just fine. With xe pbd-list params=all uuid=xxx I can see that the target, initiator IQN, username and password show up correctly under the "device config" section of the PBD on each node.

If I try to connect manually (whether from XO or the CLI) I get the same error:

xe pbd-plug  uuid=xxxx
Error code: SR_BACKEND_FAILURE_47
Error parameters: , The SR is not available [opterr=ISCSI login failed - check access settings for the initiator on the storage, if CHAP is used verify CHAP credentials], 

After issuing an iscsiadm -m discovery -t sendtargets -p IP:PORT, a subsequent retry with xe pbd-plug uuid=xxxx works.
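
The two-step workaround above can be wrapped in a small helper. This is only a sketch: the function names, the DRY_RUN switch, and the example portal and UUID are assumptions for illustration and are not part of xcp-ng. Only the iscsiadm and xe invocations come from this post.

```shell
#!/bin/sh
# Sketch of the manual workaround: re-run discovery, then retry the plug.
# DRY_RUN is a hypothetical switch added for illustration.

run() {
    # Print the command when DRY_RUN=1, otherwise execute it.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

replug_iscsi_pbd() {
    portal="$1"    # IP:PORT of the iSCSI target
    pbd_uuid="$2"  # UUID of the PBD that failed to plug
    # Only retry the plug if discovery succeeded.
    run iscsiadm -m discovery -t sendtargets -p "$portal" &&
        run xe pbd-plug uuid="$pbd_uuid"
}
```

With DRY_RUN=1 set, calling replug_iscsi_pbd IP:PORT xxxx only prints the two commands, which makes it possible to check them before running them on a host. It does not make the fix survive a reboot.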

However, so far I have failed to make this stick. Every time I reboot the machine, the problem is back.

Thanks for any hints!

flipsa avatar Aug 23 '19 16:08 flipsa

Hmm, I'm not aware of any previously reported problem like this. You might have more luck on the forum (https://xcp-ng.org/forum).

olivierlambert avatar Aug 23 '19 18:08 olivierlambert

Hey @olivierlambert

I searched the forum but couldn't find any similar posts, so I tried to investigate further before posting there. Seems like various CHAP implementations (besides not adding much to security in general) differ from device to device, some require CHAP both when discovering and when authenticating, some (like the QNAP I'm using here) expect the discovery without CHAP but the authentication with CHAP - and none of that is properly documented anywhere (for the QNAP)...

I've tried everything I could think of over the last day, and wasn't successful in repairing / fixing this:

  • "repair" the iSCSI SR from xcp-ng Center
  • tried basically every combination of relevant settings on the xcp-ng side (/etc/iscsi/iscsid.conf)
  • changing settings on SAN (chap on/off; setting new password)
  • fresh installation of one of the pool servers

However, after countless hours I decided to "forget" and "re-attach" the LVMoISCSI SR - that worked (and without losing any VMs / data on the SR). So far so good!
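
For anyone who prefers the CLI, a "forget" and "re-attach" roughly corresponds to the xe sequence below. This is a sketch under assumptions: the post does not say which tool or exact commands were used, the function name and the "iSCSI SR" label are made up, and every argument is a placeholder. The function only prints the commands and never runs them, since sr-forget is not something to script blindly. In a pool, the unplug, pbd-create and pbd-plug steps would have to be repeated for each host.

```shell
#!/bin/sh
# Prints (does not run) the xe commands for forgetting and re-introducing
# an LVM-over-iSCSI SR. All arguments are placeholders supplied by the caller.

print_reattach_plan() {
    sr_uuid="$1"; host_uuid="$2"; old_pbd_uuid="$3"
    target="$4"; target_iqn="$5"; scsi_id="$6"

    echo "xe pbd-unplug uuid=$old_pbd_uuid"
    echo "xe sr-forget uuid=$sr_uuid"
    # Re-introduce with the SAME SR UUID so the existing VDIs are found again.
    echo "xe sr-introduce uuid=$sr_uuid type=lvmoiscsi shared=true content-type=user name-label=\"iSCSI SR\""
    echo "xe pbd-create sr-uuid=$sr_uuid host-uuid=$host_uuid device-config:target=$target device-config:targetIQN=$target_iqn device-config:SCSIid=$scsi_id"
    echo "xe pbd-plug uuid=<uuid printed by pbd-create>"
}
```

If CHAP is kept, the PBD would also need its CHAP credentials in device-config; they are left out here because the setup that finally survived a reboot had CHAP disabled.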

I found a few people with similar problems, so in case anybody runs into this, here are some observations:

What I noticed - and I think this is different from when I set up everything on XS-7.1 about 2 years ago - is that the discovery (at least for my hardware here) only works without CHAP, while the authentication then requires CHAP. It's possible to do this on the CLI (see first post) but also in xcp-ng Center (no CHAP checkmark when discovering; then set the checkmark before scanning for LUNs). This way I could successfully connect to the iSCSI SAN, but after rebooting the same problem happened again...
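
On the CLI, "discovery without CHAP, login with CHAP" can be tried by hand with open-iscsi roughly as follows. This is a diagnostic sketch, not what the xcp-ng storage manager does internally: the function names and the DRY_RUN switch are assumptions, the arguments are placeholders, and it assumes discovery authentication is not enabled in /etc/iscsi/iscsid.conf. Note that the password is passed on the command line, and is printed in dry-run mode.

```shell
#!/bin/sh
# Sketch: discover the target without CHAP, then log in with CHAP.
# DRY_RUN is a hypothetical switch added for illustration.

run() {
    # Print the command when DRY_RUN=1, otherwise execute it.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

set_node() {
    # Update one setting on the node record created by discovery.
    run iscsiadm -m node -T "$iqn" -p "$portal" --op update -n "$1" -v "$2"
}

chap_login() {
    portal="$1"; iqn="$2"; chap_user="$3"; chap_pass="$4"

    # Step 1: discovery, with no CHAP involved.
    run iscsiadm -m discovery -t sendtargets -p "$portal" &&
    # Step 2: enable CHAP for the session only, then log in.
    set_node node.session.auth.authmethod CHAP &&
    set_node node.session.auth.username "$chap_user" &&
    set_node node.session.auth.password "$chap_pass" &&
    run iscsiadm -m node -T "$iqn" -p "$portal" --login
}
```

If this sequence logs in while xe pbd-plug still fails, that points at the storage manager's handling of CHAP rather than at the target or the credentials.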

Only after getting rid of CHAP on the SAN side completely, and forgetting and re-attaching the SR again, was the connection created automatically after a reboot.

Since this is a definite difference between xcp-ng-8.0 and XS-7.1, and since /var/log/SMlog showed some errors (line 512 in /opt/xensource/sm/LVHDoISCSISR.py), I diffed the file and found numerous differences between the 2 versions. I suspect that this might ultimately be responsible for the changed behaviour. Unfortunately I did not have the time to really try to understand it (and possibly fix it), but I might look into it again when I get some free time.

In case anybody hits the same problem (I found a few posts with similar problems but no solution except forgetting/reattaching the SR) and wants to investigate this further, here are 2 files that might help:

I leave it up to you if you want to keep this issue open or close it. It does work for me now (after a rather scary "forget"/"reattach"), but IMHO this is something that broke after upgrading, so it might affect others as well.

flipsa avatar Aug 24 '19 13:08 flipsa

@flipsa thanks a lot for your extensive feedback!

Before closing it, we might update our Wiki in the upgrade process to point here for people having iSCSI issues.

olivierlambert avatar Aug 25 '19 09:08 olivierlambert

Hi @flipsa! Could you add a short section to https://github.com/xcp-ng/xcp/wiki/Troubleshooting#iscsi-troubleshooting that describes the issue and if necessary links to your comment here?

stormi avatar Nov 22 '19 15:11 stormi

Hey @stormi,

I'm super busy until mid next week, but will do it as soon as I find the time...

flipsa avatar Nov 25 '19 11:11 flipsa

Hi @flipsa. I'm still interested in an update of https://github.com/xcp-ng/xcp/wiki/Troubleshooting#iscsi-troubleshooting with your findings so that we can close this issue. Thanks!

stormi avatar Apr 20 '20 11:04 stormi

I'm experiencing the same problem with iSCSI failures after upgrading from 7.6 to 8.1.

andersonvaz avatar May 31 '20 23:05 andersonvaz

Does this comment work for you?

stormi avatar Jun 02 '20 08:06 stormi

I have the same issue but found a different workaround.

If you start adding a NEW SR with the wizard and search for your unplugged SR by IP address, it will locate the iSCSI SR. Once it is found, just cancel the process, go back and try to repair your unplugged storage. The repair will then work.

prilly-dev avatar Sep 12 '22 07:09 prilly-dev

In version 8.2 I had no more problems

andersonvaz avatar Sep 12 '22 12:09 andersonvaz

> In version 8.2 I had no more problems

That is not the case here.

prilly-dev avatar Sep 12 '22 13:09 prilly-dev