Failed installation on top of previous soft RAID Linux install

Open AtaxyaNetwork opened this issue 3 years ago • 4 comments

Hello!

I regularly reinstall machines that previously ran Linux (mostly Debian 10) with soft RAID 1, replacing it with XCP-ng. Since 8.2 (and I think it goes back further than that), when I recreate the soft RAID 1 from the installer, the installer finishes correctly, but I end up in grub rescue at reboot. My guess is that the XCP-ng installer doesn't delete the old soft RAID correctly, and grub gets confused. I tried to boot via grub rescue, but with no success. My workaround is to boot a live Debian, open a shell, and run this for each disk I want to use in my soft RAID:

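DISK=sdx                            # replace with the disk to wipe
LBAS=$(cat /sys/block/$DISK/size)   # disk size in 512-byte sectors
# Zero the first 1024 sectors (512 KiB) of the disk
dd if=/dev/zero of=/dev/$DISK bs=512 count=1024
# Zero the last 1024 sectors of the disk
dd if=/dev/zero of=/dev/$DISK bs=512 seek=$(($LBAS-1024)) count=1024
# Clear any md superblock still present on the device
mdadm --zero-superblock /dev/$DISK
sync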

Then I can relaunch the installer, and XCP-ng installs successfully!
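
When several disks are involved, the commands above can be wrapped in a small helper that only prints what it would run, so the offsets can be reviewed before anything is written. This is a minimal sketch, not part of the original workaround: the function name `wipe_cmds` and the dry-run approach are assumptions; the three commands it prints are the ones listed above.

```shell
# wipe_cmds DISK SECTORS
# Print (do not run) the wipe commands for one disk.
# DISK is a kernel name such as sda; SECTORS is its size in 512-byte
# sectors, as read from /sys/block/DISK/size.
# wipe_cmds is a hypothetical helper name used for illustration.
wipe_cmds() {
    disk=$1
    sectors=$2
    # Refuse disks too small for two separate 1024-sector regions.
    if [ "$sectors" -lt 2048 ]; then
        echo "wipe_cmds: $disk is too small ($sectors sectors)" >&2
        return 1
    fi
    tail_start=$((sectors - 1024))
    echo "dd if=/dev/zero of=/dev/$disk bs=512 count=1024"
    echo "dd if=/dev/zero of=/dev/$disk bs=512 seek=$tail_start count=1024"
    echo "mdadm --zero-superblock /dev/$disk"
}

# Example (commented out because it reads real devices):
# for d in sda sdb; do
#     wipe_cmds "$d" "$(cat "/sys/block/$d/size")"
# done
```

Once the printed commands look right, they can be piped to `sh` as root, followed by `sync`.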

Let me know if I can help!

Cécile

AtaxyaNetwork avatar Mar 23 '22 11:03 AtaxyaNetwork

Thanks for the report. It was known that creating a soft RAID may fail on previously used disks due to stale metadata, but not that it may succeed and then fail only at grub install stage.

Do you see what the error is in the installer logs (/tmp/install-log from the installer before rebooting, or /var/log/installer/install-log from the installed system that doesn't boot)?
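
To narrow down a long installer log, one option is to filter it for lines that mention the boot loader or the RAID tooling. The sketch below assumes a POSIX shell with `grep`; the function name `scan_install_log` and the search terms are illustrative guesses, not something the installer provides.

```shell
# scan_install_log FILE
# Print, with line numbers, the lines of FILE that mention grub, mdadm,
# or an error. The keyword list is an assumption; adjust as needed.
scan_install_log() {
    log=$1
    if [ ! -r "$log" ]; then
        echo "scan_install_log: cannot read $log" >&2
        return 2
    fi
    grep -n -i -E 'grub|mdadm|error' "$log"
}

# Example, using the two log paths mentioned above:
# scan_install_log /tmp/install-log
# scan_install_log /var/log/installer/install-log
```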

stormi avatar Mar 23 '22 12:03 stormi

Related to https://github.com/xcp-ng/xcp/issues/107

stormi avatar Mar 23 '22 12:03 stormi

Hello!

Unfortunately, I didn't keep the logs, since I needed the machine urgently. I will try to set up a test machine to reproduce this bug ASAP :)

AtaxyaNetwork avatar Mar 24 '22 08:03 AtaxyaNetwork

Hello!

I found the time to test installing XCP-ng on top of a Debian (11.3) soft RAID 1. I tried the process on one of my lab machines (Dell R610 with two 146G HDDs) and on a VM with two 80G disks. I see the same behavior on both machines. I am attaching the log from the VM: installer.log

Here is what I did to test the soft RAID:

  • Install Debian on a soft RAID 1, with a single / ext4 partition
  • Boot Debian and make sure grub and the RAID work well
  • Then install XCP-ng (8.2.1)
  • On the disk selector, I see this: xvda xvdb md0
  • I tried to recreate the RAID with the software RAID panel, but the md0 RAID was present again and I didn't get an md127 as usual
  • So I selected md0 and continued the install (nothing special, I just selected ext instead of lvm)
  • As expected, once the installation finished and the server rebooted, I arrived at grub rescue
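
A stale array that the kernel has auto-assembled, like the md0 above, shows up in /proc/mdstat. As a rough sketch (the helper name `list_md_arrays` is made up here, and whether the installer environment offers a shell with `awk` is an assumption), the assembled arrays can be listed like this before choosing a target disk:

```shell
# list_md_arrays [FILE]
# Print the names of assembled md arrays found in FILE
# (default: /proc/mdstat). Array lines look like, for example:
#   md0 : active raid1 xvdb1[1] xvda1[0]
# list_md_arrays is a hypothetical helper name used for illustration.
list_md_arrays() {
    mdstat=${1:-/proc/mdstat}
    awk '$1 ~ /^md/ && $2 == ":" { print $1 }' "$mdstat"
}
```

Any array listed there could then be stopped with `mdadm --stop /dev/md0` (substituting the reported name) before wiping its member disks.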

I think the best fix would be to let the installer delete the old soft RAID, using the commands I provided in my first message.

I can provide you access to my lab machine and/or the VM I use to test, if you want to dig directly.

Thanks again for looking into this, and sorry for the delay!

AtaxyaNetwork avatar Apr 10 '22 22:04 AtaxyaNetwork