sd-zfs

systemd udev doesn't wait for HDDs, system can't boot (not even recovery)

Open alaricljs opened this issue 8 years ago • 23 comments

With systemd udev, sd-zfs fails to find my boot devices. The system then attempts to enter rescue mode, which also fails: it cannot mount /sysroot, and systemd complains that it cannot do something with the password file. With the udev and zfs hooks, udev spins for a while waiting on my storage and then everything comes up clean. With the systemd and sd-zfs hooks, udev startup appears to run in parallel with other units, and this breaks the process. The boot devices are SSDs, but the primary storage is all HDDs set not to spin up unless told to. Unfortunately, I don't have the resources to determine whether this also happens without sd-zfs.

Do you know of some way to force systemd to wait on udev before proceeding? The initramfs is a very truncated version and I don't know where to start looking.

alaricljs avatar Jan 27 '17 15:01 alaricljs

That should not happen, as sd-zfs goes After udev:

https://github.com/dasJ/sd-zfs/blob/master/src/zfs-generator.c#L173

You can check the dependency tree by adding systemd-analyze to your initrd and generating a dependency graph.

dasJ avatar Feb 21 '17 11:02 dasJ

Any thoughts on how to get that to work and get data out of it? zfs pool never imports or mounts, no login access is available. I don't see a way to run this against a non-running systemd setup.

alaricljs avatar Feb 23 '17 01:02 alaricljs

Have you tried both the emergency and recovery targets, and did you prepend rd. to the kernel option?

dasJ avatar Feb 23 '17 16:02 dasJ

I appear to be having the same issue, except that it's also not waiting for cryptsetup.target.

wallzero avatar Apr 25 '17 04:04 wallzero

Do you use hibernation? You may have the same problem I had: https://github.com/systemd/systemd/issues/4577

dasJ avatar Apr 28 '17 12:04 dasJ

No I don't use hibernation at the moment; I only have one SSD and I didn't want to partition it. My swap sits in a VDEV and won't work with hibernation as far as I know.

wallzero avatar May 01 '17 15:05 wallzero

I just tried again on a fresh install and sd-zfs still seems to run immediately. Before I am even prompted for the LUKS password, a [FAILED] Failed to mount /sysroot error is logged. systemctl status sysroot.mount shows the following:

Where: /sysroot
What: zfs:rpool/root
Docs: ...
Process: 116 ExecMount=/usr/bin/mount zfs:rpool/root /sysroot -o rw ...

wallzero avatar Dec 13 '17 08:12 wallzero

How did you configure LUKS?

dasJ avatar Dec 13 '17 19:12 dasJ

I encrypted the drive with the following:

cryptsetup luksFormat -c aes-xts-plain64 -s 512 -h sha512 /dev/sda2

Then I modified /etc/default/grub and tried several things:

GRUB_CMDLINE_LINUX_DEFAULT="rd.luks.uuid=id rd.luks.name=id=luks rd.luks.crypttab=no rd.luks.options=tries=0,timeout=120s rootflags=x-systemd.mount-timeout=infinity,retry=10000,x-systemd.device-timeout=120s root=zfs:rpool/root zfs_force=1 quiet splash"

/etc/fstab doesn't have anything about the root, only the following:

# EFI
UUID=efiId /boot/efi vfat discard,umask=0077 0 0

# Boot
UUID=bootId /boot ext4 defaults,discard,nofail 0 0

# Swap
/dev/zvol/rpool/swap none swap defaults,discard 0 0

The zpool bootfs is configured:

zpool get bootfs
NAME   PROPERTY  VALUE       SOURCE
rpool  bootfs    rpool/root  local

wallzero avatar Jan 03 '18 20:01 wallzero

Correct ..... almost.

You need to add sd-encrypt to the HOOKS in /etc/mkinitcpio.conf. The cmdline LUKS options are only for the non-systemd initrd. My cmdline just looks like: root=zfs:zroot/root rw.

Now create a /etc/crypttab.initramfs. You can find the syntax in crypttab(5). Yours should probably look like this:

luks     UUID=id - tries=0,timeout=120s

Also, you can add discard to the options in the crypttab entry when you have an SSD.
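To make the field layout of that crypttab line explicit, here is a minimal Python sketch that splits an entry into the four fields described in crypttab(5): volume name, encrypted device, key file, and options. The `CrypttabEntry` type and `parse_crypttab_line` function are hypothetical helpers written for illustration; they are not part of sd-zfs, mkinitcpio, or systemd.

```python
from typing import List, NamedTuple, Optional


class CrypttabEntry(NamedTuple):
    """One crypttab line (hypothetical helper type, for illustration only)."""
    name: str           # name of the mapping under /dev/mapper/
    device: str         # encrypted device, e.g. UUID=...
    keyfile: str        # "-" or "none" means: prompt for a passphrase
    options: List[str]  # comma-separated options, split into a list


def parse_crypttab_line(line: str) -> Optional[CrypttabEntry]:
    """Parse a single crypttab line; return None for blank lines and comments."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    fields = stripped.split()
    if len(fields) < 2:
        # Only the first two fields are mandatory.
        raise ValueError(f"expected at least a name and a device: {line!r}")
    keyfile = fields[2] if len(fields) > 2 else "-"
    options = fields[3].split(",") if len(fields) > 3 else []
    return CrypttabEntry(fields[0], fields[1], keyfile, options)


# The example entry from the comment above ("id" is a placeholder UUID).
entry = parse_crypttab_line("luks     UUID=id - tries=0,timeout=120s")
print(entry)
```

Adding discard for an SSD would simply extend the last field, for example tries=0,timeout=120s,discard.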

Then, rebuild your initcpio. Hope this helps.

dasJ avatar Jan 04 '18 17:01 dasJ

Btw, my full HOOKS are:

HOOKS="base systemd autodetect modconf block keyboard sd-vconsole sd-encrypt sd-zfs"

I don't really need base, but it helps when I have to troubleshoot stuff in the initrd.

dasJ avatar Jan 04 '18 17:01 dasJ

Sorry, I had forgotten to include my /etc/mkinitcpio.conf:

HOOKS="base systemd autodetect modconf block keyboard keymap sd-encrypt sd-zfs filesystems fsck"

The only differences I see are that sd-vconsole is missing and that keymap, filesystems, and fsck are added.

wallzero avatar Jan 04 '18 19:01 wallzero

Do you have a proper crypttab.initramfs?

dasJ avatar Jan 04 '18 20:01 dasJ

I tried adding your crypttab.initramfs example with my UUID but after rebuilding the initcpio and updating grub it still appears to not wait for the password prompt. Same issue as above.

Also, why is it /etc/crypttab.initramfs and not /etc/crypttab?

wallzero avatar Jan 04 '18 20:01 wallzero

Because the crypttab.initramfs is put into your initramfs, while crypttab isn't. In your fstab, can you use /dev/mapper/whatever instead of the UUIDs? systemd is probably unable to generate the proper dependencies.

dasJ avatar Jan 04 '18 23:01 dasJ

I'm sorry, do you mean for the ZFS partition? I do not have the ZFS partition UUID under a / entry in /etc/fstab. I am using the zpool bootfs option. I didn't mention above that I also set the mountpoint on the root partition:

zfs get mountpoint rpool/root
NAME        PROPERTY    VALUE  SOURCE
rpool/root  mountpoint  /      local

I could try zfs set mountpoint=legacy rpool/root?

/dev/mapper/ only contains control and luks links. I could also try /dev/mapper/luks in /etc/crypttab.initramfs?

wallzero avatar Jan 05 '18 01:01 wallzero

I had the same problem. grub-mkconfig throws in additional root directives; see /etc/grub.d/10_linux, around line 66. My /etc/default/grub was set with root per the documentation, but the generated /boot/grub/grub.cfg had two root= entries on the kernel line: one with root: as required, and one with root=ZFS=, which is what systemd picked up and tried to run with. Editing the kernel line at boot and removing the first entry let me boot without issues.
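A quick way to check for this is to scan the generated grub.cfg for kernel lines that carry more than one root= parameter. The following Python sketch does that. The function names and the sample text are hypothetical and only illustrate the idea; a real grub.cfg will look different, and the sketch assumes kernel lines start with `linux` or `linuxefi`.

```python
from typing import List, Tuple


def root_params(params: List[str]) -> List[str]:
    """Return every root=... token from a list of kernel parameters."""
    return [tok for tok in params if tok.startswith("root=")]


def duplicate_root_lines(grub_cfg_text: str) -> List[Tuple[str, List[str]]]:
    """Find kernel lines with more than one root= parameter.

    Returns (kernel image, list of root= tokens) pairs.
    """
    hits = []
    for line in grub_cfg_text.splitlines():
        words = line.split()
        # Assumption: kernel lines begin with "linux" or "linuxefi".
        if len(words) >= 2 and words[0] in ("linux", "linuxefi"):
            roots = root_params(words[2:])
            if len(roots) > 1:
                hits.append((words[1], roots))
    return hits


# Hypothetical excerpt, NOT copied from a real grub.cfg.
sample = """
menuentry 'Arch Linux' {
    linux /vmlinuz-linux root=ZFS=rpool/root rw root=zfs:rpool/root quiet
    initrd /initramfs-linux.img
}
"""

for kernel, roots in duplicate_root_lines(sample):
    print(kernel, roots)
```

To run it against a real system, read /boot/grub/grub.cfg and pass its contents to `duplicate_root_lines`.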

For the record, my setup is ZFS on LUKS full-disk encryption. /boot is encrypted on a separate drive, and there are no keyfiles, just manual entry of passwords until I get things debugged correctly.

dschaper avatar Jan 09 '18 06:01 dschaper

@Schlesiger Sorry, I was mistaken. Have you tried @dschaper's hint?

dasJ avatar Jan 09 '18 09:01 dasJ

@dschaper Thank you for your input! @dasJ I will give @dschaper's solution a try! I already see two root= definitions in my /boot/grub/grub.cfg.

wallzero avatar Jan 24 '18 13:01 wallzero

I have the same issue with systemd-boot. I checked my kernel parameters in the entries screen. I don't have any duplicate root=/: values.

IMG_20210623_201847

That's the error during bootstrap. I also can't get an emergency shell. And journalctl is empty when I chroot in.

IMG_20210623_201507

maksim-pinguin avatar Jun 23 '21 18:06 maksim-pinguin

I'm seeing the same behaviour — sd-zfs tries to import the pool before sd-encrypt has decrypted it — with this hook order:

HOOKS=(base systemd autodetect keyboard sd-vconsole modconf block sd-encrypt lvm2 sd-zfs filesystems fsck)
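For anyone comparing their own configuration, here is a small Python sketch that reads the HOOKS line from mkinitcpio.conf text and checks whether sd-encrypt is listed before sd-zfs. `parse_hooks` and `runs_before` are hypothetical helpers, not mkinitcpio tooling. The check only rules out a mis-ordered list: the line above already has sd-encrypt ahead of sd-zfs and still shows the problem, so a passing check does not mean the units are ordered correctly at boot.

```python
import re
from typing import List

# Matches both HOOKS=(a b c) and the older HOOKS="a b c" form.
HOOKS_RE = re.compile(r'^\s*HOOKS=(?:\((.*)\)|"(.*)")\s*$')


def parse_hooks(conf_text: str) -> List[str]:
    """Return the hook names from the last uncommented HOOKS= assignment."""
    hooks: List[str] = []
    for line in conf_text.splitlines():
        match = HOOKS_RE.match(line)
        if match:
            body = match.group(1) if match.group(1) is not None else match.group(2)
            hooks = body.split()
    return hooks


def runs_before(hooks: List[str], first: str, second: str) -> bool:
    """True if both hooks are present and `first` is listed before `second`."""
    return (
        first in hooks
        and second in hooks
        and hooks.index(first) < hooks.index(second)
    )


conf = (
    "HOOKS=(base systemd autodetect keyboard sd-vconsole modconf block "
    "sd-encrypt lvm2 sd-zfs filesystems fsck)"
)
hooks = parse_hooks(conf)
print(runs_before(hooks, "sd-encrypt", "sd-zfs"))
```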

I'll try different device specifications (currently PARTLABEL=…) in /etc/crypttab.initramfs when I have some more time.

@maksim-pinguin For what it's worth, you can create an unlocked root account in your initramfs (which is created separately from your regular root account) to at least get an emergency shell: https://bbs.archlinux.org/viewtopic.php?pid=1927757#p1927757 From there, you can probably just zpool import -R /sysroot rpool; exit to continue booting normally.

n-st avatar Sep 26 '21 22:09 n-st

I have the same error as @maksim-pinguin on my setup.

misaka18931 avatar Apr 27 '22 10:04 misaka18931

I managed to get it running with the old syntax for the kernel parameter regarding the zfs partition. Check this thread: https://bbs.archlinux.org/viewtopic.php?pid=1979863#p1979863

maksim-pinguin avatar Apr 28 '22 12:04 maksim-pinguin