Boot device detection is not deterministic

Open nestire opened this issue 9 months ago • 5 comments

Please identify some basic details to help process the report

A. Provide Hardware Details

  1. What board are you using? (Choose from the list of boards here) novacustom-v560tu

B. Identify how the board was flashed

  1. How was Heads initially flashed?
    • [x] External flashing
    • [ ] Internal-only / 1vyprep+1vyrain / skulls
    • [ ] Don't know

C. Identify the rom related to this bug report

  1. Did you download or build the rom at issue in this bug report?

    • [x] I downloaded it
    • [ ] I built it
  2. If you downloaded your rom, where did you get it from?

    • [x] Heads CircleCi
    • [ ] Purism
    • [ ] Nitrokey
    • [ ] Dasharo DTS (Novacustom)
    • [ ] Somewhere else (please identify)

    Please provide the release number or otherwise identify the rom downloaded

  3. If you built your rom, which repository:branch did you use?

    • [x] Heads:Master
    • [ ] Other (please identify)

Please describe the problem

Describe the bug: If two NVMe drives are installed in the v56 laptop (same size and vendor) and both are valid boot devices, Heads switches between these boot devices at random from one reboot to the next. This causes many different faulty behaviours which are hard to diagnose.

Expected behavior: Always choose the same boot device. At OEM Factory Reset, warn that there are two valid boot devices and ask which one is the correct one.

Additional context: My guess is that this fdisk call is to blame: https://github.com/linuxboot/heads/blob/d4c4e5699b89365a88d9d49748dbcc11b6394907/initrd/etc/functions#L1142 I am not sure whether this is also a problem for non-NVMe setups.

nestire avatar Mar 03 '25 11:03 nestire

@nestire I guess the fix would be to mount /boot by partition UUID rather than by the /dev/ convenient naming scheme, since the UUID is the identifier that is deterministic?

tlaurion avatar Mar 03 '25 16:03 tlaurion

@tlaurion the naming of the devices is not changing; the boot device is /dev/nvme1 on one boot and /dev/nvme0 on the next. The problem seems to be that the order in which fdisk lists the devices at that moment is not fixed, so whichever device is at the top will be chosen, since the function stops at the first boot device. A simple "sort" in the pipe should probably fix this.
But using the UUID is good practice in general, I guess.
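
A minimal sketch of the "sort in the pipe" idea, assuming the candidate device names arrive one per line. `pick_first_sorted` is a hypothetical helper for illustration and is not part of the Heads codebase; it only shows that sorting before taking the first entry makes the choice independent of listing order.

```shell
# Hypothetical helper, not taken from Heads.
# Reads candidate device names on stdin (one per line) and prints the
# lexically smallest one, so the result does not depend on input order.
pick_first_sorted() {
    LC_ALL=C sort | head -n 1
}

# Example use (device names are illustrative):
#   printf '%s\n' /dev/nvme1n1 /dev/nvme0n1 | pick_first_sorted
```

This only helps if the listing order is the unstable part. If the kernel assigns the names themselves differently across boots, sorting names does not pin the choice to a physical drive.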

nestire avatar Mar 05 '25 11:03 nestire

@nestire

@tlaurion the naming of the devices is not changing; the boot device is /dev/nvme1 on one boot and /dev/nvme0 on the next.

This doesn't make sense to me. fdisk -l just lists disks; the /dev/* convenient names are assigned by the kernel in the order the devices are first seen.

The problem seems to be that the order in which fdisk lists the devices at that moment is not fixed, so whichever device is at the top will be chosen, since the function stops at the first boot device. A simple "sort" in the pipe should probably fix this.

fdisk -l already sorts convenient names alphanumerically, so the first disk the kernel discovers will be /dev/nvme0 and the second will be /dev/nvme1. The problem here is that those convenient names change across reboots: the kernel assigns nvme0 and nvme1 to the drives in whatever order it finds them.

But using the UUID is good practice in general, I guess.

Grub works like that nowadays: the device passed is identified by UUID, and the same goes for fstab. blkid does the mapping between convenient device name and UUID. I'm not sure how to properly refactor the codebase to use UUIDs instead of convenient device names, though.
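
As a rough illustration of the blkid mapping mentioned above, the sketch below resolves a filesystem UUID to a device name from blkid-style lines such as `/dev/nvme0n1p1: UUID="..." TYPE="ext4"`. `device_for_uuid` is a hypothetical helper, not an existing Heads function, and whether the blkid available in the Heads initrd prints this exact format is an assumption.

```shell
# Hypothetical helper, not taken from Heads.
# $1: filesystem UUID to look up.
# stdin: blkid-style lines, e.g. '/dev/nvme0n1p1: UUID="1111-AAAA" TYPE="ext4"'.
# Prints the device name of the first matching line, or nothing if none match.
device_for_uuid() {
    # The leading space keeps PARTUUID="..." from matching as UUID="...".
    grep -F " UUID=\"$1\"" | head -n 1 | cut -d: -f1
}

# Example use on a running system (assumes blkid is available):
#   blkid | device_for_uuid "1111-AAAA"
```

With a lookup like this, a stored UUID could be translated to whatever convenient name the kernel assigned on the current boot, instead of storing the name itself.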

https://github.com/linuxboot/heads/blob/d4c4e5699b89365a88d9d49748dbcc11b6394907/initrd/etc/functions#L1129-L1170

Somewhat, CONFIG_BOOT_DEV is not enough, since it refers to something non-deterministic if multiple NVMe drives are present, each containing its own boot device. It seems we would need to change the content of CONFIG_BOOT_DEV to a UUID and make sure this doesn't cause regressions in all the places it is used.

https://github.com/linuxboot/heads/pull/903 proposed something similar, using labels instead of UUIDs, while the associated commit https://github.com/linuxboot/heads/pull/903/commits/afba8f79ab815e1abfff26b258a37bd0126b9e8c suggests UUIDs.

Thoughts @nestire ?

tlaurion avatar Mar 05 '25 17:03 tlaurion

Hi,

I think there is a misunderstanding. What happened was that the boot device changed from /dev/nvme0 to /dev/nvme1 between boots. The content of /dev/nvme0 and of /dev/nvme1 was always the same, so the kernel always assigned the same hardware to these names. That is why my guess was this 'fdisk -l' call. My understanding is also that its output should be sorted, but it somehow did not always produce the expected result "/dev/nvme0"; 50% of the time it produced "/dev/nvme1". In the recovery shell, both devices were always present. Another explanation could be that /dev/nvme0 was not mountable/readable during the boot of Heads due to a race condition, so the mount of /dev/nvme1 succeeded and it was chosen instead of /dev/nvme0.

Unfortunately the device where we saw this is gone now, but I will try to reproduce this on another device to be 100% sure.
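
One way to probe the race-condition explanation would be to wait briefly for the expected device node before falling through to the next candidate. The sketch below is a hypothetical helper, not Heads code; it polls with `[ -e ]` for simplicity, where a real check on a block device would more likely use `[ -b ]`.

```shell
# Hypothetical helper, not taken from Heads.
# $1: path to wait for; $2: maximum number of checks (default 5).
# Returns 0 as soon as the path exists, 1 if it never appears.
wait_for_path() {
    path=$1
    tries=${2:-5}
    while [ "$tries" -gt 0 ]; do
        [ -e "$path" ] && return 0
        tries=$((tries - 1))
        # Sleep only if another check will follow.
        [ "$tries" -gt 0 ] && sleep 1
    done
    return 1
}

# Example use (device name is illustrative):
#   wait_for_path /dev/nvme0n1 5 || echo "nvme0n1 did not appear in time"
```

If the flapping stops once the first candidate is given time to appear, that would point to a timing issue rather than to fdisk ordering.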

nestire avatar Mar 10 '25 10:03 nestire

@nestire this will need replication. busybox fdisk will report /dev/nvme0 before /dev/nvme1; its output is ordered alphanumerically.

If the order is unstable between reboots, we would need to move away from friendly device names in the codebase and replace them with UUIDs. This minimally needs to be replicated and properly diagnosed first, before I try to replicate it under QEMU and start working on a fix. If two NVMe drives are provisioned by the OEM, or the end user installs another OS on a second NVMe drive, then following your description there is no ordering guarantee, as opposed to /dev/sd*, which would be a new problem requiring a fix.
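
For the replication step, it may help to record which physical drive sits behind each controller name on every boot. The sketch below assumes the kernel exposes a `serial` attribute under `/sys/class/nvme/nvmeX/` (as Linux NVMe drivers generally do; whether the Heads kernel configuration does is an assumption). `list_nvme_serials` is a hypothetical helper, not part of Heads.

```shell
# Hypothetical helper, not taken from Heads.
# $1: sysfs directory holding NVMe controllers (default /sys/class/nvme).
# Prints one "name serial" line per controller that has a readable serial file.
list_nvme_serials() {
    root=${1:-/sys/class/nvme}
    for ctrl in "$root"/nvme*; do
        [ -r "$ctrl/serial" ] || continue
        printf '%s %s\n' "${ctrl##*/}" "$(cat "$ctrl/serial")"
    done
}

# Example use from the recovery shell:
#   list_nvme_serials
```

Comparing this output across several reboots would show whether the name-to-drive assignment actually changes, or whether the names are stable and only the selection logic flips.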

tlaurion avatar Mar 10 '25 18:03 tlaurion