[PI5][6.6.9]Boot timing issue M.2 USB

Open klslz opened this issue 1 year ago • 20 comments

Describe the bug

I just installed a Pineberry M.2 NVMe HAT. A fully up-to-date RPiOS is preinstalled on the SSD, and it boots fine from the M.2 SSD with the known, documented configuration.

However, I stumbled over an issue related to the boot order as configured in rpi-eeprom.

Default config: NVME first

BOOT_ORDER=0xf146 Test OK!

I then tried to boot from USB to access the M.2 SSD from the USB-OS.

BOOT_ORDER=0xf164 Test FAILED!

It still booted the M.2!!! first. Probably a timing issue, because

BOOT_ORDER=0xf614 TEST OK!

The USB SSD was booted. Having the SD card entry in between probably delayed reaching the M.2.

This could be a (timing) flaw. Is there a parameter that can change the timing?
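
For reference, the bootloader reads BOOT_ORDER one hex digit at a time, starting from the rightmost digit. Below is a minimal Python sketch of that decoding. The function name `decode_boot_order` is made up for illustration; the digit meanings are the ones listed in the Raspberry Pi bootloader documentation.

```python
# Boot-mode digits as listed in the Raspberry Pi bootloader documentation.
BOOT_MODES = {
    0x1: "SD card",
    0x2: "network",
    0x3: "RPIBOOT",
    0x4: "USB mass storage",
    0x5: "BCM-USB-MSD",
    0x6: "NVMe",
    0x7: "HTTP",
    0xE: "stop",
    0xF: "restart",
}


def decode_boot_order(value: int) -> list[str]:
    """Return the boot modes in the order the bootloader tries them.

    Hypothetical helper: the lowest hex digit is tried first, then the
    next one up, and so on.
    """
    modes = []
    while value:
        digit = value & 0xF  # lowest hex digit is tried first
        modes.append(BOOT_MODES.get(digit, f"unknown (0x{digit:x})"))
        value >>= 4
    return modes


if __name__ == "__main__":
    # The three values tested in this report.
    for order in (0xF146, 0xF164, 0xF614):
        print(f"0x{order:04x}: {' -> '.join(decode_boot_order(order))}")
```

Read this way, 0xf146 tries NVMe, then USB, then SD; 0xf164 tries USB, then NVMe, then SD; and 0xf614 tries USB, then SD, then NVMe.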

Steps to reproduce the behaviour

As described above.

Device (s)

Other

System

Kernel: 6.6.9 Firmware: Thu 14 Dec 16:43:25 UTC 2023 (1702572205)

Logs

On request

Additional context

No response

klslz avatar Jan 08 '24 10:01 klslz

Additional info:

After a bit more testing I figured out that the problematic boot order 0xf164 turns out to be a random "first come, first served" boot scenario. It can also happen that USB gets booted first!

klslz avatar Jan 08 '24 12:01 klslz

Additional info:

I just booted a 0xf146 scenario with USB and NVME drives attached:

# lsblk -l
NAME      MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
sda         8:0    0  1.8T  0 disk
sda1        8:1    0  512M  0 part /boot/firmware
sda2        8:2    0  9.8G  0 part
nvme0n1   259:0    0  1.8T  0 disk
nvme0n1p1 259:1    0  512M  0 part
nvme0n1p2 259:2    0  9.8G  0 part /

As you can see both drives are now being used. Something is not quite right here.

Note: If I boot from USB with 0xf614, the M.2 does not get initialised at all and the system won't see it. That might be another issue, but at least there's no weird shared mounting happening.

klslz avatar Jan 08 '24 12:01 klslz

Background: I do think the issue is quite problematic. If you run an M.2 NVMe SSD setup, I'd guess people (including me) prefer to boot from a USB SSD to back up and restore the M.2 SSD. You simply don't want to unmount that NVMe SSD from the carrier board all the time. That's why the scenario of having two boot SSDs attached should work reliably.

klslz avatar Jan 08 '24 12:01 klslz

As you can see both drives are now being used. Something is not quite right here.

It sounds like you did a straight copy of the SD to the SSD but the partition UUIDs weren't changed. Run "blkid" and check the PARTUUID values. They have to be unique or you get the issues you're seeing where the wrong device gets mounted.

trejan avatar Jan 08 '24 13:01 trejan

As a matter of fact. I always use a master image (pi-gen). I'll check the UUIDs out. ( I do not use SD cards btw)

Though I guess this won't explain the issue that the timing does not seem to be right and that I do not see the NVME SSD if I boot straight from USB-SSD.

klslz avatar Jan 08 '24 14:01 klslz

Just checked: UUID and PARTUUID are the same on both devices.

klslz avatar Jan 08 '24 14:01 klslz

As a matter of fact. I always use a master image (pi-gen). I'll check the UUIDs out. ( I do not use SD cards btw)

Sorry. Not sure why I thought SD.

Though I guess this won't explain the issue that the timing does not seem to be right

It is due to timing but it isn't a bug. The kernel will mount the first partition it finds with the specified UUID as it expects them to be unique. The enumeration order may change due to minor delays during initialisation etc...
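
To illustrate the point, here is a rough Python sketch that scans `blkid`-style output for PARTUUIDs used by more than one device. The function name `duplicate_partuuids` and the sample IDs are invented for the example; real output will carry different labels and IDs.

```python
import re
from collections import defaultdict


def duplicate_partuuids(blkid_output: str) -> dict[str, list[str]]:
    """Map each PARTUUID that appears more than once to its device nodes."""
    seen = defaultdict(list)
    for line in blkid_output.splitlines():
        # blkid prints "<device>: KEY="value" KEY="value" ..."
        device, sep, fields = line.partition(":")
        match = re.search(r'\bPARTUUID="([^"]+)"', fields)
        if sep and match:
            seen[match.group(1)].append(device.strip())
    return {uuid: devs for uuid, devs in seen.items() if len(devs) > 1}


# Hypothetical output for two drives flashed from the same master image.
SAMPLE = '''\
/dev/sda1: LABEL="bootfs" UUID="AAAA-0001" TYPE="vfat" PARTUUID="deadbeef-01"
/dev/sda2: LABEL="rootfs" UUID="11111111-2222-3333-4444-555555555555" TYPE="ext4" PARTUUID="deadbeef-02"
/dev/nvme0n1p1: LABEL="bootfs" UUID="AAAA-0001" TYPE="vfat" PARTUUID="deadbeef-01"
/dev/nvme0n1p2: LABEL="rootfs" UUID="11111111-2222-3333-4444-555555555555" TYPE="ext4" PARTUUID="deadbeef-02"
'''

if __name__ == "__main__":
    for uuid, devices in duplicate_partuuids(SAMPLE).items():
        print(f"PARTUUID {uuid} is shared by: {', '.join(devices)}")
```

Any PARTUUID reported here is ambiguous for the kernel, so whichever device enumerates first wins.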

trejan avatar Jan 08 '24 14:01 trejan

Of course. The root PARTUUID is specified in cmdline.txt, so this will cause trouble if two drives with the same IDs are connected. But it has to be both UUID and PARTUUID that differ, doesn't it? I'll do some more testing and see what happens. I'll report back.
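
As a small sketch of what "specified in cmdline.txt" means in practice, the Python snippet below pulls the `root=` argument out of a kernel command line. The helper name `root_spec` and the sample command line are assumptions for illustration, not taken from the affected system.

```python
from typing import Optional


def root_spec(cmdline: str) -> Optional[tuple[str, str]]:
    """Return (kind, identifier) for the root= argument, or None if absent."""
    for token in cmdline.split():
        if token.startswith("root="):
            value = token[len("root="):]
            kind, sep, ident = value.partition("=")
            # root=/dev/sda2 has no second '='; report it as a plain device path.
            return (kind, ident) if sep else ("device", value)
    return None


# Hypothetical cmdline.txt content.
SAMPLE_CMDLINE = "console=tty1 root=PARTUUID=deadbeef-02 rootfstype=ext4 rootwait"

if __name__ == "__main__":
    print(root_spec(SAMPLE_CMDLINE))
```

If the identifier returned here exists on both drives, the root filesystem that gets mounted depends on enumeration order.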

Any idea why the M.2 NVME is not shown - as a device - at all if connected but not booted?

klslz avatar Jan 08 '24 14:01 klslz

So, I tested it. I changed all the UUIDs and related configs on the USB drive. After that everything seems to work fine. I did some reboots and power off/on cycles. Great. Sorry for the confusion.

However, one issue remains. When cold-booting the Pi from USB (power up, 0xf164), the NVMe doesn't get detected, which is a bummer. There's no easy way to back up/restore the NVMe SSD.

It did work in one test, though. I booted from NVMe, changed the config from 0xf146 to 0xf164 and ran a warm reboot. Only in this case did I end up booted from USB with the NVMe devices also listed.

Does this make sense?

klslz avatar Jan 08 '24 17:01 klslz

However, one issue remains. When cold-booting the Pi from USB (power up, 0xf164), the NVMe doesn't get detected, which is a bummer. There's no easy way to back up/restore the NVMe SSD.

It works for me with a Pimoroni NVMe base. The NVMe drive is still visible after a cold boot when USB booting.

I think your issues are because the Pinedrive doesn't support the Pi PCIe add-on autodetection mechanism. The Pineberry documentation mentions needing to manually enable the PCIe interface with a config.txt change when not booting from an NVMe drive. The NVMe Base does support the autodetection mechanism and doesn't need the interface to be manually enabled, nor PCIE_PROBE in the bootloader EEPROM config.

trejan avatar Jan 08 '24 18:01 trejan

Yep I know. I had

dtparam=pciex1

as alias for

dtparam=nvme

as described in the RPi docs.

It works if you boot from NVMe. But if you boot from USB and try to mount the NVMe, the device (or rather the external PCIe lane) is not accessible.

I am not sure whether this dmesg output

[ 0.396168] brcm-pcie 1000110000.pcie: link down

tells us something. It's not there if the NVMe is booted.

klslz avatar Jan 09 '24 07:01 klslz

[ 0.396168] brcm-pcie 1000110000.pcie: link down

That looks like a hardware problem - without a PCIe link it's game over. A flaky link would explain the sometimes NVME, sometimes USB behaviour you were getting before you changed the UUID.
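
A quick way to check for this after each boot is to scan the kernel log for the message quoted above. The Python sketch below matches only that exact "link down" wording; the function name `pcie_links_down` is hypothetical, and the text would come from saved `dmesg` output.

```python
import re

# Matches lines such as "brcm-pcie 1000110000.pcie: link down".
LINK_DOWN = re.compile(r"brcm-pcie\s+(\S+):\s+link down")


def pcie_links_down(dmesg_text: str) -> list[str]:
    """Return the PCIe controller names that reported 'link down'."""
    return [m.group(1) for m in LINK_DOWN.finditer(dmesg_text)]


# The log line reported in this thread.
SAMPLE_DMESG = "[ 0.396168] brcm-pcie 1000110000.pcie: link down\n"

if __name__ == "__main__":
    down = pcie_links_down(SAMPLE_DMESG)
    print("link down on:", ", ".join(down) if down else "none")
```

An empty result only means the message was not logged; it does not by itself prove the link came up.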

pelwell avatar Jan 09 '24 09:01 pelwell

Yep. But. That somehow doesn't explain why the whole thing is dead stable if booted from NVME. No flakiness, no PCIe problems at all.

klslz avatar Jan 09 '24 09:01 klslz

And the USB drive is also attached in that NVME boot scenario. Just in case there are thoughts about e.g. power issues.

klslz avatar Jan 09 '24 09:01 klslz

I think I found something.

When removing

PCIE_PROBE=1

from the eeprom config it seems to work.

klslz avatar Jan 09 '24 10:01 klslz

Just to emphasize my finding:

If the eeprom config contains:

BOOT_ORDER=0xf146
PCIE_PROBE=1

I can boot from NVMe and see the attached USB drive.

If the eeprom config contains:

BOOT_ORDER=0xf164
PCIE_PROBE=1

I can boot from USB, but the PCIe link is down, so there is no access to the NVMe.

If the eeprom config contains:

BOOT_ORDER=0xf164

I can boot from USB and can also access the NVMe.

Now: do you consider this the expected behaviour?

That would mean I have to change the eeprom config any time I'd like to back up/restore/update my NVMe image.
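
To compare configs like the three above without eyeballing them, the dump can be parsed into a dictionary. This is only a sketch: `parse_bootloader_config` and `pcie_probe_forced` are made-up names, and treating keys that appear before any `[section]` header as belonging to `[all]` is my assumption.

```python
def parse_bootloader_config(text: str) -> dict[str, dict[str, str]]:
    """Parse KEY=VALUE lines grouped under [section] headers."""
    sections: dict[str, dict[str, str]] = {}
    current = "all"  # assumption: keys before any header belong to [all]
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            continue
        key, sep, value = line.partition("=")
        if sep:
            sections.setdefault(current, {})[key.strip()] = value.strip()
    return sections


def pcie_probe_forced(config: dict[str, dict[str, str]]) -> bool:
    """True if PCIE_PROBE=1 is set in the [all] section."""
    return config.get("all", {}).get("PCIE_PROBE") == "1"


# The second config from the list above.
SAMPLE_CONFIG = "[all]\nBOOT_ORDER=0xf164\nPCIE_PROBE=1\n"

if __name__ == "__main__":
    cfg = parse_bootloader_config(SAMPLE_CONFIG)
    print(cfg["all"]["BOOT_ORDER"], "PCIE_PROBE forced:", pcie_probe_forced(cfg))
```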

klslz avatar Jan 10 '24 12:01 klslz

If the eeprom config contains:

BOOT_ORDER=0xf164
PCIE_PROBE=1

I can boot from USB, but the PCIe link is down, so there is no access to the NVMe.

Works for me with an NVMe Base / a HAT Drive Top and a KIOXIA SSD.

$ vcgencmd bootloader_config
[all]
BOOT_UART=1
POWER_OFF_ON_HALT=0
BOOT_ORDER=0xf164
PCIE_PROBE=1

$ lspci
0000:00:00.0 PCI bridge: Broadcom Inc. and subsidiaries Device 2712 (rev 21)
0000:01:00.0 Non-Volatile memory controller: KIOXIA Corporation NVMe SSD (rev 01)
0001:00:00.0 PCI bridge: Broadcom Inc. and subsidiaries Device 2712 (rev 21)
0001:01:00.0 Ethernet controller: Device 1de4:0001
$ lsblk
NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
sda           8:0    0 931.5G  0 disk 
├─sda1        8:1    0   512M  0 part /boot/firmware
└─sda2        8:2    0   931G  0 part /
nvme0n1     259:0    0 465.8G  0 disk 
├─nvme0n1p1 259:1    0   512M  0 part 
└─nvme0n1p2 259:2    0 465.3G  0 part 

trejan avatar Jan 12 '24 16:01 trejan

I suspect this issue is what's causing me trouble booting from my Geekworm X1003. I'm on my third brand of NVMe trying to outsmart this.

etyrnal avatar Feb 22 '24 02:02 etyrnal

Have you got the latest bootloader on the Pi5? There has been a fix for boot timing in that.

JamesH65 avatar Feb 22 '24 08:02 JamesH65

Have you got the latest bootloader on the Pi5? There has been a fix for boot timing in that.

is that the same as the firmware? The terminology is confusing. I have the latest

# rpi-eeprom-config 
[all]
BOOT_UART=1
PCIE_PROBE=1
BOOT_ORDER=0xf146
POWER_OFF_ON_HALT=0
# rpi-eeprom-update 
BOOTLOADER: up to date
   CURRENT: Wed Feb 14 07:17:42 UTC 2024 (1707895062)
    LATEST: Wed Feb 14 07:17:42 UTC 2024 (1707895062)
   RELEASE: default (/lib/firmware/raspberrypi/bootloader-2712/default)
            Use raspi-config to change the release.

I manually installed the bootloader since I'm on Ubuntu 23 and its packaged version seemed to be behind.

etyrnal avatar Feb 22 '24 15:02 etyrnal