
Test Pineboards HatDrive! Dual NVMe Board

Open geerlingguy opened this issue 9 months ago • 18 comments

Pineboards' HatDrive! Dual NVMe Board includes an ASMedia ASM1182e PCIe Gen 2 switch chip that splits the Pi 5's single PCIe Gen 2 lane across two M.2 M-key slots for 2230 or 2242 NVMe SSDs:


Since the fix from https://github.com/raspberrypi/firmware/issues/1833 is now in the Raspberry Pi firmware (a sudo rpi-update should get the Pi 5 to the latest version), this board should also allow booting from NVMe SSDs behind the PCIe switch!
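(For reference, a sketch of the update sequence; rpi-update pulls the bleeding-edge kernel and firmware, while rpi-eeprom-update separately checks the Pi 5's bootloader EEPROM:)

sudo rpi-update          # bleeding-edge kernel + firmware
sudo reboot
sudo rpi-eeprom-update   # check whether a newer bootloader EEPROM build is available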

geerlingguy avatar May 13 '24 19:05 geerlingguy

Using the latest firmware, I was able to boot off one of the two NVMe drives, though the bootup sequence takes like 30 seconds before it starts loading the OS (it only takes about 3-4 seconds with the drive connected directly). I don't think there are any other issues, but I'll try reseating everything. Maybe a power issue with the MakerDisk NVMe SSDs I'm currently testing?

Things are rather slow, and I'm seeing:

error reading input file: Input/output error

Along with some dmesg logs that would indicate some filesystem access issues.
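(For anyone reproducing this, one way to pull just the NVMe/PCIe-related noise out of the kernel log:)

sudo dmesg --level=err,warn | grep -iE 'nvme|pcie'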

The full system is only drawing 5-8W, so it seems like the NVMe drives aren't pulling more than the ~5W or so the connector is rated for. I might swap out the drives for some Pinedrive drives.

It was slow with the boot NVMe in slot 2 (my 2nd drive was in slot 1).

geerlingguy avatar May 13 '24 19:05 geerlingguy

I first tried re-seating everything, and only running the boot NVMe in slot 2, with nothing in slot 1. It still takes a pretty long time before the kernel boot process starts up, and it's still going a bit slow. Going to try with the Pinedrive next, to see if it's the MakerDisk (which didn't have any issues connected directly), the firmware, or something else...

But first, one more boot with just one MakerDisk NVMe SSD in there...

geerlingguy avatar May 13 '24 20:05 geerlingguy

Got some timeouts in the logs during boot:

[    3.647613] could not locate request for tag 0x0
[    3.652656] nvme nvme0: invalid id 0 completed on queue 1
[   34.125313] nvme nvme0: I/O 902 (I/O Cmd) QID 1 timeout, aborting
[   34.176150] nvme nvme0: Abort status: 0x0
[   64.837298] nvme nvme0: I/O 902 QID 1 timeout, reset controller
[   64.941397] nvme0n1: I/O Cmd(0x2) @ LBA 1318912, 32 blocks, I/O Error (sct 0x3 / sc 0x71) 
[   64.950099] I/O error, dev nvme0n1, sector 1318912 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 2
[   64.987988] nvme nvme0: 4/0/0 default/read/poll queues
...
[   66.052071] could not locate request for tag 0x0
[   66.057275] nvme nvme0: invalid id 0 completed on queue 4
...
[   67.110524] could not locate request for tag 0x0
[   67.110533] nvme nvme0: invalid id 0 completed on queue 4
[   67.170655] nvme0n1: I/O Cmd(0x2) @ LBA 1087352, 144 blocks, I/O Error (sct 0x2 / sc 0x82) 
[   67.170665] protection error, dev nvme0n1, sector 1087352 op 0x0:(READ) flags 0x80700 phys_seg 18 prio class 2
[   67.170685] nvme0n1: I/O Cmd(0x2) @ LBA 1087232, 112 blocks, I/O Error (sct 0x2 / sc 0x82) 
[   67.170689] protection error, dev nvme0n1, sector 1087232 op 0x0:(READ) flags 0x80700 phys_seg 14 prio class 2
...
[   97.605320] nvme nvme0: I/O 128 (I/O Cmd) QID 4 timeout, aborting
[   97.649624] nvme nvme0: Abort status: 0x0
[  128.325309] nvme nvme0: I/O 128 QID 4 timeout, reset controller
[  128.425422] nvme0n1: I/O Cmd(0x2) @ LBA 1273320, 64 blocks, I/O Error (sct 0x3 / sc 0x71) 
[  128.425428] I/O error, dev nvme0n1, sector 1273320 op 0x0:(READ) flags 0x80700 phys_seg 2 prio class 2
[  128.425439] nvme0n1: I/O Cmd(0x2) @ LBA 2614328, 8 blocks, I/O Error (sct 0x3 / sc 0x71) 
[  128.425442] I/O error, dev nvme0n1, sector 2614328 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 2
[  128.439981] nvme nvme0: 4/0/0 default/read/poll queues

geerlingguy avatar May 13 '24 20:05 geerlingguy

When I tried running some disk benchmarks behind the switch:

[  289.638013] could not locate request for tag 0x0
[  289.638025] nvme nvme0: invalid id 0 completed on queue 3
[  289.927384] could not locate request for tag 0x0
[  289.927387] nvme nvme0: invalid id 0 completed on queue 3
[  289.927388] could not locate request for tag 0x0
[  289.927390] nvme nvme0: invalid id 0 completed on queue 3
[  289.927394] could not locate request for tag 0x0
[  289.927395] nvme nvme0: invalid id 0 completed on queue 3

Then after about 10 seconds, I start getting more errors and finally:

[  321.426665] nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0xffff
[  321.426671] nvme nvme0: Does your device have a faulty power saving mode enabled?
[  321.426674] nvme nvme0: Try "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off" and report a bug
[  335.058607] nvme0n1: I/O Cmd(0x2) @ LBA 6303040, 32 blocks, I/O Error (sct 0x3 / sc 0x71) 
[  335.058616] I/O error, dev nvme0n1, sector 6303040 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 2
[  335.667605] nvme 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible
[  337.982539] nvme nvme0: Disabling device after reset failure: -19
[  337.996995] I/O error, dev nvme0n1, sector 3221904 op 0x1:(WRITE) flags 0x9800 phys_seg 4 prio class 2
[  337.996998] EXT4-fs warning (device nvme0n1p2): ext4_end_bio:343: I/O error 10 writing to inode 131631 starting block 1150070)
[  337.997006] Buffer I/O error on device nvme0n1p2, logical block 1017974
[  337.997008] Buffer I/O error on device nvme0n1p2, logical block 1017975
[  337.997009] Buffer I/O error on device nvme0n1p2, logical block 1017976
[  337.997008] EXT4-fs warning (device nvme0n1p2): ext4_end_bio:343: I/O error 10 writing to inode 131642 starting block 665571)
[  337.997011] Buffer I/O error on device nvme0n1p2, logical block 1017977
[  337.997015] Buffer I/O error on device nvme0n1p2, logical block 533475
[  337.997017] Buffer I/O error on device nvme0n1p2, logical block 533476
[  337.997018] Buffer I/O error on device nvme0n1p2, logical block 533477
[  337.997019] EXT4-fs warning (device nvme0n1p2): ext4_end_bio:343: I/O error 10 writing to inode 131631 starting block 1150122)
[  337.997020] Buffer I/O error on device nvme0n1p2, logical block 533478
[  337.997022] Buffer I/O error on device nvme0n1p2, logical block 1018026
[  337.997024] Buffer I/O error on device nvme0n1p2, logical block 1018027
[  337.997025] EXT4-fs warning (device nvme0n1p2): ext4_end_bio:343: I/O error 10 writing to inode 131642 starting block 665599)
[  337.997027] EXT4-fs warning (device nvme0n1p2): ext4_end_bio:343: I/O error 10 writing to inode 131631 starting block 1150150)
[  337.997029] EXT4-fs warning (device nvme0n1p2): ext4_end_bio:343: I/O error 10 writing to inode 131642 starting block 665635)
[  337.997032] EXT4-fs warning (device nvme0n1p2): ext4_end_bio:343: I/O error 10 writing to inode 131631 starting block 1150262)
[  337.997034] EXT4-fs warning (device nvme0n1p2): ext4_end_bio:343: I/O error 10 writing to inode 131642 starting block 665639)
[  337.997036] EXT4-fs warning (device nvme0n1p2): ext4_end_bio:343: I/O error 10 writing to inode 131631 starting block 1150298)
[  337.997039] EXT4-fs warning (device nvme0n1p2): ext4_end_bio:343: I/O error 10 writing to inode 131642 starting block 706048)
[  337.997042] JBD2: Detected IO errors while flushing file data on nvme0n1p2-8
[  337.997047] Aborting journal on device nvme0n1p2-8.
[  337.997055] Buffer I/O error on dev nvme0n1p2, logical block 262144, lost sync page write
[  337.997059] JBD2: I/O error when updating journal superblock for nvme0n1p2-8.
[  337.997205] EXT4-fs error (device nvme0n1p2): ext4_journal_check_start:84: comm iozone: Detected aborted journal
[  337.998508] EXT4-fs error (device nvme0n1p2): ext4_journal_check_start:84: comm systemd-journal: Detected aborted journal
[  337.998588] EXT4-fs error (device nvme0n1p2): ext4_journal_check_start:84: comm journal-offline: Detected aborted journal
[  337.998597] Buffer I/O error on dev nvme0n1p2, logical block 0, lost sync page write
[  337.998600] EXT4-fs (nvme0n1p2): I/O error while writing superblock
[  337.998602] EXT4-fs (nvme0n1p2): Remounting filesystem read-only
[  337.998603] EXT4-fs (nvme0n1p2): ext4_do_writepages: jbd2_start: 9223372036854775741 pages, ino 131642; err -30
[  338.008003] Buffer I/O error on dev nvme0n1p2, logical block 0, lost sync page write
[  338.047783] EXT4-fs (nvme0n1p2): I/O error while writing superblock
[  338.047788] Buffer I/O error on dev nvme0n1p2, logical block 0, lost sync page write
[  338.047792] EXT4-fs (nvme0n1p2): I/O error while writing superblock

geerlingguy avatar May 13 '24 20:05 geerlingguy

Swapped in the Pinedrive (the Pineboards NVMe). First boot on a freshly-flashed Pi OS Lite went through the normal first-boot cycle, but the second boot was being weird.

Spying on things with my Debug Probe, I see:

Boot mode: NVME (06) order f14
PCI1 init
PCI1 reset
PCIe scan 00001b21:00001182
PCIe scan 00001b21:00001182
PCIe scan 00001b21:00001182
PCIe scan 00001b21:00001182
PCIe scan 00001b21:00001182
PCIe scan 00001987:00005013
Found device on bridge 4 with vid 1987 enable 1
PCIe scan 00001b21:00001182
PCIe scan 00001987:00005013
VID 0x1987 MN SSD94-2563CG-PB                        
NVME on 0
MBR: 0x00002000, 1048576 type: 0x0c
MBR: 0x00102000,499061424 type: 0x83
MBR: 0x00000000,       0 type: 0x00
MBR: 0x00000000,       0 type: 0x00
Trying partition: 0
type: 32 lba: 8192 'mkfs.fat' ' bootfs     ' clusters 261116 (4)
rsc 32 fat-sectors 2040 root dir cluster 2 sectors 0 entries 0
FAT32 clusters 261116
[nvme] autoboot.txt not found
Select partition rsts 0 C(boot_partition) 0 EEPROM config 0 result 0
Trying partition: 0
type: 32 lba: 8192 'mkfs.fat' ' bootfs     ' clusters 261115 (4)
rsc 32 fat-sectors 2040 root dir cluster 2 sectors 0 entries 0
FAT32 clusters 261115
Read config.txt bytes     1213 hnd 0x199
[nvme] pieeprom.upd not found
usb_max_current_enable default 0 max-current 5000
Read bcm2712-rpi-5-b.dtb bytes    77755 hnd 0xe6
dt-match: compatible: raspberrypi,5-model-b match: brcm,bcm2712
dt-match: compatible: brcm,bcm2712 match: brcm,bcm2712

NOTICE:  BL31: v2.6(release):v2.6-239-g2a9ede0bd
NOTICE:  BL31: Built : 14:26:57, Jun 22 2023
[    0.000000] Booting Linux on physical CPU 0x0000000000 [0x414fd0b1]

Then after a few seconds, it starts stalling out with nvme errors:

[    4.513325] systemd[1]: Started systemd-journald.service - Journal Service.
[    4.612647] could not locate request for tag 0x0
[    4.617854] nvme nvme0: invalid id 0 completed on queue 1
[   36.759208] nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0xffff
[   36.768075] nvme nvme0: Does your device have a faulty power saving mode enabled?
[   36.776147] nvme nvme0: Try "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off" and report a bug
[   50.418384] nvme0n1: I/O Cmd(0x2) @ LBA 1273448, 136 blocks, I/O Error (sct 0x3 / sc 0x71) 
[   50.427333] I/O error, dev nvme0n1, sector 1273448 op 0x0:(READ) flags 0x80700 phys_seg 5 prio class 2
[   50.427345] nvme0n1: I/O Cmd(0x2) @ LBA 2668112, 64 blocks, I/O Error (sct 0x3 / sc 0x71) 
[   50.446109] I/O error, dev nvme0n1, sector 2668112 op 0x0:(READ) flags 0x80700 phys_seg 2 prio class 2
[   50.446145] nvme0n1: I/O Cmd(0x2) @ LBA 2667024, 64 blocks, I/O Error (sct 0x3 / sc 0x71) 
[   50.446147] nvme0n1: I/O Cmd(0x2) @ LBA 1057840, 8 blocks, I/O Error (sct 0x3 / sc 0x71) 
[   50.446153] I/O error, dev nvme0n1, sector 1057840 op 0x0:(READ) flags 0x83700 phys_seg 1 prio class 2
[   50.446157] I/O error, dev nvme0n1, sector 2667024 op 0x0:(READ) flags 0x80700 phys_seg 2 prio class 2
[   50.492912] nvme0n1: I/O Cmd(0x2) @ LBA 1057888, 8 blocks, I/O Error (sct 0x3 / sc 0x71) 

Maybe I just have a bad board? I'll try again with a HatDrive Commander or whatever the PCIe switch board is, to see if that's the issue.

geerlingguy avatar May 13 '24 20:05 geerlingguy

On a separate HatDrive! Commander board (which I've tested with a SATA controller + 2.5G Ethernet in my first SATA Pi NAS build—before I went for a 2.5G USB Ethernet adapter), I am getting the same problem...

[    3.342066] Run /init as init process
[    3.370367] mmc1: new ultra high speed DDR50 SDIO card at address 0001
[    3.605276] could not locate request for tag 0x0
[    3.610316] nvme nvme0: invalid id 0 completed on queue 1
[   34.117399] nvme nvme0: I/O 64 (I/O Cmd) QID 1 timeout, aborting
[   34.168148] nvme nvme0: Abort status: 0x0

It seems to have issues a few seconds into boot, and then it stalls out. If I leave it for a really long time it eventually boots. But it's highly unstable, and running a disk benchmark eventually crashes the thing with tons of NVMe errors.

Going to try digging up another PCIe switch to see if that fixes anything.

geerlingguy avatar May 13 '24 20:05 geerlingguy

Same thing with a 3rd PCIe switch. And the 2nd and 3rd are both switches I have never had an issue with in the past :/

[    3.373708] mmc1: new ultra high speed DDR50 SDIO card at address 0001
[    3.605280] could not locate request for tag 0x0
[    3.610319] nvme nvme0: invalid id 0 completed on queue 3
[   34.117406] nvme nvme0: I/O 128 (I/O Cmd) QID 3 timeout, aborting
[   34.168243] nvme nvme0: Abort status: 0x0

geerlingguy avatar May 13 '24 20:05 geerlingguy

Just to reset my sanity, I plugged everything back in directly (dropping the switch out of the equation).

It boots up with no problem whatsoever, and the filesystem mounts right away, nicely:

[    3.205616] mmc1: new ultra high speed DDR50 SDIO card at address 0001
[    3.582639] EXT4-fs (nvme0n1p2): mounted filesystem 93c89e92-8f2e-4522-ad32-68faed883d2f ro with ordered data mode. Quota mode: none.
[    3.704346] NET: Registered PF_INET6 protocol family

Disk benchmarks run as normal, so either there's a firmware issue with NVMe boot behind the switch... or... something. Maybe I should try some other NVMe SSDs! (I've tried two: Cytron MakerDisk 2242 and Pineboards NVMe 2242).

geerlingguy avatar May 13 '24 20:05 geerlingguy

I don't think it's a power issue, but never say never.


geerlingguy avatar May 13 '24 20:05 geerlingguy

Did you disable ASPM? I've had someone with two HatDrive bottoms and the Commander, and disabling it solved their PCIe errors.

mikegapinski avatar May 13 '24 20:05 mikegapinski

I'm benchmarking the drive with fio and it seems to be working okay when I just have one drive, unformatted, running in the carrier. Going to reboot a couple more times and see if it's stable.

geerlingguy avatar May 13 '24 20:05 geerlingguy

If you want to rule out power, try a bottom with a Commander. You can connect the power header to GPIO and it'll offload the ribbon cable.

I doubt that's it, the ribbon can handle 4 drives without issues 😅

mikegapinski avatar May 13 '24 21:05 mikegapinski

Yeah I know lol. I will try that at some point, but I'm going to try pcie_aspm=off in /boot/firmware/cmdline.txt next. I just need to get it to boot off the NVMe so I can edit it. Or maybe pull the drive, connect it to my Mac, edit the file, pop it back on, lol.

Someone needs to make a nice pop-in-pop-out M.2 slot for quicker swaps :)
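(For reference, if the Pi will boot at all, the edit is a one-liner; cmdline.txt is a single line, so the parameter just gets appended to the end of it. A sketch, with a backup first:)

sudo cp /boot/firmware/cmdline.txt /boot/firmware/cmdline.txt.bak
sudo sed -i 's/$/ pcie_aspm=off/' /boot/firmware/cmdline.txt
sudo reboot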

geerlingguy avatar May 13 '24 21:05 geerlingguy

I pulled the drive, added pcie_aspm=off to cmdline.txt over on my Mac, then rebooted...

And now it's booting up perfectly, every time. Wouldn't you know!?

On the downside, fio is only giving me 215 MiB/sec now; it was giving me 450 MiB/sec with just one drive on the bus before. lspci shows:

		LnkSta:	Speed 2.5GT/s (downgraded), Width x1 (downgraded)
			TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-

And in dmesg:

[    1.933020] pci 0000:02:03.0: [1b21:1182] type 01 class 0x060400
[    1.939519] pci 0000:02:03.0: enabling Extended Tags
[    1.944974] pci 0000:02:03.0: PME# supported from D0 D3hot D3cold
[    1.951687] pci 0000:02:07.0: [1b21:1182] type 01 class 0x060400
[    1.958175] pci 0000:02:07.0: enabling Extended Tags
[    1.963623] pci 0000:02:07.0: PME# supported from D0 D3hot D3cold
[    1.970859] pci 0000:02:03.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[    1.979318] pci 0000:02:07.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[    1.987840] pci 0000:03:00.0: [1987:5013] type 00 class 0x010802
[    1.994304] pci 0000:03:00.0: reg 0x10: [mem 0x1b00000000-0x1b00003fff 64bit]
[    2.002122] pci 0000:03:00.0: 2.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s PCIe x1 link at 0000:02:03.0 (capable of 31.504 Gb/s with 8.0 GT/s PCIe x4 link)
[    2.017666] pci_bus 0000:03: busn_res: [bus 03-04] end is updated to 03
[    2.024811] pci 0000:04:00.0: [1987:5013] type 00 class 0x010802
[    2.031287] pci 0000:04:00.0: reg 0x10: [mem 0x1b00000000-0x1b00003fff 64bit]
[    2.039118] pci 0000:04:00.0: 2.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s PCIe x1 link at 0000:02:07.0 (capable of 31.504 Gb/s with 8.0 GT/s PCIe x4 link)

geerlingguy avatar May 13 '24 21:05 geerlingguy

ASPM could also be disabled with a dtoverlay; I'll see if I can figure it out tomorrow.

I don't know what's wrong with the ASPM implementation, though; I've also seen people disable it for the Coral.

In 95% of the issues I've dealt with supporting our users over the past few months, disabling it makes all the PCIe errors go away.

mikegapinski avatar May 13 '24 21:05 mikegapinski

It's now on the site, btw: https://pipci.jeffgeerling.com/hats/pineboards-hatdrive-dual-nvme-m2.html

geerlingguy avatar May 13 '24 21:05 geerlingguy

@mikegapinski mentioned over in the Pi firmware issue that the Phison controller doesn't play nice with power states behind the bridge, and recommended the following (a scripted version is sketched below):

  1. Edit /boot/firmware/config.txt
  2. Add the line dtparam=pciex1_no_l0s=on
  3. Save and reboot
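(The same thing as a one-liner; adjust the path if your boot partition is mounted somewhere else:)

echo 'dtparam=pciex1_no_l0s=on' | sudo tee -a /boot/firmware/config.txt
sudo reboot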

Now I'm getting PCIe Gen 2 speeds, and no crashing or hanging issues:

pi@pi5test:~ $ sudo fio fio.fio
nvme0: (g=0): rw=read, bs=(R) 256KiB-256KiB, (W) 256KiB-256KiB, (T) 256KiB-256KiB, ioengine=libaio, iodepth=32
fio-3.33
Starting 1 process
Jobs: 1 (f=1): [R(1)][100.0%][r=435MiB/s][r=1741 IOPS][eta 00m:00s]
nvme0: (groupid=0, jobs=1): err= 0: pid=920: Tue May 14 17:13:19 2024
  read: IOPS=1739, BW=435MiB/s (456MB/s)(12.8GiB/30019msec)
    slat (nsec): min=5036, max=67483, avg=5780.38, stdev=1710.71
    clat (usec): min=18114, max=35613, avg=18385.56, stdev=244.30
     lat (usec): min=18135, max=35618, avg=18391.34, stdev=244.29
    clat percentiles (usec):
     |  1.00th=[18482],  5.00th=[18482], 10.00th=[18482], 20.00th=[18482],
     | 30.00th=[18482], 40.00th=[18482], 50.00th=[18482], 60.00th=[18482],
     | 70.00th=[18482], 80.00th=[18482], 90.00th=[18482], 95.00th=[18482],
     | 99.00th=[18482], 99.50th=[18482], 99.90th=[18482], 99.95th=[20579],
     | 99.99th=[32637]
   bw (  KiB/s): min=436736, max=446464, per=100.00%, avg=445801.33, stdev=1228.82, samples=60
   iops        : min= 1706, max= 1744, avg=1741.27, stdev= 4.77, samples=60
  lat (msec)   : 20=100.01%, 50=0.05%
  cpu          : usr=0.45%, sys=1.54%, ctx=53738, majf=0, minf=9
  IO depths    : 1=0.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=100.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued rwts: total=52211,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
   READ: bw=435MiB/s (456MB/s), 435MiB/s-435MiB/s (456MB/s-456MB/s), io=12.8GiB (13.7GB), run=30019-30019msec

Disk stats (read/write):
  nvme1n1: ios=60595/0, merge=0/0, ticks=1127880/0, in_queue=1127880, util=100.00%
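(The fio.fio job file itself isn't shown above; based on the output (256 KiB sequential reads, libaio, iodepth 32, roughly a 30 second run), a job file along these lines would reproduce it. The filename is an assumption; the disk stats above point at nvme1n1:)

[nvme0]
filename=/dev/nvme1n1
rw=read
bs=256k
ioengine=libaio
iodepth=32
direct=1
runtime=30
time_based=1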

geerlingguy avatar May 14 '24 22:05 geerlingguy

Just because people have been asking a bit... power consumption stats with two Pinedrive 2242 NVMe SSDs installed:

  • Idle: 4.4W
  • fio against one drive: 5.7W (brief peak at 6.7W)
  • fio against two drives: 7W (brief peak at 8.3W)

During the two-drive testing, each drive individually pegged at 216 MB/sec, which shows a tiny bit of slowdown when hitting both drives at once, as you would in a RAID setup.
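(A sketch of hitting both drives at once straight from the fio command line; the device names are assumptions and may differ depending on enumeration order:)

sudo fio --rw=read --bs=256k --ioengine=libaio --iodepth=32 --direct=1 --runtime=30 --time_based=1 \
    --name=drive0 --filename=/dev/nvme0n1 \
    --name=drive1 --filename=/dev/nvme1n1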

geerlingguy avatar May 14 '24 22:05 geerlingguy

@geerlingguy

I added a kernel patch that disables ASPM without having to use any overlays: https://github.com/raspberrypi/linux/pull/6229

If you ever stumble upon a drive that is throwing errors like this at Gen 3 or behind a switch, write down the ID from lspci -nn. Adding this quirk to more drives takes 5 seconds :)
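(For reference, the ID is the [vendor:device] pair in the lspci -nn output; for the drives in this thread it's [1987:5013]:)

lspci -nn | grep -i 'non-volatile'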

mikegapinski avatar Jun 18 '24 11:06 mikegapinski