xHCI reset timeout during s2idle resume of Raspberry Pi 4 B
Is this the right place for my bug report? It seems related to the VL805 firmware, so I'm not sure.
Describe the bug
I'm doing some s2idle tests with the Raspberry Pi 4B. Everything seems to work except xHCI (VIA VL805), which times out after the xHCI reset command during the resume phase. Here is the kernel log with some additional log messages:
[47893.190601] PM: Triggering wakeup from IRQ 25
[47893.190622] PM: resume from suspend-to-idle
[47893.190761] brcm-pcie fd500000.pcie: brcm_pcie_resume_noirq
[47893.190767] brcm-pcie fd500000.pcie: brcm_pcie_bridge_sw_init_set_generic
[47893.190871] brcm-pcie fd500000.pcie: brcm_pcie_bridge_sw_init_set_generic
[47893.190876] brcm-pcie fd500000.pcie: brcm_pcie_perst_set_generic
[47893.191086] brcm-pcie fd500000.pcie: brcm_pcie_bridge_sw_init_set_generic
[47893.191311] brcm-pcie fd500000.pcie: brcm_pcie_perst_set_generic
[47893.319207] brcm-pcie fd500000.pcie: clkreq-mode set to default
[47893.321263] brcm-pcie fd500000.pcie: link up, 5.0 GT/s PCIe x1 (SSC)
[47893.346468] PM: noirq resume of devices complete after 155.839 msecs
[47893.346795] PM: early resume of devices complete after 0.290 msecs
[47893.467752] bcmgenet fd580000.ethernet eth0: Link is Down
[47893.494488] raspberrypi-reset soc:firmware:reset: Notify xHCI reset
[47893.642237] brcmfmac: brcmf_fw_alloc_request: using brcm/brcmfmac43455-sdio for chip BCM4345/6
[47893.785051] brcmfmac: brcmf_c_process_txcap_blob: no txcap_blob available (err=-2)
[47893.785379] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM4345/6 wl0: Nov 1 2021 00:37:25 version 7.45.241 (1a2f2fa CY) FWID 01-703fd60
[47894.515224] usb usb1: root hub lost power or was reset
[47894.515235] usb usb2: root hub lost power or was reset
[47894.515239] xhci_hcd 0000:01:00.0: Stop HCD
[47894.515245] xhci_hcd 0000:01:00.0: HCD stopped
[47894.515252] xhci_hcd 0000:01:00.0: Reset the HC, CMD: 00000002
[47921.929950] xhci_hcd 0000:01:00.0: xhci_handshake_check_state failed: -110
[47921.930007] xhci_hcd 0000:01:00.0: Failed to reset: -110
[47921.930014] xhci_hcd 0000:01:00.0: PCI post-resume error -110!
[47921.930020] xhci_hcd 0000:01:00.0: HC died; cleaning up
[47921.930034] xhci_hcd 0000:01:00.0: PM: dpm_run_callback(): pci_pm_resume returns -110
[47921.930054] xhci_hcd 0000:01:00.0: PM: failed to resume async: error -110
[47921.930128] PM: resume of devices complete after 28583.092 msecs
[47921.930540] OOM killer enabled.
[47921.930544] Restarting tasks ...
[47921.930586] usb 1-1: USB disconnect, device number 2
[47921.934894] done.
[47921.934924] random: crng reseeded on system resumption
[47921.941215] PM: suspend exit
How can I verify that the VL805 firmware is really functional after "raspberrypi-reset soc:firmware:reset: Notify xHCI reset"?
Is it possible that stopping the HCD causes this issue?
To reproduce
sudo su
echo enabled > /sys/class/tty/ttyS1/power/wakeup
echo freeze > /sys/power/state
# wait some seconds
# press key on console
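For repeated test runs, the two sysfs writes above can be wrapped in a small helper. This is a minimal Python sketch, not part of the original report: the function name s2idle_cycle and its sysfs-root parameter are illustrative additions (the parameter only exists so the logic can be exercised against a scratch directory). On real hardware it needs root, and the write to /sys/power/state only returns once the system has resumed.

```python
from pathlib import Path


def s2idle_cycle(sysfs: Path = Path("/sys"), wake_tty: str = "ttyS1") -> None:
    """Arm a UART as wakeup source, then enter suspend-to-idle.

    Mirrors the shell reproduction steps; requires root on real hardware.
    `sysfs` is a parameter only so the logic can be tried on a fake tree.
    """
    wakeup = sysfs / "class" / "tty" / wake_tty / "power" / "wakeup"
    state = sysfs / "power" / "state"

    # /sys/power/state lists the sleep states the kernel supports;
    # "freeze" is the suspend-to-idle (s2idle) entry.
    if "freeze" not in state.read_text().split():
        raise RuntimeError("kernel does not offer suspend-to-idle ('freeze')")

    # Equivalent of: echo enabled > /sys/class/tty/ttyS1/power/wakeup
    wakeup.write_text("enabled\n")
    # Equivalent of: echo freeze > /sys/power/state
    # On real hardware this write blocks until the system resumes.
    state.write_text("freeze\n")
```

Checking for "freeze" first avoids arming the wakeup source on a kernel that cannot enter s2idle at all.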
Expected behaviour
The xHCI reset command succeeds, as it does during driver probe.
Actual behaviour
The xHCI reset times out during resume, and the heartbeat LED is blocked during this timeout.
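The reset that times out is a register handshake. As I understand the 6.x xhci driver, it sets HCRST (bit 1 of USBCMD, which is presumably why the added log prints CMD: 00000002) and then polls until the controller clears that bit, giving up with -ETIMEDOUT, which is -110 on Linux. The Python sketch below only mimics the shape of the kernel's xhci_handshake() (mask/done polling, an all-ones read treated as "device gone", a timeout error). It is a simulation with a fake clock, not driver code; it omits the extra driver-state check that xhci_handshake_check_state adds, and the 10 s timeout is arbitrary.

```python
import time

ETIMEDOUT = 110      # Linux errno; the log's "-110" is -ETIMEDOUT
ENODEV = 19          # Linux errno, returned for an all-ones (dead device) read
CMD_RESET = 1 << 1   # USBCMD.HCRST per the xHCI specification


def handshake(read_reg, mask, done, timeout_s, poll_s=0.001,
              clock=time.monotonic, sleep=time.sleep):
    """Poll read_reg() until (value & mask) == done, in the style of xhci_handshake()."""
    deadline = clock() + timeout_s
    while True:
        value = read_reg()
        if value == 0xFFFFFFFF:
            return -ENODEV  # reads of all ones: device fell off the bus
        if (value & mask) == done:
            return 0
        if clock() >= deadline:
            return -ETIMEDOUT
        sleep(poll_s)


class FakeClock:
    """Simulated time so the example finishes instantly."""

    def __init__(self):
        self.t = 0.0

    def now(self):
        return self.t

    def sleep(self, seconds):
        self.t += seconds


# Bad case (modelled): HCRST never self-clears, so the poll runs out of time.
bad_clk = FakeClock()
bad_rc = handshake(lambda: CMD_RESET, CMD_RESET, 0, timeout_s=10.0,
                   poll_s=0.01, clock=bad_clk.now, sleep=bad_clk.sleep)

# Good case (modelled): the controller clears HCRST on the third read.
reads = iter([CMD_RESET, CMD_RESET, 0])
good_clk = FakeClock()
good_rc = handshake(lambda: next(reads), CMD_RESET, 0, timeout_s=10.0,
                    poll_s=0.01, clock=good_clk.now, sleep=good_clk.sleep)
```

Because the poll is synchronous, a controller that never answers holds up the resume path for the whole timeout, which is consistent with the blocked heartbeat LED.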
System
- Which model of Raspberry Pi? Raspberry Pi 4 Model B (without EEPROM for VL805)
- Which firmware version (vcgencmd version)? 2024-09-13T15:58:42
- Which kernel version (uname -a)? Mainline kernel Linux 6.12 (see https://github.com/lategoodbye/linux-dev/commits/v6.12-pm_v2/)
- VL805 firmware: 000138a1
Important note: this issue is only reproducible with Raspberry Pi 4 boards without an EEPROM for the VL805 firmware. The newer boards, which have an EEPROM for the VL805 firmware, are not affected by this issue:
[ 96.799927] PM: Triggering wakeup from IRQ 25
[ 96.799945] PM: resume from suspend-to-idle
[ 96.800057] brcm-pcie fd500000.pcie: brcm_pcie_resume_noirq
[ 96.800064] brcm-pcie fd500000.pcie: brcm_pcie_bridge_sw_init_set_generic
[ 96.800169] brcm-pcie fd500000.pcie: brcm_pcie_bridge_sw_init_set_generic
[ 96.800174] brcm-pcie fd500000.pcie: brcm_pcie_perst_set_generic
[ 96.800386] brcm-pcie fd500000.pcie: brcm_pcie_bridge_sw_init_set_generic
[ 96.800612] brcm-pcie fd500000.pcie: brcm_pcie_perst_set_generic
[ 96.927459] brcm-pcie fd500000.pcie: clkreq-mode set to default
[ 96.929518] brcm-pcie fd500000.pcie: link up, 5.0 GT/s PCIe x1 (SSC)
[ 96.954725] PM: noirq resume of devices complete after 154.775 msecs
[ 96.955053] PM: early resume of devices complete after 0.287 msecs
[ 97.072080] bcmgenet fd580000.ethernet eth0: Link is Down
[ 97.072374] raspberrypi-reset soc:firmware:reset: Notify xHCI reset
[ 97.247036] brcmfmac: brcmf_fw_alloc_request: using brcm/brcmfmac43455-sdio for chip BCM4345/6
[ 97.390105] brcmfmac: brcmf_c_process_txcap_blob: no txcap_blob available (err=-2)
[ 97.390451] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM4345/6 wl0: Nov 1 2021 00:37:25 version 7.45.241 (1a2f2fa CY) FWID 01-703fd60
[ 98.083491] usb usb1: root hub lost power or was reset
[ 98.083506] usb usb2: root hub lost power or was reset
[ 98.083511] xhci_hcd 0000:01:00.0: Stop HCD
[ 98.083546] xhci_hcd 0000:01:00.0: HCD stopped
[ 98.083553] xhci_hcd 0000:01:00.0: Reset the HC, CMD: 00000002
[ 98.083675] xhci_hcd 0000:01:00.0: // Disabling event ring interrupts
[ 98.083682] xhci_hcd 0000:01:00.0: cleaning up memory
[ 98.083981] xhci_hcd 0000:01:00.0: xhci_stop completed - status = 11
[ 98.083987] xhci_hcd 0000:01:00.0: Initialize the xhci_hcd
[ 98.084282] xhci_hcd 0000:01:00.0: Start the primary HCD
[ 98.084442] xhci_hcd 0000:01:00.0: Start the secondary HCD
[ 98.084481] xhci_hcd 0000:01:00.0: xhci_resume: starting usb1 port polling.
[ 98.359660] usb 1-1: reset high-speed USB device number 2 using xhci_hcd
[ 98.608754] PM: resume of devices complete after 1653.699 msecs
[ 98.609184] OOM killer enabled.
[ 98.609189] Restarting tasks ... done.
[ 98.620417] random: crng reseeded on system resumption
[ 98.621329] PM: suspend exit
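Putting numbers on the two logs makes the difference concrete. The Python sketch below (helper names are illustrative) pulls the dmesg timestamps out of short excerpts of the logs above and measures the time from the reset command to the next driver message: about 27.4 s until "Failed to reset" on the board without a VL805 EEPROM, versus about 122 µs until "Disabling event ring interrupts" in the good case.

```python
import re

# "[47894.515252] msg" or "[ 96.799927] msg": seconds since boot, then the text.
DMESG_LINE = re.compile(r"\[\s*(\d+\.\d+)\]\s+(.*)")


def first_timestamp(log: str, needle: str) -> float:
    """Return the timestamp of the first dmesg line whose text contains `needle`."""
    for line in log.splitlines():
        m = DMESG_LINE.match(line.strip())
        if m and needle in m.group(2):
            return float(m.group(1))
    raise ValueError(f"no line containing {needle!r}")


def gap(log: str, start: str, end: str) -> float:
    """Seconds between the first `start` line and the first `end` line."""
    return first_timestamp(log, end) - first_timestamp(log, start)


BAD = """\
[47894.515252] xhci_hcd 0000:01:00.0: Reset the HC, CMD: 00000002
[47921.929950] xhci_hcd 0000:01:00.0: xhci_handshake_check_state failed: -110
[47921.930007] xhci_hcd 0000:01:00.0: Failed to reset: -110
"""

GOOD = """\
[ 98.083553] xhci_hcd 0000:01:00.0: Reset the HC, CMD: 00000002
[ 98.083675] xhci_hcd 0000:01:00.0: // Disabling event ring interrupts
"""

bad_stall = gap(BAD, "Reset the HC", "Failed to reset")
good_stall = gap(GOOD, "Reset the HC", "Disabling event ring interrupts")
print(f"bad: {bad_stall:.3f} s, good: {good_stall * 1e6:.0f} us")
```

The same helper can be pointed at other marker pairs, e.g. "Notify xHCI reset" to "Reset the HC", which is roughly 1.0 s in both logs.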
Important note: this issue is only reproducible with Raspberry Pi 4 boards without an EEPROM for the VL805 firmware. The newer boards, which have an EEPROM for the VL805 firmware, are not affected by this issue:
You've got that backwards:
- Pi 4B rev 1.3 and earlier have two EEPROMs - one for bootloader, one for VL805.
- Pi 4B from rev 1.4 onwards have a single EEPROM that contains both bootloader and VL805 firmware.
So it sounds like it is actually the newer boards that suffer from this issue.
If memory serves, on boards without a separate EEPROM chip for the VL805 firmware, the VPU firmware running on the SoC (BCM2711) is responsible for sending the firmware to the VL805, so I'm guessing that after an xHCI reset the VL805 needs its firmware reloaded. This seems to be confirmed by https://forums.raspberrypi.com/viewtopic.php?t=375483#p2246599 and https://forums.raspberrypi.com/viewtopic.php?t=317494#p1900532.
I suspect you need to make the mailbox call timg236 mentions in https://forums.raspberrypi.com/viewtopic.php?t=375483#p2246599. I think I've found the driver that does this at https://github.com/raspberrypi/linux/blob/rpi-6.6.y/drivers/reset/reset-raspberrypi.c.
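For context on what that mailbox call carries: if I read the upstream drivers/reset/reset-raspberrypi.c correctly, the firmware expects the xHCI controller's PCI address packed as bus << 20 | slot << 15 | function << 12 in the RPI_FIRMWARE_NOTIFY_XHCI_RESET property, and since the Pi 4 PCIe topology is fixed the driver simply passes 0x100000 (bus 01, device 00.0). The Python helpers below (hypothetical names, for illustration only) show that encoding; treat the bit layout as a reading of the driver comment rather than a documented firmware interface, and check it against the kernel tree you actually use.

```python
def parse_bdf(bdf: str) -> tuple[int, int, int]:
    """Split "0000:01:00.0" or "01:00.0" (lspci/sysfs style, hex) into (bus, slot, func)."""
    parts = bdf.split(":")
    if len(parts) == 3:  # drop the PCI domain
        parts = parts[1:]
    if len(parts) != 2:
        raise ValueError(f"unexpected PCI address {bdf!r}")
    bus = int(parts[0], 16)
    slot_s, func_s = parts[1].split(".")
    return bus, int(slot_s, 16), int(func_s, 16)


def xhci_reset_dev_addr(bus: int, slot: int, func: int) -> int:
    """Pack a PCI address as bus << 20 | slot << 15 | func << 12 (layout per the driver comment)."""
    if not (0 <= bus <= 0xFF and 0 <= slot <= 0x1F and 0 <= func <= 0x7):
        raise ValueError("bus/slot/function out of range")
    return bus << 20 | slot << 15 | func << 12


# The VL805 on the Pi 4 shows up as 0000:01:00.0 (see the xhci_hcd log lines).
addr = xhci_reset_dev_addr(*parse_bdf("0000:01:00.0"))
print(hex(addr))  # prints 0x100000
```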
@andrum993 Thank you for the feedback. My problem is that most of my Raspberry Pi 4 boards are prototype boards. All I can say is that the affected PCB (bad case) has no EEPROM (8 pins) populated near the VL805, while the good-case PCB has one.
You are correct about the VL805 firmware reloading process: the VPU is responsible for it, and the reset-raspberrypi driver triggers it. As you can see from the traces above, the necessary mailbox call is successfully sent in both cases (bad and good):
[47893.494488] raspberrypi-reset soc:firmware:reset: Notify xHCI reset
I assume the call doesn't confirm that the VL805 firmware has actually been uploaded, given the sleeps in the reset driver. I tried increasing the sleep in reset-raspberrypi, but it didn't help. That's why I asked how I can verify that the VL805 firmware is actually loaded.
I found this helpful comment by @timg236
Before s2idle (bad case):
root@raspberrypi:/home/pi# lspci -d 1106:3483 -xxx
01:00.0 USB controller: VIA Technologies, Inc. VL805 USB 3.0 Host Controller (rev 01)
00: 06 11 83 34 46 05 10 00 01 30 03 0c 10 00 00 00
10: 04 00 00 f8 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 06 11 83 34
30: 00 00 00 00 80 00 00 00 00 00 00 00 27 01 00 00
40: 00 00 00 00 00 01 00 00 09 10 00 40 04 00 00 00
50: c0 38 01 00 00 00 00 00 00 00 00 00 06 11 83 34
After s2idle (bad case):
root@raspberrypi:/sys/power# lspci -d 1106:3483 -xxx
01:00.0 USB controller: VIA Technologies, Inc. VL805 USB 3.0 Host Controller (rev 01)
00: 06 11 83 34 46 05 10 00 01 30 03 0c 10 00 00 00
10: 04 00 00 f8 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 06 11 83 34
30: 00 00 00 00 80 00 00 00 00 00 00 00 27 01 00 00
40: 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00
50: 00 00 00 00 00 00 00 00 00 00 00 00 06 11 83 34
So it seems to me the VL805 firmware is not loaded.
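One way to make that comparison mechanical is to diff the two lspci -xxx dumps byte by byte. The Python sketch below (helper names are illustrative; only the 0x40 and 0x50 rows are copied, since rows 0x00 to 0x30 are identical) reports every config-space offset whose value changed: 0x48, 0x49, 0x4b, 0x4c and 0x50 to 0x52, all of which read as zero after s2idle. Reading the dword at 0x50 (0x000138c0 before, 0 after) as the VL805 firmware version is an assumption based on how I recall Raspberry Pi tooling querying it, not something the dumps or VIA documentation confirm.

```python
import re

# A hex row of `lspci -xxx`, e.g. "40: 00 00 00 00 00 01 00 00 09 10 00 40 04 00 00 00".
ROW = re.compile(r"([0-9a-f]{2,3}):((?: [0-9a-f]{2})+)")


def parse_lspci_xxx(text: str) -> dict[int, int]:
    """Map config-space offset -> byte value; non-hex lines (the device header) are skipped."""
    regs = {}
    for line in text.splitlines():
        m = ROW.fullmatch(line.strip())
        if not m:
            continue
        base = int(m.group(1), 16)
        for i, byte in enumerate(m.group(2).split()):
            regs[base + i] = int(byte, 16)
    return regs


def diff_config(before: str, after: str) -> dict[int, tuple[int, int]]:
    """Offsets present in both dumps whose byte value differs, in ascending order."""
    a, b = parse_lspci_xxx(before), parse_lspci_xxx(after)
    return {off: (a[off], b[off]) for off in sorted(a.keys() & b.keys()) if a[off] != b[off]}


def dword_le(regs: dict[int, int], off: int) -> int:
    """Little-endian 32-bit value at `off` (PCI config space is little-endian)."""
    return int.from_bytes(bytes(regs[off + i] for i in range(4)), "little")


BEFORE = """\
01:00.0 USB controller: VIA Technologies, Inc. VL805 USB 3.0 Host Controller (rev 01)
40: 00 00 00 00 00 01 00 00 09 10 00 40 04 00 00 00
50: c0 38 01 00 00 00 00 00 00 00 00 00 06 11 83 34
"""

AFTER = """\
01:00.0 USB controller: VIA Technologies, Inc. VL805 USB 3.0 Host Controller (rev 01)
40: 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00
50: 00 00 00 00 00 00 00 00 00 00 00 00 06 11 83 34
"""

for off, (old, new) in diff_config(BEFORE, AFTER).items():
    print(f"0x{off:02x}: {old:02x} -> {new:02x}")
print(f"dword @0x50: {dword_le(parse_lspci_xxx(BEFORE), 0x50):08x} -> "
      f"{dword_le(parse_lspci_xxx(AFTER), 0x50):08x}")
```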
Why do you need this to work on old prototype boards? I suggest you test each of the production board variants and if it works on those, then there isn't a problem.
Sorry, but I have been spending a lot of my spare time upstreaming s2idle for Raspberry Pi boards since July 2024. There has been very little test feedback so far, and believe me, as an ex kernel maintainer, these weren't trivial issues. I tested it with four Raspberry Pi 4 boards and three of them showed this issue. So why should I buy a new one when it's very likely a software issue?
I don't believe the VL805 chip ROM supports a firmware reload interface; at a minimum you'd have to do a PCIe fundamental reset, and I don't know whether that's guaranteed to fully reset the VL805. Running the VL805 without flash is not well supported by VIA, so I'm not hopeful about this.
@timg236 Thanks for the hint. The DT binding / PCIe driver defines four possible reset types (perst, rescal, bridge, swinit) and three regulators (vpcie3v3, vpcie3v3aux, vpcie12v), but none of them are defined in the RPi 4 DT.
Does this really represent the actual hardware? Which of them represents the mentioned "fundamental reset"?
I've seen in the CM4 datasheet that there is a PCIe_nRST pin.
Is this pin connected to the VL805 in the case of the RPi 4? How is this pin controlled (VPU, ARM, PCIe IP, GPIO)?
I don't know the exact details of perst vs swinit, but any wake-up code will at least have to go through the sequence in brcm_pcie_setup, enumerate PCIe and then do the xHCI reset.
https://github.com/raspberrypi/linux/blob/rpi-6.6.y/drivers/pci/controller/pcie-brcmstb.c#L1159
Okay, it sounds like I'd better start testing s2idle with a CM4 plus a PCIe device. Once that works flawlessly, I can continue with the RPi 4. Until now I have only tested the CM4 without any PCIe device.
Chatting with others, I understand that the VL805 without dedicated SPI flash is a special case because the xHCI firmware has to be reloaded after PCIe is reset. I think the issue is that a fundamental reset doesn't cause the VL805 ROM to fully reset its internal state.
A CM4 with an NVMe device, or even better an xHCI card, would be a good starting point.
Here are my current test results:
- s2idle on CM4 without any PCIe endpoint: works
- s2idle on CM4 with NVMe: works
- s2idle on RPi 4 with dedicated VL805 EEPROM: works
- s2idle on RPi 4 without dedicated VL805 EEPROM: breaks xHCI
- modprobe -r pcie_brcmstb; modprobe pcie_brcmstb: recovers xHCI after breakage
- modprobe -r xhci_pci; modprobe xhci_pci: doesn't recover xHCI after breakage
@lategoodbye Recent firmware releases contain a workaround for the VL805 firmware loading that allows the firmware to be reloaded multiple times e.g. after a PCI reset. This might help resolve the issues that you are seeing with s2idle on Pi4.
@timg236 Thank you. Could you please point me to the part of the software that has changed?
Currently I have paused my development of s2idle for several reasons.
The normal design for VL805 is to have a dedicated SPI flash where the VL805 chip ROM is able to load the SRAM component automatically and page in the MCU firmware on demand.
If there is no dedicated SPI flash, then the bootloader/BIOS must do the two-stage load itself at the start of day, or again if PCI is reset (e.g. if the memory map changes). There seems to be a problem where a PCI fundamental reset does not fully reset the VL805 (without dedicated SPI flash), and the upload of the SRAM code fails. The workaround is to load a dummy firmware (all zeros), force a reset and then load the real firmware.
Unfortunately, we have no documentation for this, but it does seem to correspond with the VL805 behaviour with dedicated SPI flash if you upgrade the SPI flash firmware without a reboot.
See: https://github.com/raspberrypi/firmware/commit/d828cc856bc81901a2a3fe5e1ad6231e72f21c97
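The workaround is easiest to follow as a sequence. The Python sketch below is a deliberately crude toy model, not the VL805's real behaviour or the VPU firmware's code (which is closed and, per the comment above, undocumented). The ToyVL805 class and its stale flag are hypothetical and encode only three observations from this thread: the first upload at start of day works, a plain re-upload after a fundamental reset fails, and loading an all-zeros dummy image, resetting again and then loading the real image succeeds.

```python
class ToyVL805:
    """Hypothetical toy model of a flash-less VL805; illustrative only."""

    def __init__(self):
        self.image = None    # firmware image currently held in SRAM (None = empty)
        self.stale = False   # toy stand-in for ROM state a fundamental reset leaves behind

    def fundamental_reset(self):
        # Toy assumption: resetting while real firmware is loaded leaves stale
        # state; resetting while the all-zeros dummy is loaded clears it; a
        # reset with nothing loaded changes nothing.
        if self.image is not None:
            self.stale = any(self.image)
            self.image = None

    def upload(self, image: bytes) -> bool:
        # Toy assumption: with stale state, a real image is rejected,
        # but the all-zeros dummy is still accepted.
        if self.stale and any(image):
            return False
        self.image = image
        return True


def naive_reload(dev: ToyVL805, fw: bytes) -> bool:
    """A plain reload after a PCIe reset."""
    dev.fundamental_reset()
    return dev.upload(fw)


def workaround_reload(dev: ToyVL805, fw: bytes) -> bool:
    """Dummy firmware (all zeros), forced reset, then the real firmware."""
    dev.fundamental_reset()
    dev.upload(bytes(len(fw)))  # dummy image of the same size, all zeros
    dev.fundamental_reset()     # the forced reset
    return dev.upload(fw)


FW = bytes([0xA5] * 16)  # placeholder bytes, not real VL805 firmware

dev = ToyVL805()
boot_ok = dev.upload(FW)               # start of day: works
naive_ok = naive_reload(dev, FW)       # e.g. after s2idle resume: fails in the model
fixed_ok = workaround_reload(dev, FW)  # dummy, reset, real: works in the model
```

In the model, repeated plain resets never recover the device, which loosely mirrors the observation that reloading xhci_pci alone did not bring xHCI back.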
Closing as fixed.