Unexpected behavior of mailbox property "set power/domain state"
Describe the bug At first, i'm aware that suspend to idle isn't officially supported. Today i was playing around with suspend to idle on my Raspberry Pi 1B and noticed some strange behavior of mailbox property "set power/domain state" which is used by raspberry-power driver. The driver doesn't set the wait bit for the state. Since the properties are resume are replied it's not a firmware crash.
Does the driver have to care about the order while powering down a domain? Did i missed something obvious?
To reproduce Add "no_console_suspend" to cmdline.txt and reboot Connect via Debug UART echo 1 > /sys/power/pm_debug_messages echo platform > /sys/power/pm_test echo freeze > /sys/power/state
Expected behaviour The power domains for USB, V3D, HDMI and VEC are powered down without timeout.
Actual behaviour During entering of suspend state all mailbox properties timed out.
System Copy and paste the results of the raspinfo command in to this section. Alternatively, copy and paste a pastebin link, or add answers to the following questions:
- Which model of Raspberry Pi? Raspberry Pi 1B
- Which firmware version (
vcgencmd version)? 2023-03-17T10:52:42 - Which kernel version (
uname -a)? 6.9.3
Logs
[ 302.092009] PM: suspend entry (s2idle)
[ 302.121449] Filesystems sync: 0.029 seconds
[ 302.184627] Freezing user space processes
[ 302.204924] Freezing user space processes completed (elapsed 0.020 seconds)
[ 302.205016] OOM killer disabled.
[ 302.205058] Freezing remaining freezable tasks
[ 302.206591] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
[ 302.215584] smsc95xx 1-1.1:1.0 eth0: entering SUSPEND2 mode
[ 302.267282] PM: suspend of devices complete after 59.833 msecs
[ 302.267377] PM: start suspend of devices complete after 60.681 msecs
[ 302.269176] PM: late suspend of devices complete after 1.754 msecs
[ 303.284099] ------------[ cut here ]------------
[ 303.284155] WARNING: CPU: 0 PID: 438 at drivers/firmware/raspberrypi.c:128 rpi_firmware_property_list+0x204/0x22c
[ 303.284256] Firmware transaction 0x00028001 timeout
[ 303.284273] Modules linked in: bcm2835_v4l2(C) videobuf2_vmalloc videobuf2_memops bcm2835_mmal_vchiq(C) videobuf2_v4l2 videobuf2_common snd_bcm2835(C) raspberrypi_hwmon bcm2835_rng rng_core vchiq(C) configfs
[ 303.284438] CPU: 0 PID: 438 Comm: bash Tainted: G C 6.9.3-dirty #17
[ 303.284474] Hardware name: BCM2835
[ 303.284493] Call trace:
[ 303.284522] unwind_backtrace from show_stack+0x18/0x1c
[ 303.284600] show_stack from dump_stack_lvl+0x34/0x44
[ 303.284680] dump_stack_lvl from __warn+0x88/0xec
[ 303.284764] __warn from warn_slowpath_fmt+0x7c/0xb0
[ 303.284820] warn_slowpath_fmt from rpi_firmware_property_list+0x204/0x22c
[ 303.284886] rpi_firmware_property_list from rpi_firmware_property+0x68/0x8c
[ 303.284943] rpi_firmware_property from rpi_firmware_set_power+0x54/0xc0
[ 303.285014] rpi_firmware_set_power from _genpd_power_off+0xe4/0x148
[ 303.285082] _genpd_power_off from genpd_sync_power_off+0x7c/0x11c
[ 303.285143] genpd_sync_power_off from genpd_finish_suspend+0xcc/0xe0
[ 303.285203] genpd_finish_suspend from dpm_run_callback+0x78/0xd0
[ 303.285279] dpm_run_callback from device_suspend_noirq+0xc0/0x238
[ 303.285339] device_suspend_noirq from dpm_suspend_noirq+0xb0/0x168
[ 303.285396] dpm_suspend_noirq from suspend_devices_and_enter+0x1b8/0x5ac
[ 303.285471] suspend_devices_and_enter from pm_suspend+0x254/0x2e4
[ 303.285525] pm_suspend from state_store+0xa8/0xd4
[ 303.285576] state_store from kernfs_fop_write_iter+0x154/0x1a0
[ 303.285646] kernfs_fop_write_iter from vfs_write+0x12c/0x184
[ 303.285713] vfs_write from ksys_write+0x78/0xc0
[ 303.285765] ksys_write from ret_fast_syscall+0x0/0x54
[ 303.285816] Exception stack(0xcc93dfa8 to 0xcc93dff0)
[ 303.285857] dfa0: 00000001 b6fb6500 00000001 002d8c98 00000007 00000000
[ 303.285892] dfc0: 00000001 b6fb6500 b6f18d00 00000004 b6f16c18 b6f16780 00000000 00000000
[ 303.285920] dfe0: 0000000a befca348 b6e92530 b6e9254c
[ 303.285943] ---[ end trace 0000000000000000 ]---
[ 303.285982] rpi_firmware_set_power: Failed to set USB to 0 (-110)
[ 304.324164] rpi_firmware_set_power: Failed to set V3D to 0 (-110)
[ 305.364094] rpi_firmware_set_power: Failed to set HDMI to 0 (-110)
[ 305.364991] PM: noirq suspend of devices complete after 3095.584 msecs
[ 305.365052] PM: suspend debug: Waiting for 5 second(s).
[ 310.362303] hwmon hwmon1: Failed to get throttled (-110)
[ 311.444120] rpi_firmware_set_power: Failed to set VEC to 1 (-110)
[ 311.444823] PM: noirq resume of devices complete after 1082.729 msecs
[ 311.448333] PM: early resume of devices complete after 3.176 msecs
[ 311.761442] PM: resume of devices complete after 313.013 msecs
[ 311.763651] OOM killer enabled.
[ 311.763705] Restarting tasks ...
[ 311.809353] rpi_firmware_set_power: Set VEC to 0
[ 311.810399] smsc95xx 1-1.1:1.0 eth0: Link is Down
[ 311.811184] done.
[ 311.812924] random: crng reseeded on system resumption
[ 322.404088] vc4-drm soc:gpu: [drm] *ERROR* [CRTC:93:pixelvalve-2] flip_done timed out
[ 322.438600] PM: suspend exit
[ 353.124374] vc4-drm soc:gpu: [drm] User-defined mode not supported: "1920x1200": 60 154000 1920 1968 2000 2080 1200 1203 1209 1235 0x68 0x9
[ 363.364073] vc4-drm soc:gpu: [drm] *ERROR* flip_done timed out
[ 363.364157] vc4-drm soc:gpu: [drm] *ERROR* [CRTC:93:pixelvalve-2] commit wait timed out
[ 373.614113] vc4-drm soc:gpu: [drm] *ERROR* flip_done timed out
[ 373.614191] vc4-drm soc:gpu: [drm] *ERROR* [CONNECTOR:32:HDMI-A-1] commit wait timed out
[ 383.844240] vc4-drm soc:gpu: [drm] *ERROR* flip_done timed out
[ 383.844320] vc4-drm soc:gpu: [drm] *ERROR* [PLANE:85:plane-3] commit wait timed out
[ 394.084098] vc4-drm soc:gpu: [drm] *ERROR* flip_done timed out
[ 394.084161] vc4-drm soc:gpu: [drm] *ERROR* Timed out waiting for commit
[ 394.119429] rpi_firmware_set_power: Set VEC to 1
[ 394.119618] rpi_firmware_set_power: Set HDMI to 0
[ 404.324295] rpi_firmware_set_power: Set HDMI to 1
Okay, i found one issue in the bcm2835-mailbox.c. The IRQ of the mailbox driver will be disabled during noirq suspend. So i used the IRQF_NO_SUSPEND flag for the IRQ request in order to prevent this. This fixed the timeouts during suspend, but now there is a timeout in resume:
[ 366.125292] PM: suspend entry (s2idle)
[ 366.146284] Filesystems sync: 0.020 seconds
[ 366.201160] Freezing user space processes
[ 366.208973] Freezing user space processes completed (elapsed 0.007 seconds)
[ 366.209054] OOM killer disabled.
[ 366.209074] Freezing remaining freezable tasks
[ 366.210575] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
[ 366.217853] smsc95xx 1-1.1:1.0 eth0: entering SUSPEND2 mode
[ 366.267317] PM: suspend of devices complete after 56.053 msecs
[ 366.267400] PM: start suspend of devices complete after 56.739 msecs
[ 366.268873] PM: late suspend of devices complete after 1.434 msecs
[ 366.271165] rpi_firmware_set_power: Set USB to 0
[ 366.271476] rpi_firmware_set_power: Set V3D to 0
[ 366.271654] rpi_firmware_set_power: Set VEC to 0
[ 366.272440] PM: noirq suspend of devices complete after 3.373 msecs
[ 366.272497] PM: suspend debug: Waiting for 5 second(s).
[ 371.275217] rpi_firmware_set_power: Set HDMI to 1
[ 371.275426] rpi_firmware_set_power: Set VEC to 1
[ 371.275648] rpi_firmware_set_power: Set V3D to 1
[ 372.324439] ------------[ cut here ]------------
[ 372.324493] WARNING: CPU: 0 PID: 22 at drivers/gpu/drm/vc4/vc4_hdmi.c:474 vc4_hdmi_connector_detect_ctx+0x4c/0x29c
[ 372.324581] Modules linked in: bcm2835_v4l2(C) videobuf2_vmalloc videobuf2_memops bcm2835_mmal_vchiq(C) videobuf2_v4l2 videobuf2_common snd_bcm2835(C) raspberrypi_hwmon bcm2835_rng rng_core vchiq(C) configfs
[ 372.324734] CPU: 0 PID: 22 Comm: kworker/0:1 Tainted: G C 6.9.3-dirty #19
[ 372.324769] Hardware name: BCM2835
[ 372.324793] Workqueue: events output_poll_execute
[ 372.324859] Call trace:
[ 372.324886] unwind_backtrace from show_stack+0x18/0x1c
[ 372.324949] show_stack from dump_stack_lvl+0x34/0x44
[ 372.325020] dump_stack_lvl from __warn+0x88/0xec
[ 372.325096] __warn from warn_slowpath_fmt+0x7c/0xb0
[ 372.325147] warn_slowpath_fmt from vc4_hdmi_connector_detect_ctx+0x4c/0x29c
[ 372.325203] vc4_hdmi_connector_detect_ctx from drm_helper_probe_detect_ctx+0x68/0x124
[ 372.325269] drm_helper_probe_detect_ctx from output_poll_execute+0x144/0x208
[ 372.325326] output_poll_execute from process_scheduled_works+0x194/0x288
[ 372.325388] process_scheduled_works from worker_thread+0x1dc/0x23c
[ 372.325435] worker_thread from kthread+0xc4/0xd0
[ 372.325482] kthread from ret_from_fork+0x14/0x28
[ 372.325524] Exception stack(0xcc865fb0 to 0xcc865ff8)
[ 372.325557] 5fa0: 00000000 00000000 00000000 00000000
[ 372.325590] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 372.325620] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[ 372.325642] ---[ end trace 0000000000000000 ]---
[ 372.325754] ------------[ cut here ]------------
[ 372.325785] WARNING: CPU: 0 PID: 430 at drivers/firmware/raspberrypi.c:128 rpi_firmware_property_list+0x204/0x22c
[ 372.325863] Firmware transaction 0x00028001 timeout
[ 372.325879] Modules linked in: bcm2835_v4l2(C) videobuf2_vmalloc videobuf2_memops bcm2835_mmal_vchiq(C) videobuf2_v4l2 videobuf2_common snd_bcm2835(C) raspberrypi_hwmon bcm2835_rng rng_core vchiq(C) configfs
[ 372.326030] CPU: 0 PID: 430 Comm: bash Tainted: G WC 6.9.3-dirty #19
[ 372.326063] Hardware name: BCM2835
[ 372.326081] Call trace:
[ 372.326100] unwind_backtrace from show_stack+0x18/0x1c
[ 372.326162] show_stack from dump_stack_lvl+0x34/0x44
[ 372.326231] dump_stack_lvl from __warn+0x88/0xec
[ 372.326305] __warn from warn_slowpath_fmt+0x7c/0xb0
[ 372.326357] warn_slowpath_fmt from rpi_firmware_property_list+0x204/0x22c
[ 372.326414] rpi_firmware_property_list from rpi_firmware_property+0x68/0x8c
[ 372.326469] rpi_firmware_property from rpi_firmware_set_power+0x5c/0xc8
[ 372.326535] rpi_firmware_set_power from _genpd_power_on+0xe0/0x148
[ 372.326589] _genpd_power_on from genpd_sync_power_on+0x48/0xac
[ 372.326647] genpd_sync_power_on from genpd_finish_resume+0x8c/0x104
[ 372.326701] genpd_finish_resume from dpm_run_callback+0x78/0xd0
[ 372.326769] dpm_run_callback from device_resume_noirq+0x128/0x1e0
[ 372.326821] device_resume_noirq from dpm_resume_noirq+0x134/0x160
[ 372.326872] dpm_resume_noirq from suspend_devices_and_enter+0x2ec/0x5ac
[ 372.326936] suspend_devices_and_enter from pm_suspend+0x254/0x2e4
[ 372.326990] pm_suspend from state_store+0xa8/0xd4
[ 372.327040] state_store from kernfs_fop_write_iter+0x154/0x1a0
[ 372.327105] kernfs_fop_write_iter from vfs_write+0x12c/0x184
[ 372.327166] vfs_write from ksys_write+0x78/0xc0
[ 372.327214] ksys_write from ret_fast_syscall+0x0/0x54
[ 372.327259] Exception stack(0xccb01fa8 to 0xccb01ff0)
[ 372.327297] 1fa0: 00000001 b6fec500 00000001 00a858e8 00000007 00000000
[ 372.327331] 1fc0: 00000001 b6fec500 b6f4ed00 00000004 b6f4cc18 b6f4c780 00000000 00000000
[ 372.327358] 1fe0: 0000000a bea32938 b6ec8530 b6ec854c
[ 372.327379] ---[ end trace 0000000000000000 ]---
[ 372.327412] rpi_firmware_set_power: Failed to set USB to 1 (-110)
[ 372.327983] PM: noirq resume of devices complete after 1058.077 msecs
@ffainelli Any ideas/suggestions?
@pelwell @popcornmix Do you have any idea why regarding to the second trace the "set power state" for USB runs into a timeout ( at 372.325785 ff)? Is there a way to get more debug?
My guess would be that the USB driver is not expecting to be closed then reopened.
What is the aim of this investigation?
My question relates to the unexpected behavior of "set power state": Expected behavior: the VideoCore firmware enables USB power domain as expected Observed behavior: the VideoCore doesn't seem to reply in this case
Yes, the DWC2 USB driver doesn't like it that the power state is not abled after resume. But these are just a consequential errors.
The final goal of this investigation is support for suspend to idle (not suspend to RAM support), which is consider the "simplest" power management mode. I was able to "workaround" this specific issue by disabling DWC2 via device tree. In the meantime i extend the raspberrypi-power with a simple debugfs interface in order to enable and disable specific power domains incl. USB but i wasn't able to reproduce this timeout. So i'm not sure what's causing the USB power domain timeout. Maybe some timing issue.
Yes, the DWC2 USB driver doesn't like it that the power state is not abled after resume.
I was referring to the firmware's USB driver.
Okay, but what should the difference between between manually disabling and re-enabling the power domain via debugfs which works fine and let suspend to idle doing this job.
Have you followed the same sequence using debugfs that the suspend/resume cycle attempts?
I was referring to the firmware's USB driver.
Fascinating. I wasn't aware that the VideoCore also had a USB driver and so both share the same resources. What is the purpose of this firmware USB driver after the ARM core has booted?
Btw there is a difference between my debugfs manual power domain scenario and suspend to idle. In the late suspend to idle first the USB IRQ (including all the other IRQs defined in the device tree) will be disabled in order to avoid wake ups and then the USB power domain will be disabled. Could this be the issue?
What is the purpose of this firmware USB driver after the ARM core has booted?
There isn't one - it steps back, letting the ARMs take over.
Could this be the issue?
Perhaps - if the firmware is waiting for an interrupt to indicate that some operation has completed.
Btw there is a difference between my debugfs manual power domain scenario and suspend to idle. In the late suspend to idle first the USB IRQ (including all the other IRQs defined in the device tree) will be disabled in order to avoid wake ups
Is there times where you may want USB wake up such as from an input device?
Is there times where you may want USB wake up such as from an input device?
Yes, this is a special case which is hopefully handled properly by DWC2. But this is a future step. Currently the board freezes completely in suspend to idle even with debug UART as wake source. At first raspberrypi-power and the mailbox driver needs to be fixed.
This issue was caused by the dwc2 driver and not by the VC4 firmware. The following draft patches should avoid the problem:
https://github.com/lategoodbye/linux-dev/commit/d12863a252ec35e8105c56a80864bfac463a5ea7
https://github.com/lategoodbye/linux-dev/commit/218a671ea144a0257160dc6b4b9be81f79ff963e