Wifi fails under heavy load and requires reboot (mmc1: Timeout waiting for hardware interrupt)
Describe the bug
I'm running OctoPrint on my Raspberry Pi with a webcam, connected headless over wifi. Watching the webcam stream from OctoPrint puts the wifi under heavy load, and after a random interval (anywhere from minutes to hours) the wifi cuts out. Nothing but a reboot brings it back. I'm currently using the OctoPi image, but I had the same issue with the official Raspbian image.
To reproduce
- Connect to a wifi
- Put the wifi under heavy load for an extended period of time
- The wifi sometimes stops working; this does not happen every time
Expected behaviour
Wifi keeps working.
Actual behaviour
Wifi stops working and the device is unresponsive. Even with a screen and keyboard connected, I was unable to bring the device down (ifdown didn't find the device anymore). ifconfig still listed it, but not as connected, and trying to scan for wifi networks led to a timeout.
System
https://pastebin.com/TxW2rkR8
Logs
https://pastebin.com/GNxF3A9h
Additional info
I have a second MicroSD card I can use to run tests on, if that would help.
I have the same problem on a Raspberry Pi 400: my WiFi suddenly stops working and I need to restart it.
"ifdown didn't find the device anymore"
This command didn't work for me either, but ifconfig works fine:
sudo ifconfig wlan0 down
and then, after some time
sudo ifconfig wlan0 up
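For anyone who wants to script the down/up workaround above, here is a minimal sketch. The function name bounce_iface, the RUNNER dry-run hook, and the 10-second default delay are assumptions made for illustration; the comment above only says to wait "some time". Note that the original report says nothing short of a reboot helped there, so this may not recover every failure.

```shell
# Bounce a network interface: bring it down, wait, bring it back up.
# RUNNER is a hypothetical hook: set RUNNER=echo to print the commands
# instead of executing them (a dry run).
bounce_iface() {
    iface="${1:-wlan0}"   # interface name, defaults to wlan0
    delay="${2:-10}"      # seconds to wait between down and up (assumed value)
    ${RUNNER:-} sudo ifconfig "$iface" down
    sleep "$delay"
    ${RUNNER:-} sudo ifconfig "$iface" up
}
```

Running `RUNNER=echo bounce_iface wlan0 0` prints the two commands without executing them.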
I'm experiencing exactly the same thing on the CM4, kernel 6.6.31+rpt-rpi-v8.
After the initial error (cm4 kernel: mmc1: Timeout waiting for hardware interrupt), the following keeps repeating indefinitely:
Jul 06 21:16:28.363687 cm4 kernel: brcmfmac: brcmf_sdio_rxfail: abort command, terminate frame, send NAK
Jul 06 21:16:29.885265 cm4 kernel: brcmfmac: brcmf_sdio_rxfail: count never zeroed: last 0xffff
Jul 06 21:16:29.885488 cm4 kernel: brcmfmac: brcmf_sdio_readframes: RXHEADER FAILED: -5
This is almost certainly a silly question, but have you tried disabling wifi power save?
/usr/sbin/iw wlan0 set power_save off
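To confirm what the driver reports before and after running the command above, something like the following may help. It assumes `iw dev wlan0 get power_save` prints a line of the form "Power save: on" or "Power save: off"; parse_power_save is a hypothetical helper name, not part of iw.

```shell
# Extract "on" or "off" from text shaped like the output of
# `iw dev <iface> get power_save`, e.g. "Power save: off".
# Reads from stdin so it can be fed from iw or from saved output.
parse_power_save() {
    awk -F': *' '/Power save:/ { print $2 }'
}
```

Typical use would be `iw dev wlan0 get power_save | parse_power_save`.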
Same with Pi 3 B+. I have power save turned off.
Jul 07 23:29:34 streetcat kernel: brcmfmac: brcmf_cfg80211_set_power_mgmt: power save disabled
Jul 08 20:57:45 streetcat kernel: mmc1: Timeout waiting for hardware interrupt.
Jul 08 20:57:45 streetcat kernel: brcmfmac: mmc_submit_one: CMD53 sg block write failed -110
Jul 08 20:57:45 streetcat kernel: brcmfmac: brcmf_sdio_txfail: sdio error, abort command and terminate frame
Jul 08 20:57:45 streetcat kernel: brcmfmac: brcmf_sdio_hdparse: seq 127: max tx seq number error
Jul 08 20:57:55 streetcat kernel: mmc1: Timeout waiting for hardware interrupt.
Jul 08 20:57:55 streetcat kernel: brcmfmac: mmc_submit_one: CMD53 sg block write failed -110
Jul 08 20:57:55 streetcat kernel: brcmfmac: brcmf_sdio_txfail: sdio error, abort command and terminate frame
Jul 08 20:58:06 streetcat kernel: mmc1: Timeout waiting for hardware interrupt.
Jul 08 20:58:06 streetcat kernel: brcmfmac: mmc_submit_one: CMD53 sg block write failed -110
Jul 08 20:58:06 streetcat kernel: brcmfmac: brcmf_sdio_txfail: sdio error, abort command and terminate frame
Jul 08 20:58:23 streetcat kernel: brcmfmac: brcmf_sdio_hdparse: HW header checksum error
Jul 08 20:58:23 streetcat kernel: brcmfmac: brcmf_sdio_rxfail: terminate frame
<last 2 lines repeats>
Kernel version is 6.1.21-v7+. It failed in the middle of scp'ing a large file.
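To check whether a given boot shows the same signatures as the logs quoted above, a rough sketch is to count matching kernel log lines. The pattern list is taken only from the messages posted in this thread and is not an exhaustive list of brcmfmac errors; count_sdio_failures is a hypothetical helper name.

```shell
# Count kernel log lines matching the SDIO/brcmfmac failure signatures
# quoted in this thread. Reads log text on stdin and prints the count.
count_sdio_failures() {
    # grep -c exits non-zero when nothing matches (it still prints 0),
    # so "|| true" keeps the function's exit status clean.
    grep -cE 'mmc1: Timeout waiting for hardware interrupt|brcmfmac: .*(CMD53 .* failed|HW header checksum error|RXHEADER FAILED)' || true
}
```

For example, `journalctl -k -b | count_sdio_failures` or `dmesg | count_sdio_failures` covers the current boot.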
This message is key: "HW header checksum error". It suggests corruption on the SDIO bus, which in turn suggests a lack of power. Does over_voltage=2 in config.txt help?
Thank you for your insight, @pelwell
Using over_voltage=2 seems to fix the issue for me! Wifi speed is even a tiny bit faster and the CM4 gets a little bit hotter.
over_voltage=1 may also work for you - it depends on how marginal the voltage is on your CM4.
I haven't gotten a crash with over_voltage=1 yet; the test transfer is still running. However, I can already see that the transfer speed is about a third slower. Is there any downside to running with over_voltage=2?
Voltage should not affect speed in that way, so the slow-down might be caused by retries after CRC errors. It sounds like 2 is the correct over_voltage.
That makes total sense! Transfer speed with over_voltage=1 is all over the place for me: it bounces up and down and ends up about a third slower. (I don't see CRC errors in dmesg, however.) over_voltage=2 is far more stable, so I'll stick with that. Thanks, @pelwell
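For anyone comparing settings as in the exchange above, here is a small sketch for checking which over_voltage value a config.txt currently sets. The default path /boot/firmware/config.txt is an assumption (older images use /boot/config.txt), get_over_voltage is a hypothetical helper name, and the parsing is deliberately simple: it ignores conditional sections such as [pi4] and just reports the last uncommented assignment.

```shell
# Print the last uncommented over_voltage value found in a
# config.txt-style file. Simplification: conditional sections such
# as [pi4] are not interpreted, and commented-out lines are skipped.
get_over_voltage() {
    cfg="${1:-/boot/firmware/config.txt}"   # assumed default path
    awk -F= '
        /^[[:space:]]*over_voltage[[:space:]]*=/ {
            gsub(/[[:space:]]/, "", $2)     # strip stray whitespace
            v = $2
        }
        END { if (v != "") print v }
    ' "$cfg"
}
```

On a running Pi, `vcgencmd get_config over_voltage` should report the value the firmware actually applied, which is a useful cross-check after a reboot.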