linux icon indicating copy to clipboard operation
linux copied to clipboard

Zero 2 W wifi bugs with mmc1: Controller never released inhibit bit(s)

Open llllllllqq opened this issue 2 years ago • 38 comments

Describe the bug

Hi, I find at least 2 random bugs, both of them showing mmc1: Controller never released inhibit bit(s) in dmesg first and following some error.

The first is wifi lost. I cannot access the device via wifi, so I have to force plug-out&in power to reboot. The second is also about wifi, but only error in dmesg. The connection is (maybe) fine.

Steps to reproduce the behaviour

Both bugs are on the same device and the same kernel/firmware shown in the dmesg. I'm sure my power supply is enough and stable (It can output at least 5.2v/2a and pass a stress test with USB volt meter showing 5.1v, no under-volt found, throttled=0x0). And the Pi can pass an 1 hour iperf3 bi-direct stress test without any error.

The first bug (not 100% occur, only random):

  1. Only connect a SATA SSD (toshiba tr200) via a JMS chip USB to SATA box, and via a USB otg line. (I've tested its max current is 501ma by a 10kHz USB meter, average working current is ~300ma).
  2. Use an android app called "foldersync" to upload some big files via sftp to that SSD on Pi.
  3. Manually cancel the transfer.
  4. Wifi is disappered. And in at least 30 minutes, it won't self-restart, so I force restart.

The second bug (also random, but more often, avg once 2 days):

  1. Connect nothing, use the same android app "foldersync" to upload some big files via sftp to SDCARD on Pi.
  2. Manually cancel the transfer.
  3. Wifi seems normal, but dmesg shows sth error.

Device (s)

Raspberry Pi Zero 2 W

System

Raspberry Pi reference 2023-10-10 Generated using pi-gen, https://github.com/RPi-Distro/pi-gen, 962bf483c8f326405794827cce8c0313fd5880a8, stage2

Oct 17 2023 15:42:53 Copyright (c) 2012 Broadcom version 30f0c5e4d076da3ab4f341d88e7d505760b93ad7 (clean) (release) (start_cd)

Linux rpi02w 6.1.0-rpi6-rpi-v8 #1 SMP PREEMPT Debian 1:6.1.58-1+rpt2 (2023-10-27) aarch64 GNU/Linux

Logs

First bug:

Dec 02 13:48:12 rpi02w kernel: mmc1: Controller never released inhibit bit(s).
Dec 02 13:48:12 rpi02w kernel: brcmfmac: brcmf_sdio_readframes: RXHEADER FAILED: -5
Dec 02 13:48:12 rpi02w kernel: brcmfmac: brcmf_sdio_rxfail: abort command, terminate frame, send NAK
Dec 02 13:48:12 rpi02w kernel: brcmfmac: brcmf_sdio_rxglom: descriptor len 5 bad: 0
Dec 02 13:48:12 rpi02w kernel: brcmfmac: brcmf_sdio_hdparse: HW header length too long
Dec 02 13:48:12 rpi02w kernel: brcmfmac: brcmf_sdio_rxfail: terminate frame
Dec 02 13:48:12 rpi02w kernel: brcmfmac: brcmf_sdio_rxglom: descriptor len 7 bad: 0
Dec 02 13:48:12 rpi02w kernel: brcmfmac: brcmf_sdio_hdparse: HW header length too long
Dec 02 13:48:12 rpi02w kernel: brcmfmac: brcmf_sdio_rxfail: terminate frame
Dec 02 13:48:18 rpi02w kernel: ieee80211 phy0: brcmf_proto_bcdc_query_dcmd: brcmf_proto_bcdc_msg failed w/status -110
Dec 02 13:48:18 rpi02w kernel: ieee80211 phy0: brcmf_cfg80211_dump_station: BRCMF_C_GET_ASSOCLIST failed, err=-110
Dec 02 13:48:25 rpi02w kernel: ieee80211 phy0: brcmf_proto_bcdc_query_dcmd: brcmf_proto_bcdc_msg failed w/status -110
Dec 02 13:48:25 rpi02w kernel: ieee80211 phy0: brcmf_cfg80211_dump_station: BRCMF_C_GET_ASSOCLIST failed, err=-110
Dec 02 13:48:30 rpi02w kernel: ieee80211 phy0: brcmf_proto_bcdc_query_dcmd: brcmf_proto_bcdc_msg failed w/status -110
Dec 02 13:48:30 rpi02w kernel: ieee80211 phy0: brcmf_cfg80211_dump_station: BRCMF_C_GET_ASSOCLIST failed, err=-110
Dec 02 13:48:37 rpi02w kernel: ieee80211 phy0: brcmf_proto_bcdc_query_dcmd: brcmf_proto_bcdc_msg failed w/status -110
Dec 02 13:48:37 rpi02w kernel: ieee80211 phy0: brcmf_cfg80211_dump_station: BRCMF_C_GET_ASSOCLIST failed, err=-110
Dec 02 13:48:43 rpi02w kernel: ieee80211 phy0: brcmf_proto_bcdc_query_dcmd: brcmf_proto_bcdc_msg failed w/status -110
Dec 02 13:48:43 rpi02w kernel: ieee80211 phy0: brcmf_cfg80211_dump_station: BRCMF_C_GET_ASSOCLIST failed, err=-110
Dec 02 13:48:48 rpi02w kernel: ieee80211 phy0: brcmf_proto_bcdc_query_dcmd: brcmf_proto_bcdc_msg failed w/status -110
Dec 02 13:48:48 rpi02w kernel: ieee80211 phy0: brcmf_cfg80211_dump_station: BRCMF_C_GET_ASSOCLIST failed, err=-110
....endless repeat....

Note: this bug seems to be like https://github.com/raspberrypi/firmware/issues/1723 but I don't know whether it's safe to try over_voltage=2 on my device (my vcgencmd measure_volts is volt=1.2563V)

Second bug: type a:

Dec 04 20:22:35 rpi02w kernel: mmc1: Controller never released inhibit bit(s).
Dec 04 20:22:35 rpi02w kernel: brcmfmac: brcmf_sdio_readframes: read 1108 bytes from channel 2 failed: -5
Dec 04 20:22:35 rpi02w kernel: brcmfmac: brcmf_sdio_rxfail: abort command, terminate frame
Dec 04 20:22:35 rpi02w kernel: brcmfmac: brcmf_sdio_readframes: brcmf_sdio_readframes: glom superframe w/o descriptor!
Dec 04 20:22:35 rpi02w kernel: brcmfmac: brcmf_sdio_rxfail: terminate frame

type b:

Dec 02 17:56:17 rpi02w kernel: mmc1: Controller never released inhibit bit(s).
Dec 02 17:56:17 rpi02w kernel: brcmfmac: brcmf_sdio_readframes: read 1372 bytes from channel 2 failed: -5
Dec 02 17:56:17 rpi02w kernel: brcmfmac: brcmf_sdio_rxfail: abort command, terminate frame

I've also uploaded a full dmesg of bug 2, type a. dmesg.txt

Additional context

Note: I notice all logs are start with a mmc1: Controller never released inhibit bit(s).

No response

llllllllqq avatar Dec 05 '23 04:12 llllllllqq

This sounds like a hardware/power issue. You can try adding over_voltage=2 to config.txt.

pelwell avatar Dec 05 '23 08:12 pelwell

This sounds like a hardware/power issue. You can try adding over_voltage=2 to config.txt.

Thanks for your reply. I've added this config and will keep my eyes on it. Some additional questions: Did you mean every mmc1: Controller never released inhibit bit(s). is about power/hw issue? If yes, it'll save a lot time to find whether bugs are triggered because the first bug is very rare. And after over_voltage=2, is volt=1.3062V safe for my device with only a aluminum heatsink (no fan)?

llllllllqq avatar Dec 05 '23 08:12 llllllllqq

I have never seen your error (mmc1: Controller never released inhibit bit(s).) before. I am basing my diagnosis on the basis that an event in a different subsystem (your large transfer over USB) causes the problem, but it's possible it's the WiFi side of the transfer that caused the problem.

Your examples have all been of large transfers that get cancelled. Have you tried changing the scenario:

  1. Smaller transfers.
  2. Allowing them to complete.

Is the cancelling because things are stalled? In other words, is it possible that the error has already occurred and that the need to cancel is a symptom?

pelwell avatar Dec 05 '23 09:12 pelwell

I have never seen your error (mmc1: Controller never released inhibit bit(s).) before. I am basing my diagnosis on the basis that an event in a different subsystem (your large transfer over USB) causes the problem, but it's possible it's the WiFi side of the transfer that caused the problem.

Thanks for your analyses. But the second bug I mentioned is upload files to SDCARD, not to USB disk (USB is not connected; only power line is connected)

Your examples have all been of large transfers that get cancelled. Have you tried changing the scenario:

  1. Smaller transfers.
  2. Allowing them to complete.

Is the cancelling because things are stalled? In other words, is it possible that the error has already occurred and that the need to cancel is a symptom?

I upload big files only for easier testing, no errors force me to do. I cancel transfer only for saving time (Cancel the transfer and see dmesg). Because several seconds are enough for triggering these bugs if "lucky". Indeed, I've tried and successfully completely uploaded several "big" (~200MB to 5GB) and ~1000 "small" (~3MB) photos to USB/SDCARD. I think upload anything would randomly trigger these bugs, and the "foldersync" app is only a sftp (single thread) GUI, so I only chose big files for continuous stress the wifi chip. I'll try over_voltage=2 now, and your two suggestions next.

llllllllqq avatar Dec 05 '23 09:12 llllllllqq

I find more information. These bugs even occur synchronously when I start a sftp session by "foldersync".

Dec 04 20:17:01 rpi02w CRON[13185]: pam_unix(cron:session): session closed for user root
Dec 04 20:22:35 rpi02w kernel: mmc1: Controller never released inhibit bit(s).
Dec 04 20:22:35 rpi02w kernel: brcmfmac: brcmf_sdio_readframes: read 1108 bytes from channel 2 failed: -5
Dec 04 20:22:35 rpi02w kernel: brcmfmac: brcmf_sdio_rxfail: abort command, terminate frame
Dec 04 20:22:35 rpi02w kernel: brcmfmac: brcmf_sdio_readframes: brcmf_sdio_readframes: glom superframe w/o descriptor!
Dec 04 20:22:35 rpi02w kernel: brcmfmac: brcmf_sdio_rxfail: terminate frame
Dec 04 20:22:36 rpi02w sshd[13198]: Accepted publickey for S**** from 192.168.1.1 port 47088 ssh2: ED25519 SHA256:*****
Dec 04 20:22:36 rpi02w sshd[13198]: pam_unix(sshd:session): session opened for user S****(uid=1000) by (uid=0)
Dec 04 20:22:36 rpi02w systemd-logind[440]: New session 94 of user S****.
Dec 04 20:22:36 rpi02w systemd[1]: Started session-94.scope - Session 94 of User S****.
Dec 04 20:22:36 rpi02w sshd[13198]: pam_env(sshd:session): deprecated reading of user environment enabled
Dec 04 20:22:51 rpi02w sshd[13206]: Received disconnect from 192.168.1.1 port 47088:11:

llllllllqq avatar Dec 05 '23 09:12 llllllllqq

Can you add "brcmfmac.debug=0x100000" to /boot/firmware/cmdline.txt, reboot and continue your testing? The WiFi firmware should put some diagnostic output into the kernel log, which you can read using "dmesg | grep brcmfmac".

pelwell avatar Dec 05 '23 09:12 pelwell

Can you add "brcmfmac.debug=0x100000" to /boot/firmware/cmdline.txt, reboot and continue your testing? The WiFi firmware should put some diagnostic output into the kernel log, which you can read using "dmesg | grep brcmfmac".

Ok, thank you.

llllllllqq avatar Dec 05 '23 09:12 llllllllqq

With brcmfmac.debug=0x100000 and over_voltage=2, bugs occurred again.

What I was running: stress -c 2 -i 1 -d 1 (stress the device and the USB to see if undervolt, but not: throttled=0x0) curl download a big file (debian x86_64 iso) to /dev/null (to stress the wifi chip & prolong the sftp uploading for longer testing) use "foldersync" upload 2 big files (202MB & 710MB) to USB disk, without manually cancelling

When uploaded these 2 files, dmesg showed (wifi was not lost, only showed errors):

[    0.000000] Kernel command line: coherent_pool=1M 8250.nr_uarts=1 snd_bcm2835.enable_headphones=0 snd_bcm2835.enable_hdmi=1 snd_bcm2835.enable_hdmi=0  smsc95xx.macaddr=B8:27:EB:F0:29:7C vc_mem.mem_base=0x1fa00000 vc_mem.mem_size=0x20000000  console=tty1 console=ttyAMA0,115200 root=PARTUUID=92734e84-02 rootfstype=ext4 fsck.repair=yes rootwait cfg80211.ieee80211_regdom=CN brcmfmac.debug=0x100000
[   15.435372] brcmfmac: F1 signature read @0x18000000=0x1541a9a6
[   15.473975] brcmfmac: brcmf_fw_alloc_request: using brcm/brcmfmac43436s-sdio for chip BCM43430/1
[   15.480396] usbcore: registered new interface driver brcmfmac
[   15.513607] brcmfmac mmc1:0001:1: Direct firmware load for brcm/brcmfmac43436s-sdio.clm_blob failed with error -2
[   15.766915] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[   15.767933] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43430/1 wl0: Jun 14 2023 07:27:45 version 7.45.96.s1 (gf031a129) FWID 01-70bd2af7 es7
[   15.917957] brcmfmac: CONSOLE: hndarm_armr addr: 0x18003000, cr4_idx: 0
[   15.917996] brcmfmac: CONSOLE: 000000.001
[   15.918015] brcmfmac: CONSOLE: RTE (SDIO-CDC) 7.45.96.s1 on BCM43430 r1 @ 37.4/81.6/81.6MHz
[   15.918034] brcmfmac: CONSOLE: 000000.001 sdpcmdcdc0: Broadcom SDPCMD CDC driver
[   15.918053] brcmfmac: CONSOLE: 000000.007 reclaim section 0: Returned 45474 bytes to the heap
[   15.918072] brcmfmac: CONSOLE: 000000.009 wlc_bmac_info_init: host_enab 1
[   15.918091] brcmfmac: CONSOLE: 000000.061 wl0: Broadcom BCM43430 802.11 Wireless Controller 7.45.96.s1 (gf031a129)
[   15.918111] brcmfmac: CONSOLE: 000000.062 wl_nd_ra_filter_init: Enter..
[   15.918129] brcmfmac: CONSOLE: 000000.062 TCAM: 256 used: 215 exceed:0
[   15.918147] brcmfmac: CONSOLE: 000000.063 reclaim section 1: Returned 79816 bytes to the heap
[   15.918167] brcmfmac: CONSOLE: 000000.099 wl0: wlc_enable_probe_req: state down, deferring setting of host flags
[   15.918187] brcmfmac: CONSOLE: 000000.150 wl0: wlc_enable_probe_req: state down, deferring setting of host flags
[   20.386503] brcmfmac: brcmf_cfg80211_set_power_mgmt: power save enabled
[   20.414510] brcmfmac: CONSOLE: 000004.693 wl0: wl_open
[   20.789678] brcmfmac: CONSOLE: 000005.085 wl0: wl_open
[   20.917682] brcmfmac: CONSOLE: 000005.219 wl0: wlc_enable_probe_req: state down, deferring setting of host flags
[   21.497411] brcmfmac: brcmf_cfg80211_set_power_mgmt: power save disabled
.....following logs contain not only grep brcmfmac but also grep mmc1.....
[ 2133.914638] mmc1: Controller never released inhibit bit(s).
[ 2133.914786] brcmfmac: mmc_submit_one: CMD53 sg block read failed -5
[ 2133.914820] brcmfmac: brcmf_sdio_rxglom: glom read of 7680 bytes failed: -5
[ 2133.914849] brcmfmac: brcmf_sdio_rxfail: abort command, terminate frame
[ 2287.343373] mmc1: Controller never released inhibit bit(s).
[ 2287.343528] brcmfmac: mmc_submit_one: CMD53 sg block read failed -5
[ 2287.343565] brcmfmac: brcmf_sdio_rxglom: glom read of 8704 bytes failed: -5
[ 2287.343594] brcmfmac: brcmf_sdio_rxfail: abort command, terminate frame
[ 2287.363912] mmc1: Controller never released inhibit bit(s).
[ 2287.364058] brcmfmac: brcmf_sdio_readframes: RXHEADER FAILED: -5
[ 2287.364082] brcmfmac: brcmf_sdio_rxfail: abort command, terminate frame, send NAK
[ 2287.366145] brcmfmac: brcmf_sdio_rxglom: descriptor len 6 bad: 0
[ 2287.366260] brcmfmac: brcmf_sdio_hdparse: HW header length too long
[ 2287.366274] brcmfmac: brcmf_sdio_rxfail: terminate frame
[ 2287.367196] brcmfmac: brcmf_sdio_rxglom: descriptor len 9 bad: 0
[ 2287.367314] brcmfmac: brcmf_sdio_hdparse: HW header length too long
[ 2287.367328] brcmfmac: brcmf_sdio_rxfail: terminate frame

Note: now it is volt=1.3062V and throttled=0x0, temp ~=71.0'C Now I'll test whether a sole curl downloading will trigger bugs to exclude the "foldersync" app's factor

llllllllqq avatar Dec 05 '23 12:12 llllllllqq

With brcmfmac.debug=0x100000 and no over_voltage, not using "foldersync" app, but using a standard sftp cli in "OpenSSH_9.5p1, OpenSSL 3.0.12 24 Oct 2023" bugs occurred again *2.

What I was running: stress -c 1 -i 1 -d 1 (note: I changed to -c 1 to make CPU temp below 80'C) curl download a big file (debian x86_64 iso) to /dev/null use sftp cli to upload 1 big files (4000MB data generated by /dev/urandom) to USB disk, without manually cancelling

Then dmesg showed (wifi was not lost, only showed errors):

[ 1417.607957] mmc1: Controller never released inhibit bit(s).
[ 1417.608109] brcmfmac: brcmf_sdio_readframes: RXHEADER FAILED: -5
[ 1417.608132] brcmfmac: brcmf_sdio_rxfail: abort command, terminate frame, send NAK
[ 1417.610137] brcmfmac: brcmf_sdio_rxglom: descriptor len 5 bad: 0
[ 1417.610237] brcmfmac: brcmf_sdio_hdparse: HW header length too long
[ 1417.610251] brcmfmac: brcmf_sdio_rxfail: terminate frame
[ 1417.612039] brcmfmac: brcmf_sdio_rxglom: descriptor len 6 bad: 0
[ 1417.612218] brcmfmac: brcmf_sdio_hdparse: HW header length too long
[ 1417.612233] brcmfmac: brcmf_sdio_rxfail: terminate frame

However, only use curl to download sth to USB won't trigger bugs. Maybe TCP packages different (I know little about them)? The device is bought last month with the latest RPi os installed and upgraded.

update: Below is with a USB flash disk (Sandisk Ultra CZ48 32GB), not a USB SATA. Considering I've also suffered once when no USB devices connected at all, it is not likely to be a bug with USB OTG.

[ 4975.103448] mmc1: Controller never released inhibit bit(s).
[ 4975.113244] brcmfmac: brcmf_sdio_isr: failed backplane access
[ 4975.114038] brcmfmac: brcmf_sdio_hdparse: HW superframe header length error
[ 4975.114080] brcmfmac: brcmf_sdio_rxfail: abort command, terminate frame

llllllllqq avatar Dec 05 '23 14:12 llllllllqq

Tested with arm_freq=800 gpu_freq=300 sdram_freq=400 but bugs still exist. All other procedures are same as the comment above.

[36957.150360] mmc1: Controller never released inhibit bit(s).
[36957.150673] brcmfmac: mmc_submit_one: CMD53 sg block read failed -5
[36957.150736] brcmfmac: brcmf_sdio_rxglom: glom read of 4608 bytes failed: -5
[36957.150771] brcmfmac: brcmf_sdio_rxfail: abort command, terminate frame
[36957.151270] brcmfmac: brcmf_sdio_hdparse: HW header length too long
[36957.151291] brcmfmac: brcmf_sdio_rxfail: terminate frame

llllllllqq avatar Dec 06 '23 00:12 llllllllqq

It seems to be an arm64 only bug. I backuped the sdcard and fresh install an armv7 system, and install nearly all same software as in previous arm64 system; after 4 hours testing, no bugs happened.
New system info:

Raspberry Pi reference 2023-10-10
Generated using pi-gen, https://github.com/RPi-Distro/pi-gen, fb56ad562991cf3ae5c96ab50983e1deeaefc7b6, stage2

Oct 17 2023 15:42:53
Copyright (c) 2012 Broadcom
version 30f0c5e4d076da3ab4f341d88e7d505760b93ad7 (clean) (release) (start_cd)

Linux rpi02w 6.1.0-rpi6-rpi-v7 #1 SMP Raspbian 1:6.1.58-1+rpt2 (2023-10-27) armv7l GNU/Linux

I'll test more with it.

llllllllqq avatar Dec 06 '23 10:12 llllllllqq

That's a useful datapoint. We don't recommend 64-bit on Zero 2 W because of the memory constraints, but that doesn't mean the WiFi shouldn't work.

pelwell avatar Dec 06 '23 10:12 pelwell

It seems to be an arm64 only bug. I backuped the sdcard and fresh install an armv7 system, and install nearly all same software as in previous arm64 system; after 4 hours testing, no bugs happened. New system info:

Raspberry Pi reference 2023-10-10
Generated using pi-gen, https://github.com/RPi-Distro/pi-gen, fb56ad562991cf3ae5c96ab50983e1deeaefc7b6, stage2

Oct 17 2023 15:42:53
Copyright (c) 2012 Broadcom
version 30f0c5e4d076da3ab4f341d88e7d505760b93ad7 (clean) (release) (start_cd)

Linux rpi02w 6.1.0-rpi6-rpi-v7 #1 SMP Raspbian 1:6.1.58-1+rpt2 (2023-10-27) armv7l GNU/Linux

I'll test more with it.

This system works fine during the past 5 days, and the newst 6.1.63 kernel also works fine.

Today I switched to a arm64 kernel (Linux rpi02w 6.1.0-rpi7-rpi-v8 #1 SMP PREEMPT Debian 1:6.1.63-1+rpt1 (2023-11-24) aarch64 GNU/Linux) by adding arm_64bit=1 config (but with armhf packages), ~~and it seems still working fine after a 2 hours test.~~

Oh no! it went wrong now, after 3 hours (with some pauses) test (avg temp 70ºC, no under volt found) dmesg:

mmc1: Controller never released inhibit bit(s).                                      brcmfmac: mmc_submit_one: CMD53 sg block read failed -5                       
brcmfmac: brcmf_sdio_rxglom: glom read of 9216 bytes failed: -5                     
brcmfmac: brcmf_sdio_rxfail: abort command, terminate frame                         
brcmfmac: brcmf_sdio_hdparse: HW header length too long                            
brcmfmac: brcmf_sdio_rxfail: terminate frame

~~I noticed my wifi router perform an auto-select channel, and the only channel number I remember during the bugs is 4 (+8 due to HT40). I will test more with more channels/HT20/arm64 kernel over armhf packages/armhf on armhf.~~ Edit 3: HT20/HT40 or channels won't affect.

small question: I wonder whether arm64 kernel loads the same drivers/firmwares as armhf kernel, and whether arm64 kernel consumes more power which may cause the bugs. note: on armhf kernel, I've tested my power supply can server the pi with a 5v1A 2.5' USB HDD well, so maybe it's unlikely to be a power issue?

~~Edit1: This time, with arm64 kernel on armhf packages, after a reboot, I immediately run iperf3 and stress -c 1 -m 1 -i 1 -d 1# (on a USB SSD) and sftp (failed to run due to wifi already died) in screen, the wifi was down. I cannot access it anymore via wifi. The SSD link LED was blinking so the device was still running at least the stress. I then rebooted my wifi router, but the pi didn't connect it again. After a force reboot, the logs showed very long and maybe detailed errors. It looks different than before. Some sensitive data are masked by ###, if you need it I can email you. log.txt Edit2: Strange...This new bug will also occur on armhf kernel. I'll check my board and sd card.~~ A new 100% occur bug, I'll post a new issue.

llllllllqq avatar Dec 11 '23 05:12 llllllllqq

More strange test results:

When I use sudo dd if=/dev/zero of=~/test bs=4M to stress my SD card (mounted as system root & home), the status LED of my USB SATA TO SSD adapter is OFF after ~5 seconds. If I access my SSD like ls /mnt/myssd, the LED often will turn on and ls work, but sometimes it will be stuck for a long time till working, and dmesg showing:

[  218.709981] brcmfmac: brcmf_sdio_bus_rxctl: resumed on timeout
[  219.374781] ieee80211 phy0: brcmf_cfg80211_dump_station: BRCMF_C_GET_ASSOCLIST failed, err=-110
[  280.949737] brcmfmac: brcmf_sdio_bus_rxctl: resumed on timeout
[  291.509856] brcmfmac: brcmf_sdio_bus_rxctl: resumed on timeout
[  295.524233] ieee80211 phy0: brcmf_fill_bss_param: Failed to get bss info (-110)
[  315.030205] brcmfmac: brcmf_sdio_bus_rxctl: resumed on timeout
[  316.732193] ieee80211 phy0: brcmf_cfg80211_get_station: GET STA INFO failed, -110

It looks like the bugs mentioned in this issue, so I posted here.

llllllllqq avatar Dec 12 '23 14:12 llllllllqq

How is your SATA drive and adaptor powered? The Zero 2 has a very limited capacity to power external hardware.

pelwell avatar Dec 12 '23 14:12 pelwell

How is your SATA drive and adaptor powered? The Zero 2 has a very limited capacity to power external hardware.

Directly from the pi. But I've tested with a USB meter, this adapter + SSD will only comsume 5V 300ma on average and 5V 501ma on peak, and they can pass a very long stress test on pi.

llllllllqq avatar Dec 12 '23 14:12 llllllllqq

they can pass a very long stress test on pi.

And yet you are getting SDIO problems, which doesn't sound like a pass to me. Your SSD may be able to limp along, but another part of the system is suffering.

If you insist on trying to power an external device from the Zero 2 while the rest of the system is busy then this is what you are going to get. Undervoltage (on at least some part of the board) is not something that can be fixed in software.

pelwell avatar Dec 12 '23 15:12 pelwell

they can pass a very long stress test on pi.

And yet you are getting SDIO problems, which doesn't sound like a pass to me. Your SSD may be able to limp along, but another part of the system is suffering.

If you insist on trying to power an external device from the Zero 2 while the rest of the system is busy then this is what you are going to get. Undervoltage (on at least some part of the board) is not something that can be fixed in software.

Thanks for your advice, I'll use a USB flash drive instead to test this issue further.

llllllllqq avatar Dec 12 '23 23:12 llllllllqq

Arm64 kernel on armhf packages with only a USB flash disk (sandisk CZ48, directly powered, max 5V 147ma, avg 5V 80ma), dmesg is like those of many tests before:

[ 2371.595553] mmc1: Controller never released inhibit bit(s).                                      
[ 2371.595669] brcmfmac: mmc_submit_one: CMD53 sg block read failed -5                              
[ 2371.595700] brcmfmac: brcmf_sdio_rxglom: glom read of 2048 bytes failed: -5                      
[ 2371.595726] brcmfmac: brcmf_sdio_rxfail: abort command, terminate frame

I noticed that if I let mmc0&mmc1 (or saying SDIO?) busy, this bug seems to occur more frequently.

llllllllqq avatar Dec 13 '23 01:12 llllllllqq

I tested rpi-update next, and Linux rpi02w 6.6.5-v8+ #1706 SMP PREEMPT Fri Dec 8 18:43:36 GMT 2023 aarch64 GNU/Linux still has this bug:

[ 1675.506651] mmc1: Controller never released inhibit bit(s).
[ 1675.507753] brcmfmac: brcmf_sdio_readframes: read 1525 bytes from channel 2 failed: -5
[ 1675.507806] brcmfmac: brcmf_sdio_rxfail: abort command, terminate frame

llllllllqq avatar Dec 15 '23 01:12 llllllllqq

iperf.txt iperf-r.txt

I find the pi has a very high tcp package retry. On a latest fresh raspbian 32bit, both 6.1.63 armhf and 6.1.63 arm64 kernel have this bug. I tested by iperf3 -s on pi connected to 2.4ghz wifi of my router, iperf3 -c pi_ip -d on my android phone connected to 5ghz_wifi of my router. As iperf.txt shows, there are 1820 retries in 10s. I also tested iperf3 on the permutation and combination of laptop/phone/pc on 2.4g/5g/rj45, but no error found, so it's not likely my router's fault. In addition, there is a more strange output in iperf.txt: "sender_has_retransmits":1.8446744073709552e+19,

Does this means my pi's wifi module is broken and related to the bugs in this issue?

edit: maybe off topic. Use iperf3 -b to limit speed will greatly alleviates the problem. And a USB eth also have this issue, so pi's wifi does not seem to be broken.

llllllllqq avatar Dec 17 '23 04:12 llllllllqq

Today, I tried the latest ubuntu server arm64, and the bug still exists. The pi was USB boot on Linux version 6.5.0-1008-raspi (buildd@bos02-arm64-062) (aarch64-linux-gnu-gcc-13 (Ubuntu 13.2.0-4ubuntu3) 13.2.0, GNU ld (GNU Binutils for Ubuntu) 2.41) #11-Ubuntu SMP PREEMPT_DYNAMIC Wed Nov 22 19:08:26 UTC 2023 (Ubuntu 6.5.0-1008.11-raspi 6.5.3) . No sdcard was inserted (I just wanted to check whether my sdcard is broken that leads to the bug.) It was a fresh install without non-default software.

I noticed PI 4 had similar bugs. I also noticed some people also encountered similar bugs when use ssh/scp/sftp on BCM wifi chip on sdio, but they were on other CPU (sunxi). More links here, I think they might help a little 1 2 3 4

Logs are here. (Sorry, I forgot added brcmfmac.debug=0x100000 to cmdline, but I think it's same as comments above)

[  718.588249] mmc1: Controller never released inhibit bit(s).
[  718.594160] brcmfmac: brcmf_sdio_readframes: read 1525 bytes from channel 2 failed: -5
[  718.604209] brcmfmac: brcmf_sdio_rxfail: abort command, terminate frame
[  756.336904] mmc1: Controller never released inhibit bit(s).
[  756.342839] brcmfmac: brcmf_sdio_readframes: read 1525 bytes from channel 2 failed: -5
[  756.351533] brcmfmac: brcmf_sdio_rxfail: abort command, terminate frame
[  762.385556] mmc1: Controller never released inhibit bit(s).
[  762.391492] brcmfmac: brcmf_sdio_readframes: RXHEADER FAILED: -5
[  762.398217] brcmfmac: brcmf_sdio_rxfail: abort command, terminate frame, send NAK
[  762.408111] brcmfmac: brcmf_sdio_rxglom: descriptor len 5 bad: 0
[  762.414791] brcmfmac: brcmf_sdio_hdparse: HW header length too long
[  762.421346] brcmfmac: brcmf_sdio_rxfail: terminate frame
[  762.427640] brcmfmac: brcmf_sdio_hdparse: HW superframe header length error
[  762.434803] brcmfmac: brcmf_sdio_rxfail: abort command, terminate frame
[  762.442047] brcmfmac: brcmf_sdio_hdparse: HW header length too long
[  762.448494] brcmfmac: brcmf_sdio_rxfail: terminate frame
[  792.068261] brcmfmac: brcmf_sdio_isr: failed backplane access
[  792.077789] brcmfmac: brcmf_sdio_isr: failed backplane access
[  792.100223] brcmfmac: brcmf_sdio_readframes: RXHEADER FAILED: -110
[  792.106760] brcmfmac: brcmf_sdio_rxfail: abort command, terminate frame, send NAK
[  792.117648] brcmfmac: brcmf_sdio_hdparse: HW header length too long
[  792.125158] brcmfmac: brcmf_sdio_rxfail: terminate frame
[  792.133482] brcmfmac: brcmf_sdio_rxglom: descriptor len 3 bad: 0
[  792.140738] brcmfmac: brcmf_sdio_hdparse: HW header length too long
[  792.147968] brcmfmac: brcmf_sdio_rxfail: terminate frame
[  792.155987] brcmfmac: brcmf_sdio_rxglom: descriptor len 4 bad: 0
[  792.163415] brcmfmac: brcmf_sdio_hdparse: HW header length too long
[  792.170632] brcmfmac: brcmf_sdio_rxfail: terminate frame

after these, the wifi dead. so I must force reboot (I've no uart or hdmi access)

llllllllqq avatar Dec 23 '23 09:12 llllllllqq

I still have the problem what this is an error I have never encountered, to the extent that a hardware issue seems most likely. Without a means to reproduce it there is little we can do.

pelwell avatar Dec 23 '23 11:12 pelwell

I still have the problem what this is an error I have never encountered, to the extent that a hardware issue seems most likely. Without a means to reproduce it there is little we can do.

Thanks for your help and comments. I'll ask the seller for a new device, if they can, I'll test the bug with new device. Thanks again.

llllllllqq avatar Dec 23 '23 11:12 llllllllqq

I got a new zero2w, and used the same sdcard (armhf raspberry os with arm_64bit=1) for testing, and bugs still exist, so it's not likely a hardware issue.

(Edit: note, before the dmesg below, I got several mmc1: Got command interrupt 0x00000001 even though no command operation was in progress. but everything worked well)

[ 1194.501692] mmc1: Controller never released inhibit bit(s).
[ 1194.501830] brcmfmac: mmc_submit_one: CMD53 sg block read failed -5
[ 1194.501863] brcmfmac: brcmf_sdio_rxglom: glom read of 9216 bytes failed: -5
[ 1194.501889] brcmfmac: brcmf_sdio_rxfail: abort command, terminate frame
[ 1194.502197] brcmfmac: brcmf_sdio_hdparse: HW header length too long
[ 1194.502212] brcmfmac: brcmf_sdio_rxfail: terminate frame

After about 3 minutes, wifi lost.


The only difference between the two devices I find is default volt. This new device has a higher value 1.2938v, with higher temp when testing.

temp=76.8'C
volt=1.2938V
throttled=0x0

I'll switch to armhf kernel to see whether bugs exist on the new device with armhf kernel.

llllllllqq avatar Jan 08 '24 09:01 llllllllqq

I see the same error (mmc1: Controller never released inhibit bit(s).) on a CM4 (booting from eMMC) based device (RevPi Connect 4) without any particular load / io ops.

[...]
[53007.437237] mmc1: Controller never released inhibit bit(s).
[53007.437307] brcmfmac: brcmf_sdio_read_control: read 2048 control bytes failed: -5
[53007.437317] brcmfmac: brcmf_sdio_rxfail: abort command, terminate frame, send NAK
[83805.688959] mmc1: Controller never released inhibit bit(s).
[83805.688997] brcmfmac: brcmf_sdio_read_control: read 2048 control bytes failed: -5
[83805.689007] brcmfmac: brcmf_sdio_rxfail: abort command, terminate frame, send NAK
[145912.200807] mmc1: Controller never released inhibit bit(s).
[145912.200844] brcmfmac: brcmf_sdio_read_control: read 2048 control bytes failed: -5
[145912.200854] brcmfmac: brcmf_sdio_rxfail: abort command, terminate frame, send NAK

The device is running with 6.1.46-rt13-v8, but does also occur with the rpi stock kernel. I'll check another device and will have a look how this can be reproduced easily. @JamesH65 should have a Connect 4 at hand (from last visit)

nbuchwitz avatar Jan 27 '24 19:01 nbuchwitz

I see the same error (mmc1: Controller never released inhibit bit(s).) on a CM4 (booting from eMMC) based device (RevPi Connect 4) without any particular load / io ops.

[...]
[53007.437237] mmc1: Controller never released inhibit bit(s).
[53007.437307] brcmfmac: brcmf_sdio_read_control: read 2048 control bytes failed: -5
[53007.437317] brcmfmac: brcmf_sdio_rxfail: abort command, terminate frame, send NAK
[83805.688959] mmc1: Controller never released inhibit bit(s).
[83805.688997] brcmfmac: brcmf_sdio_read_control: read 2048 control bytes failed: -5
[83805.689007] brcmfmac: brcmf_sdio_rxfail: abort command, terminate frame, send NAK
[145912.200807] mmc1: Controller never released inhibit bit(s).
[145912.200844] brcmfmac: brcmf_sdio_read_control: read 2048 control bytes failed: -5
[145912.200854] brcmfmac: brcmf_sdio_rxfail: abort command, terminate frame, send NAK

The device is running with 6.1.46-rt13-v8, but does also occur with the rpi stock kernel. I'll check another device and will have a look how this can be reproduced easily. @JamesH65 should have a Connect 4 at hand (from last visit)

Hi, to reproduce, you could try ssh to upload a huge file to the pi (via wifi), such as scp local:/dev/urandom to pi:/dev/null. If not lucky, stress wifi by iperf may help.

llllllllqq avatar Jan 28 '24 00:01 llllllllqq

Today I buy a new wifi router, wifi7 xiaomi be3600. It's IPQ5xxx CPU, not my old MTK one, and bugs mentioned above still exist (even after I upgraded to 6.1.73 kernel), so I think it's not a router problem.

edit: I found the new router's log says: 192.168.1.8(rpi) done an DHCP attack. This warning said 2 times when I test for bugs (and bug did occur)

llllllllqq avatar Jan 31 '24 11:01 llllllllqq