
brcmfmac: high cpu utilization

Open KaiJan57 opened this issue 3 years ago • 13 comments

With the BCM4330, kernel workers such as kworker/u8:2-brcmf_wq/mmc1:0001:1 keep CPU utilization high, slowing down normal operation and draining the battery. I found a quick and dirty workaround that I don't quite understand myself: I make nvram.txt inaccessible to the driver. Right after boot, a script unloads the brcmfmac module with rmmod. Then, after nvram.txt is moved back in place, the driver is loaded again (insmod), and WiFi works without the CPU utilization issue. Any ideas what is going on?

KaiJan57 avatar May 04 '21 14:05 KaiJan57

I have never seen such a problem. Do you see very high interrupt activity in /proc/interrupts when CPU utilization is high?
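One way to answer that question is to sample /proc/interrupts twice and compare the counters. The following is a minimal sketch, not part of the original thread; the function names `parse_interrupts`, `busiest`, and `sample` are made up for illustration, and the parser assumes the usual layout of a CPU header row followed by `IRQ: count... description` rows.

```python
import time


def parse_interrupts(text):
    """Parse /proc/interrupts text into {irq_label: (total_count, description)}."""
    lines = [line for line in text.splitlines() if line.strip()]
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        counts = []
        # Leading numeric fields are the per-CPU counters.
        for token in fields[:ncpu]:
            if not token.isdigit():
                break
            counts.append(int(token))
        description = " ".join(fields[len(counts):])
        table[label.strip()] = (sum(counts), description)
    return table


def busiest(before, after, seconds, top=5):
    """Return the IRQs with the highest interrupts-per-second rate."""
    rates = []
    for irq, (total, description) in after.items():
        previous = before.get(irq, (0, ""))[0]
        rates.append(((total - previous) / seconds, irq, description))
    rates.sort(reverse=True)
    return rates[:top]


def sample(path="/proc/interrupts", seconds=5.0):
    """Read the file twice, `seconds` apart, and report the busiest IRQs."""
    with open(path) as f:
        before = parse_interrupts(f.read())
    time.sleep(seconds)
    with open(path) as f:
        after = parse_interrupts(f.read())
    return busiest(before, after, seconds)
```

Calling `sample()` on the affected device should list the IRQs whose counters grow fastest; a rate of thousands of interrupts per second on the WiFi-related lines would point to an interrupt storm.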

digetx avatar May 05 '21 10:05 digetx

I wonder whether you are using an initramfs that contains the brcmfmac module but lacks nvram.txt? That would explain why brcmfmac could be mis-initialized.

For the process described above, I wonder whether rmmod followed by modprobe is enough to recover from the issue (I don't see why moving nvram.txt around the rmmod would change anything).

kwizart avatar May 05 '21 11:05 kwizart

I forgot to mention that WiFi does not work at all without making nvram.txt inaccessible to the driver.

@digetx I just disabled my workaround, and interrupt activity is indeed high. The CPU-hogging process is kworker/u8:4+brcmf_wq/mmc2:0001:1. Related interrupts:

 89:      33755          0          0          0       LIC  14 Level     mmc1
 90:    6651429          0          0          0       LIC  19 Level     mmc2
 91:        963          0          0          0       LIC  31 Level     mmc0
117:    6640384          0          0          0      GPIO 179 Level     brcmf_oob_intr

So mmc2 has the highest interrupt count (IRQs 90 and 117 have by far the highest counts, and they grow steadily over time). I suspect the firmware is flooding the CPU with IRQs, but if that is the case, I have no idea how to fix it properly…

@kwizart Just reloading the module does not work. I am using the initramfs-linux.img generated by mkinitcpio on Arch Linux. The nvram.txt in my rootfs is respected by default (renaming nvram.txt and rebooting changes the driver's behaviour even without manually reloading the driver), so I don't believe the initramfs is mis-initializing the driver, but I might be wrong. If there is an easy way to be certain, tell me and I will check.

KaiJan57 avatar May 06 '21 10:05 KaiJan57

Related dmesg entries:

Bluetooth: hci0: BCM: chip id 62
Bluetooth: hci0: BCM: features 0x0f
Bluetooth: hci0: BCM4330B1
Bluetooth: hci0: BCM4330B1 (002.001.003) build 0000
Bluetooth: hci0: BCM4330B1 'brcm/BCM4330B1.hcd' Patch
Bluetooth: hci0: CyberTan NC223 BCM4330B1 37.4 MHz Class 1.5 WLBGA
Bluetooth: hci0: BCM4330B1 (002.001.003) build 0000
Bluetooth: hci0: BCM: Using default device address (43:30:b1:00:00:00)

brcm/BCM4330B1.hcd is the firmware actually in use, as other dmesg entries suggest (no link comes up when all files are in place, but if the driver is initialized 'partially' by disabling nvram.txt, WiFi works). Loading with accessible nvram.txt:

brcmfmac: F1 signature read @0x18000000=0x16044330
brcmfmac: brcmf_fw_alloc_request: using brcm/brcmfmac4330-sdio for chip BCM4330/4

(and that's it, no link comes up). Loading with inaccessible nvram.txt, in my case brcm/brcmfmac4330-sdio.lenovo,cl2n.txt, yields:

brcmfmac: F1 signature read @0x18000000=0x16044330
brcmfmac: brcmf_fw_alloc_request: using brcm/brcmfmac4330-sdio for chip BCM4330/4
usbcore: registered new interface driver brcmfmac
brcmfmac mmc2:0001:1: Direct firmware load for brcm/brcmfmac4330-sdio.lenovo,cl2n.txt failed with error -2
brcmfmac: brcmf_fw_alloc_request: using brcm/brcmfmac4330-sdio for chip BCM4330/4
brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM4330/4 wl0: Oct 25 2011 19:34:12 version 5.90.125.104
ieee80211 phy0: brcmf_p2p_create_p2pdev: timeout occurred
ieee80211 phy0: brcmf_cfg80211_add_iface: add iface p2p-dev-wlan0 type 10 failed: err=-5
IPv6: ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready
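The "failed with error -2" message in the log above carries a negated errno value. As a rough illustration (not from the original thread), the sketch below pulls such messages out of a dmesg dump and translates the number into its symbolic name; the function name `failed_firmware_loads` is hypothetical, and the regular expression assumes the message format shown in this log.

```python
import errno
import re

# Matches messages of the form seen in the log above.
_FW_FAIL = re.compile(r"Direct firmware load for (\S+) failed with error (-?\d+)")


def failed_firmware_loads(log_text):
    """Return (firmware_path, errno_name) pairs for failed direct firmware loads."""
    results = []
    for match in _FW_FAIL.finditer(log_text):
        code = abs(int(match.group(2)))  # the kernel reports a negated errno
        results.append((match.group(1), errno.errorcode.get(code, str(code))))
    return results
```

For the log above this reports ENOENT for brcm/brcmfmac4330-sdio.lenovo,cl2n.txt, i.e. the file was simply not found, which matches the file having been made inaccessible on purpose.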

KaiJan57 avatar May 06 '21 11:05 KaiJan57

BCM4330B1.hcd is the Bluetooth firmware; it should be irrelevant to WiFi.

The WiFi driver should fall back to loading brcm/brcmfmac4330-sdio.txt if brcm/brcmfmac4330-sdio.lenovo,cl2n.txt fails to load. Do you have that file?
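The fallback described here can be checked from userspace. This is a minimal sketch under stated assumptions: it only mirrors the two file names mentioned in this thread (board-specific first, then generic), the helper names `nvram_candidates` and `first_existing` are invented for illustration, and /lib/firmware is assumed as the firmware directory. The real driver may consider additional candidates.

```python
import os


def nvram_candidates(base, board):
    """Board-specific NVRAM name first, then the generic one."""
    return [f"{base}.{board}.txt", f"{base}.txt"]


def first_existing(candidates, firmware_dir="/lib/firmware"):
    """Return the first candidate present under firmware_dir, or None."""
    for relative in candidates:
        if os.path.isfile(os.path.join(firmware_dir, relative)):
            return relative
    return None
```

For example, `first_existing(nvram_candidates("brcm/brcmfmac4330-sdio", "lenovo,cl2n"))` shows which of the two files would be found on the running system.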

brcmf_oob_intr is the host-wake interrupt specified in the ideatab device-tree. The interrupt should be optional, so you could remove it from the device-tree. Have you tried removing it?

-			interrupt-parent = <&gpio>;
-			interrupts = <TEGRA_GPIO(W, 3) IRQ_TYPE_LEVEL_HIGH>;
-			interrupt-names = "host-wake";
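To confirm whether the running kernel's device tree still carries the host-wake interrupt after such a change, the exported tree can be inspected from userspace. The sketch below is an illustration only; `nodes_with_interrupt_name` is a hypothetical helper, and it assumes the tree is exported as a directory hierarchy in which each property is a file holding NUL-separated strings.

```python
import os


def nodes_with_interrupt_name(root, name="host-wake"):
    """List node paths (relative to root) whose interrupt-names contains `name`."""
    wanted = name.encode()
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        if "interrupt-names" not in files:
            continue
        with open(os.path.join(dirpath, "interrupt-names"), "rb") as f:
            # String-list properties are stored as NUL-terminated strings.
            names = f.read().split(b"\0")
        if wanted in names:
            hits.append(os.path.relpath(dirpath, root))
    return sorted(hits)
```

On a typical system the root to pass would be /sys/firmware/devicetree/base; an empty result after rebuilding the device tree suggests the three lines are no longer in effect.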

What happens if you do the rmmod and then reload the driver without making nvram.txt inaccessible? I.e. allow the driver to load properly, observe the high CPU utilization, and then reload the driver.

digetx avatar May 06 '21 20:05 digetx

Well, without BCM4330B1.hcd WiFi does not work at all on my device, so I assume that in my case it is a firmware file for both Bluetooth and WiFi. All other firmware files shipped with the linux-firmware package fail to load (i.e. those .bin files for BCM4330). I do have that default brcm/brcmfmac4330-sdio.txt file, but without temporarily loading brcm/brcmfmac4330-sdio.lenovo,cl2n.txt (which I found somewhere in the downstream Android source) the IRQ flooding will occur.

I think you are on the right track: I commented out the lines you proposed, and the CPU load has decreased drastically. Thank you very much for that hint! But I wonder what side effects this change has. Could you explain what these interrupt specifications were originally intended for?

Yeah, reloading without changing anything does not solve the problem; the CPU-intensive process just shows up as before.

KaiJan57 avatar May 07 '21 09:05 KaiJan57

Could you please try taking the .bin file from the original Android ROM and using it to replace the .bin file from linux-firmware?

The Bluetooth part of the BCM chip shouldn't influence WiFi. It should be a sign that something isn't correct with the WiFi part; the WiFi firmware binary is the main culprit.

Could you please clarify what you mean by "WiFi does not work at all"? The stock linux-firmware WiFi binary doesn't work well on the Nexus 7: the driver loads fine and WiFi sees networks, but it can't connect, IIRC.

One of the functions of the out-of-band (OOB) interrupt is to trigger a wake-up event that should resume the system from suspend on network activity, but it should be disabled by default and it's not fully supported by the upstream WiFi driver. I don't know what other functions that interrupt has; it could be worthwhile to ask on the Cypress mailing list about it. On the Acer A500 I see some OOB interrupt activity in /proc/interrupts, but it's at a sane level.

digetx avatar May 08 '21 12:05 digetx

First of all, thank you very much for the background information on the driver's interrupt specifications!

I would certainly give your proposal a try, but I don't know which original firmware file to use, as there are three different ones to choose from. The bin files can be found here. I wonder how WiFi can work even if dmesg says that firmware loading failed. Is the error message wrong? Without that Bluetooth-related firmware file, no WiFi interface shows up; that's what I meant by 'does not work at all'…

KaiJan57 avatar May 11 '21 17:05 KaiJan57

You should be able to download a zip file with the original Android ROM from the XDA forums or somewhere else, and then extract the firmware files from it.

The cl2n firmware isn't used by older chips; please ignore that message.

Please try to re-check the Bluetooth gpios; maybe one of them is shared with the WiFi and the device-tree isn't correct. The WiFi MMC card should be detected without Bluetooth.

digetx avatar May 12 '21 12:05 digetx

Actually, all of the files behind the link are the ones included in the ROM; I know for sure because I have set up the Android build system. Maybe I can give each of them a try when I have enough time.

> The cl2n firmware isn't used by older chips; please ignore that message.

So firmware is actually loaded successfully despite the error message?

> Please try to re-check the Bluetooth gpios

I just double checked the gpios, and they match the downstream kernel (well, as far as I could tell). When my "not working at all" condition is reached, the MMC card is in fact detected; it's just the WiFi interface (wlan0) that does not show up. That's why I am a bit puzzled about this problem…

KaiJan57 avatar May 12 '21 17:05 KaiJan57

> So firmware is actually loaded successfully despite the error message?

Correct

> I just double checked the gpios, and they match the downstream kernel (well, as far as I could tell). When my "not working at all" condition is reached, the MMC card is in fact detected; it's just the WiFi interface (wlan0) that does not show up. That's why I am a bit puzzled about this problem…

I would try to disable the Bluetooth part of the downstream kernel and check whether it has a working WiFi. This will tell us whether it should work without the Bluetooth in upstream.

digetx avatar May 13 '21 16:05 digetx

I removed cap-sdio-irq from the dts, which reduced CPU usage from 20% to 0.5%.

mengxp avatar Jan 02 '23 08:01 mengxp

> BCM4330B1.hcd is the Bluetooth firmware; it should be irrelevant to WiFi.
>
> The WiFi driver should fall back to loading brcm/brcmfmac4330-sdio.txt if brcm/brcmfmac4330-sdio.lenovo,cl2n.txt fails to load. Do you have that file?
>
> brcmf_oob_intr is the host-wake interrupt specified in the ideatab device-tree. The interrupt should be optional, so you could remove it from the device-tree. Have you tried removing it?
>
> -			interrupt-parent = <&gpio>;
> -			interrupts = <TEGRA_GPIO(W, 3) IRQ_TYPE_LEVEL_HIGH>;
> -			interrupt-names = "host-wake";
>
> What happens if you do the rmmod and then reload the driver without making nvram.txt inaccessible? I.e. allow the driver to load properly, observe the high CPU utilization, and then reload the driver.

I followed the suggestion and found that the WL-HOST-WAKE pin is not connected to the CPU on my test board. After removing these 3 lines from the dts, the issue is gone. Thanks a lot.

georgehuang2 avatar Mar 09 '23 12:03 georgehuang2