Kernel 6.15+ may have network connectivity issues on the Pi4/CM4
Describe the bug
FYI: since kernel 6.15 there may be a lot of network connectivity issues (transmit queue lockups) on the Pi4/CM4. They are triggered by traffic coming from both the kernel and a user space application at the same time, for example when the Pi is set up as both a router and a file server.
I don't have time to look into it myself, but I have notified the maintainers of the bcmgenet driver and as far as I know it has not yet been fixed - https://lists.openwall.net/netdev/2025/06/27/261
Steps to reproduce the behaviour
- Configure the Pi as a router (or something else where the kernel sends packets)
- Set up a file server or iperf3 (some user space application that can send packets)
- Transmit kernel and user space traffic out of the BCM interface at gigabit speed
Device(s)
Raspberry Pi CM4, Raspberry Pi CM4 Lite, Raspberry Pi 4 Mod. B
System
Raspberry Pi reference 2024-11-19 Generated using pi-gen, https://github.com/RPi-Distro/pi-gen, 891df1e21ed2b6099a2e6a13e26c91dea44b34d4, stage2
Mar 19 2025 18:24:21 Copyright (c) 2012 Broadcom version ca6e8171a80ea46924ffaa629250bfb482f3a02c (clean) (release) (start)
Linux router.localnet 6.12.25-v8-ZAK+ #1 SMP PREEMPT Sat Apr 26 13:37:08 BST 2025 aarch64 GNU/Linux
Logs
Jun 26 14:32:06 router.localnet kernel: bcmgenet fd580000.ethernet lan0: NETDEV WATCHDOG: CPU: 1: transmit queue 0 timed out 2004 ms
Jun 26 14:32:08 router.localnet kernel: bcmgenet fd580000.ethernet lan0: NETDEV WATCHDOG: CPU: 1: transmit queue 4 timed out 2004 ms
Jun 26 14:32:09 router.localnet kernel: bcmgenet fd580000.ethernet lan0: NETDEV WATCHDOG: CPU: 2: transmit queue 3 timed out 2004 ms
Jun 26 14:32:10 router.localnet kernel: bcmgenet fd580000.ethernet lan0: NETDEV WATCHDOG: CPU: 1: transmit queue 3 timed out 2892 ms
Jun 26 14:32:11 router.localnet kernel: bcmgenet fd580000.ethernet lan0: NETDEV WATCHDOG: CPU: 2: transmit queue 3 timed out 3884 ms
Jun 26 14:32:12 router.localnet kernel: bcmgenet fd580000.ethernet lan0: NETDEV WATCHDOG: CPU: 0: transmit queue 1 timed out 2208 ms
Jun 26 14:32:13 router.localnet kernel: bcmgenet fd580000.ethernet lan0: NETDEV WATCHDOG: CPU: 1: transmit queue 1 timed out 3232 ms
Jun 26 14:32:14 router.localnet kernel: bcmgenet fd580000.ethernet lan0: NETDEV WATCHDOG: CPU: 1: transmit queue 1 timed out 4224 ms
Jun 26 14:32:15 router.localnet kernel: bcmgenet fd580000.ethernet lan0: NETDEV WATCHDOG: CPU: 2: transmit queue 1 timed out 5216 ms
Additional context
No response
> Linux router.localnet 6.12.25-v8-ZAK+ #1 SMP PREEMPT Sat Apr 26 13:37:08 BST 2025 aarch64 GNU/Linux
Does the issue only affect 6.15 and later? You seem to have reported you are running 6.12?
Ah sorry for the confusion. Yes the issue is only in 6.15 and later. I've just backported the driver code from 6.16 to 6.12 with some other changes for my system.
Oh and backporting the 6.16 driver code to 6.12 also brings along the transmit queue lockup problem, but it's not an issue for my system since it's not running any user space applications that send network traffic on the BCM interface.
@ZakKemble Since there were a lot of changes since 6.15, are you able to bisect the issue with a mainline (torvalds) kernel?
As mentioned, I don't have time to do this.
Edit: I'm able to reproduce this issue with mainline kernel 6.15 (arm64/defconfig) using iperf3.
Setup: Notebook (Server) --- 1 Gigabit --- Raspberry Pi 4 B (Client)
Running 10 parallel clients on Raspberry Pi side seems to trigger this.
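For anyone trying to repeat this, here is a minimal sketch of how the client command could be built. It assumes "parallel clients" means iperf3's `-P` (parallel streams) option rather than separate iperf3 processes; the server address and the 60 second duration are placeholders, not values from the test above.

```python
import shlex

def iperf3_client_cmd(server, parallel=10, seconds=60):
    """Build an iperf3 client command line with `parallel` streams.

    -c: run as client against `server`
    -P: number of parallel client streams
    -t: test duration in seconds
    """
    if parallel < 1:
        raise ValueError("parallel must be at least 1")
    return ["iperf3", "-c", server, "-P", str(parallel), "-t", str(seconds)]

# 192.0.2.10 is a documentation-only address; replace it with the notebook's IP.
print(shlex.join(iperf3_client_cmd("192.0.2.10")))
```

The returned list can be passed straight to `subprocess.run` on the Pi while the notebook runs `iperf3 -s`.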
@ffainelli I didn't have the time to analyze this issue properly, but I want to share my observations here. The following commit list shows how many parallel iperf3 clients are necessary to trigger at least 1 transmit queue timeout on a Raspberry Pi 4 (8 GB RAM, arm64/defconfig):
- 0ff41df1cb26: parallel clients 2 and above
- d2b41068056b: parallel clients 2 and above
- 64fdb808660d: parallel clients 3 and above
- 38fec10eb60d: parallel clients 4 and above
The fact that v6.14 also shows transmit queue timeouts suggests that no single commit introduced a regression, and in the worst case it never worked properly before.
I think that problem has always been there, I have seen it for as long as I had a Raspberry Pi 4 in my home, which is circa 5 years. Queues 0-3 are configured with 32 descriptors available, which is very few, while queue 16 is configured with 128 descriptors. As long as the timeouts are recoverable, I don't necessarily consider that a bug, but an annoyance.
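To put the descriptor counts in perspective, here is a rough back-of-the-envelope sketch. It assumes full-size 1500-byte frames, one descriptor per frame, and a saturated gigabit link, and it ignores Ethernet framing overhead; none of these figures come from the driver itself, and fragmented packets would use more than one descriptor each.

```python
LINK_BPS = 1_000_000_000  # gigabit link, as in the setups described in this thread
FRAME_BYTES = 1500        # assumption: full-size frames, one descriptor each

def ring_wire_time_us(descriptors, frame_bytes=FRAME_BYTES, link_bps=LINK_BPS):
    """Microseconds of wire time held by a completely full descriptor ring."""
    bits = descriptors * frame_bytes * 8
    return bits * 1_000_000 / link_bps

for name, descriptors in (("queues 0-3", 32), ("queue 16", 128)):
    print(f"{name}: {descriptors} descriptors ~ {ring_wire_time_us(descriptors):.0f} us")
```

Under those assumptions a 32-descriptor ring holds about 384 µs of traffic and the 128-descriptor ring about 1536 µs, so the small rings fill up very quickly at line rate.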
The problem I've been having with 6.15+ is that the transmit timeouts do not recover. Notice the ever-increasing millisecond timer in the kernel output.
Jun 26 14:32:21 router.localnet kernel: bcmgenet fd580000.ethernet lan0: NETDEV WATCHDOG: CPU: 2: transmit queue 1 timed out 11200 ms
Jun 26 14:32:22 router.localnet kernel: bcmgenet fd580000.ethernet lan0: NETDEV WATCHDOG: CPU: 2: transmit queue 1 timed out 12224 ms
Jun 26 14:32:23 router.localnet kernel: bcmgenet fd580000.ethernet lan0: NETDEV WATCHDOG: CPU: 2: transmit queue 1 timed out 13216 ms
Jun 26 14:32:24 router.localnet kernel: bcmgenet fd580000.ethernet lan0: NETDEV WATCHDOG: CPU: 3: transmit queue 1 timed out 14208 ms
Jun 26 14:32:25 router.localnet kernel: bcmgenet fd580000.ethernet lan0: NETDEV WATCHDOG: CPU: 1: transmit queue 1 timed out 15200 ms
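A small sketch of how that "ever-increasing timer" pattern could be checked automatically. The regular expression is written against the log lines quoted in this thread; the `min_reports` threshold and the function names are my own choices for illustration, not anything the kernel defines.

```python
import re
from collections import defaultdict

# Matches the NETDEV WATCHDOG lines as they appear in the logs above.
WATCHDOG_RE = re.compile(
    r"NETDEV WATCHDOG: CPU: (?P<cpu>\d+): "
    r"transmit queue (?P<queue>\d+) timed out (?P<ms>\d+) ms"
)

def timeouts_by_queue(lines):
    """Return {queue: [timeout_ms, ...]} in log order."""
    result = defaultdict(list)
    for line in lines:
        m = WATCHDOG_RE.search(line)
        if m:
            result[int(m.group("queue"))].append(int(m.group("ms")))
    return dict(result)

def stuck_queues(lines, min_reports=3):
    """Queues reported at least `min_reports` times with a strictly growing timeout."""
    stuck = []
    for queue, values in timeouts_by_queue(lines).items():
        growing = all(b > a for a, b in zip(values, values[1:]))
        if len(values) >= min_reports and growing:
            stuck.append(queue)
    return sorted(stuck)

SAMPLE = [
    "Jun 26 14:32:21 router.localnet kernel: bcmgenet fd580000.ethernet lan0: NETDEV WATCHDOG: CPU: 2: transmit queue 1 timed out 11200 ms",
    "Jun 26 14:32:22 router.localnet kernel: bcmgenet fd580000.ethernet lan0: NETDEV WATCHDOG: CPU: 2: transmit queue 1 timed out 12224 ms",
    "Jun 26 14:32:23 router.localnet kernel: bcmgenet fd580000.ethernet lan0: NETDEV WATCHDOG: CPU: 2: transmit queue 1 timed out 13216 ms",
    "Jun 26 14:32:24 router.localnet kernel: bcmgenet fd580000.ethernet lan0: NETDEV WATCHDOG: CPU: 3: transmit queue 1 timed out 14208 ms",
    "Jun 26 14:32:25 router.localnet kernel: bcmgenet fd580000.ethernet lan0: NETDEV WATCHDOG: CPU: 1: transmit queue 1 timed out 15200 ms",
]

print(stuck_queues(SAMPLE))  # prints [1]
```

A queue that recovers would either stop appearing or restart at a small value, so it would not show up in the result.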
OK, we will try to reproduce and fix it.
Sorry, I didn't want to negate your observations regarding recoverability, and thanks for your feedback. In my eyes there are two issues:
- userspace is able to trigger netdev watchdog (currently this feature is not really helpful in this driver)
- it is possible to get bcmgenet in a non-recoverable state
I have been experiencing this, but can't trigger it manually. Currently, it just seems like the forwarding stops at "random" times. But it could be days or weeks between issues. I tried iperf3 in multiple different ways, but haven't been successful.
Is there some extra tracing on the bcmgenet driver which might be helpful?
It's definitely unrecoverable and reboot is the only option.
Just to add to my above comment. I was testing with 6.12.55, and after running that for over a week, I haven't come across this transmit queue timeout message. However, the last boot, with the same kernel, had the error pop up after an uptime of about 2.5 days. So I'm at a bit of a loss to recreate it.
@aplund The kernel must also send packets at the same time as iperf3. When both the kernel and a user space application are sending packets and saturating the BCM interface the TX queue lockup happens almost immediately.
There is a module called pktgen which can generate and send packets from within the kernel, but I've not used it before. Otherwise, the Pi can be set up as a router or bridge with a second USB-ethernet interface.
Also this only affects 6.15+
> @aplund The kernel must also send packets at the same time as iperf3. When both the kernel and a user space application are sending packets and saturating the BCM interface the TX queue lockup happens almost immediately.
OK. So I was using iperf3 only on the bcm interface and the packets weren't being forwarded. Is this what you mean?
> There is a module called pktgen which can generate and send packets from within the kernel, but I've not used it before. Otherwise, the Pi can be set up as a router or bridge with a second USB-ethernet interface.
This is how this Pi4 is being used. I have 'end0' for a LAN and 'enp1s0u2u3' via USB-ethernet interface for WAN.
> Also this only affects 6.15+
Has there been a backport of something to the 6.12 branch?
> OK. So I was using iperf3 only on the bcm interface and the packets weren't being forwarded. Is this what you mean?
Yea, when the TX lockup occurs nothing is sent out of the BCM interface. That includes traffic from iperf3 and forwarded traffic.
> This is how this Pi4 is being used. I have 'end0' for a LAN and 'enp1s0u2u3' via USB-ethernet interface for WAN.
Sounds right
> Has there been a backport of something to the 6.12 branch?
No. (apart from what I did for myself)
From what I can tell, the rpi kernel maintainers only release LTS kernel versions, and since the upstream kernel does an LTS release around this time of year I thought I should create this issue to warn you guys of the problem.
Try the bcm2711_build kernel from here https://github.com/raspberrypi/linux/actions/runs/19230176416
I'm seeing the issue described here. I haven't tried the kernel linked to above yet.
Details:
- rpi 4 w/ 2GiB memory
- Linux mypi 6.12.25+rpt-rpi-v8 #1 SMP PREEMPT Debian 1:6.12.25-1+rpt1 (2025-04-30) aarch64 GNU/Linux
Lots of the following error messages:
- Dec 08 09:11:49 mypi kernel: bcmgenet fd580000.ethernet end0: NETDEV WATCHDOG: CPU: 2: transmit queue 0 timed out 2004 ms
I eventually have to reboot from the console. I have a hacky script that reboots when it sees this error in journalctl.
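For reference, a rough sketch of what such a workaround script could look like; this is an illustration, not the actual script. The threshold of 5 hits is an arbitrary assumption, `watch()` has to run as root, and it relies on `journalctl -k -f` for following kernel messages and `systemctl reboot` for the reboot.

```python
import re
import subprocess

# Matches the bcmgenet watchdog message quoted above, on any interface name.
WATCHDOG_RE = re.compile(
    r"bcmgenet \S+ \S+: NETDEV WATCHDOG: .*transmit queue \d+ timed out \d+ ms"
)

def is_watchdog_line(line):
    """True if the line is a bcmgenet transmit queue timeout report."""
    return WATCHDOG_RE.search(line) is not None

def should_reboot(lines, threshold=5):
    """True once at least `threshold` watchdog reports appear in `lines`."""
    hits = sum(1 for line in lines if is_watchdog_line(line))
    return hits >= threshold

def watch(threshold=5):
    """Follow new kernel messages and reboot after `threshold` reports (needs root)."""
    proc = subprocess.Popen(
        ["journalctl", "-k", "-f", "-n", "0", "-o", "cat"],
        stdout=subprocess.PIPE,
        text=True,
    )
    hits = 0
    for line in proc.stdout:
        if is_watchdog_line(line):
            hits += 1
            if hits >= threshold:
                subprocess.run(["systemctl", "reboot"], check=False)
                return
```

Calling `watch()` from a systemd service would keep it running in the background. Counting a few hits before rebooting avoids restarting on a single timeout that recovers on its own.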