mt7603: unstable traffic (stalls/hangs) under load (MT7603EN, MT7628AN)
I experience stability issues using chipsets supported by the mt7603 driver. When running an iperf client on a STA I observe hiccups (traffic temporarily slows down and sometimes stops).
I first reported this back in 2021 in the e-mail thread "Unstable WiFi with mt76 on MT7628AN". It doesn't seem to be a regression, as this issue seems to go back to 2019 at least. It is also present in the latest mt76 (2024).
For a while there were probably two different issues in mt7603: PSE hangs and traffic hangs. The first problem was hopefully fixed in 2023 with commits baa19b2e4b7b, c677dda16523, 317620593349, 19e4f271d62e and c2fcc83b41a6.
Traffic hangs remain unresolved and have been observed by multiple people using various devices. See the above e-mail for OpenWrt forum reports, and GitHub issues #692, #719 and #841.
Netgear R6220 (MT7621ST SoC + MT7603EN Wi-Fi + MT7612EN Wi-Fi)
Example from OpenWrt 23.05.2 (iperf on STA connected to MT7603EN using channel 1 bandwidth 20 MHz):
[ 3] 25.0-26.0 sec 5.50 MBytes 46.1 Mbits/sec
[ 3] 26.0-27.0 sec 4.62 MBytes 38.8 Mbits/sec
[ 3] 27.0-28.0 sec 4.75 MBytes 39.8 Mbits/sec
[ 3] 28.0-29.0 sec 6.50 MBytes 54.5 Mbits/sec
[ 3] 29.0-30.0 sec 7.50 MBytes 62.9 Mbits/sec
[ 3] 30.0-31.0 sec 6.50 MBytes 54.5 Mbits/sec
[ 3] 31.0-32.0 sec 6.75 MBytes 56.6 Mbits/sec
[ 3] 32.0-33.0 sec 6.62 MBytes 55.6 Mbits/sec
[ 3] 33.0-34.0 sec 6.38 MBytes 53.5 Mbits/sec
[ 3] 34.0-35.0 sec 6.50 MBytes 54.5 Mbits/sec
[ 3] 35.0-36.0 sec 7.25 MBytes 60.8 Mbits/sec
[ 3] 36.0-37.0 sec 6.50 MBytes 54.5 Mbits/sec
[ 3] 37.0-38.0 sec 6.62 MBytes 55.6 Mbits/sec
[ 3] 38.0-39.0 sec 4.75 MBytes 39.8 Mbits/sec
[ 3] 39.0-40.0 sec 3.25 MBytes 27.3 Mbits/sec
[ 3] 40.0-41.0 sec 1.88 MBytes 15.7 Mbits/sec
[ 3] 41.0-42.0 sec 1.88 MBytes 15.7 Mbits/sec
[ 3] 42.0-43.0 sec 2.38 MBytes 19.9 Mbits/sec
[ 3] 43.0-44.0 sec 896 KBytes 7.34 Mbits/sec
[ 3] 44.0-45.0 sec 1.00 MBytes 8.39 Mbits/sec
[ 3] 45.0-46.0 sec 0.00 Bytes 0.00 bits/sec
[ 3] 46.0-47.0 sec 1.25 MBytes 10.5 Mbits/sec
[ 3] 47.0-48.0 sec 3.00 MBytes 25.2 Mbits/sec
[ 3] 48.0-49.0 sec 1.00 MBytes 8.39 Mbits/sec
[ 3] 49.0-50.0 sec 1.12 MBytes 9.44 Mbits/sec
[ 3] 50.0-51.0 sec 0.00 Bytes 0.00 bits/sec
[ 3] 51.0-52.0 sec 1.88 MBytes 15.7 Mbits/sec
[ 3] 52.0-53.0 sec 0.00 Bytes 0.00 bits/sec
[ 3] 53.0-54.0 sec 1.25 MBytes 10.5 Mbits/sec
[ 3] 54.0-55.0 sec 2.75 MBytes 23.1 Mbits/sec
[ 3] 55.0-56.0 sec 3.75 MBytes 31.5 Mbits/sec
[ 3] 56.0-57.0 sec 6.38 MBytes 53.5 Mbits/sec
[ 3] 57.0-58.0 sec 6.50 MBytes 54.5 Mbits/sec
[ 3] 58.0-59.0 sec 5.62 MBytes 47.2 Mbits/sec
[ 3] 59.0-60.0 sec 6.38 MBytes 53.5 Mbits/sec
[ 3] 60.0-61.0 sec 7.50 MBytes 62.9 Mbits/sec
[ 3] 61.0-62.0 sec 6.62 MBytes 55.6 Mbits/sec
[ 3] 62.0-63.0 sec 6.38 MBytes 53.5 Mbits/sec
Whenever traffic stops I can see that the station's TX bitrate reported by the router goes down from 72.2 Mbps to 6.5 Mbps.
Switching from HT20 to NOHT results in the rate being limited to 54 Mbps (it varies between 54 Mbps and 48 Mbps). I ran iperf for 8 hours and experienced only one one-second stall/hang over that time. The average iperf speed was 19.5 Mbps and it varied between 15 and 23-24 Mbps most of the time.
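Counting stalls by eye over an 8-hour run is tedious, so here is a minimal sketch of how the iperf interval lines could be scanned for them. It assumes the iperf 2 style output pasted in this report; the names parse_intervals and find_stalls and the default threshold of 0 bits/sec are illustrative choices, not part of any existing tool.

```python
import re

# Matches iperf 2 interval lines such as:
#   [ 3] 45.0-46.0 sec 0.00 Bytes 0.00 bits/sec
LINE_RE = re.compile(
    r"\[\s*\d+\]\s+(?P<start>[\d.]+)-\s*(?P<end>[\d.]+)\s+sec\s+"
    r"[\d.]+\s+\w*Bytes\s+(?P<rate>[\d.]+)\s+(?P<unit>[KMG]?)bits/sec"
)
SCALE = {"": 1.0, "K": 1e3, "M": 1e6, "G": 1e9}


def parse_intervals(text):
    """Return a list of (start_s, end_s, bits_per_sec) tuples."""
    intervals = []
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if m:
            rate = float(m["rate"]) * SCALE[m["unit"]]
            intervals.append((float(m["start"]), float(m["end"]), rate))
    return intervals


def find_stalls(intervals, threshold_bps=0.0):
    """Merge consecutive intervals at or below the threshold into (start, end) stalls."""
    stalls = []
    current = None
    for start, end, rate in intervals:
        if rate <= threshold_bps:
            if current is None:
                current = [start, end]
            else:
                current[1] = end
        elif current is not None:
            stalls.append(tuple(current))
            current = None
    if current is not None:
        stalls.append(tuple(current))
    return stalls


SAMPLE = """\
[ 3] 43.0-44.0 sec 896 KBytes 7.34 Mbits/sec
[ 3] 44.0-45.0 sec 1.00 MBytes 8.39 Mbits/sec
[ 3] 45.0-46.0 sec 0.00 Bytes 0.00 bits/sec
[ 3] 46.0-47.0 sec 1.25 MBytes 10.5 Mbits/sec
"""

if __name__ == "__main__":
    print(find_stalls(parse_intervals(SAMPLE)))  # [(45.0, 46.0)]
```

Raising threshold_bps (for example to 10e6) would also flag the slowdowns around a stall, not just the seconds with zero traffic.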
Commenting out ieee80211_hw_set(hw, AMPDU_AGGREGATION); in mac80211.c cuts the average speed by about half (down to 29 Mbps) but improves stability too (the rate stays at 72.2 Mbps and sometimes drops to 65 Mbps for a second). During the first 1.5-hour iperf session I had a single stall/hang. During the next one, which took 3 hours, I had none. The average speed was 30.4 Mbps (it mostly was 31 Mbps ± 4 Mbps).
Xiaomi Mi Router 4C (MT7628AN Wi-Fi SoC)
Example from OpenWrt 23.05.2 (iperf on STA connected to MT7628AN using channel 1 bandwidth 20 MHz):
[ 3] 75.0-76.0 sec 6.62 MBytes 55.6 Mbits/sec
[ 3] 76.0-77.0 sec 6.62 MBytes 55.6 Mbits/sec
[ 3] 77.0-78.0 sec 6.75 MBytes 56.6 Mbits/sec
[ 3] 78.0-79.0 sec 6.62 MBytes 55.6 Mbits/sec
[ 3] 79.0-80.0 sec 6.62 MBytes 55.6 Mbits/sec
[ 3] 80.0-81.0 sec 6.62 MBytes 55.6 Mbits/sec
[ 3] 81.0-82.0 sec 5.88 MBytes 49.3 Mbits/sec
[ 3] 82.0-83.0 sec 5.25 MBytes 44.0 Mbits/sec
[ 3] 83.0-84.0 sec 1.25 MBytes 10.5 Mbits/sec
[ 3] 84.0-85.0 sec 2.50 MBytes 21.0 Mbits/sec
[ 3] 85.0-86.0 sec 1.12 MBytes 9.44 Mbits/sec
[ 3] 86.0-87.0 sec 896 KBytes 7.34 Mbits/sec
[ 3] 87.0-88.0 sec 1.00 MBytes 8.39 Mbits/sec
[ 3] 88.0-89.0 sec 2.00 MBytes 16.8 Mbits/sec
[ 3] 89.0-90.0 sec 0.00 Bytes 0.00 bits/sec
[ 3] 90.0-91.0 sec 1.25 MBytes 10.5 Mbits/sec
[ 3] 91.0-92.0 sec 0.00 Bytes 0.00 bits/sec
[ 3] 92.0-93.0 sec 0.00 Bytes 0.00 bits/sec
[ 3] 93.0-94.0 sec 0.00 Bytes 0.00 bits/sec
[ 3] 94.0-95.0 sec 0.00 Bytes 0.00 bits/sec
[ 3] 95.0-96.0 sec 0.00 Bytes 0.00 bits/sec
[ 3] 96.0-97.0 sec 0.00 Bytes 0.00 bits/sec
[ 3] 97.0-98.0 sec 0.00 Bytes 0.00 bits/sec
[ 3] 98.0-99.0 sec 2.50 MBytes 21.0 Mbits/sec
[ 3] 99.0-100.0 sec 6.88 MBytes 57.7 Mbits/sec
[ 3] 100.0-101.0 sec 6.75 MBytes 56.6 Mbits/sec
[ 3] 101.0-102.0 sec 6.88 MBytes 57.7 Mbits/sec
[ 3] 102.0-103.0 sec 6.75 MBytes 56.6 Mbits/sec
[ 3] 103.0-104.0 sec 5.50 MBytes 46.1 Mbits/sec
[ 3] 104.0-105.0 sec 6.75 MBytes 56.6 Mbits/sec
[ 3] 105.0-106.0 sec 7.00 MBytes 58.7 Mbits/sec
[ 3] 106.0-107.0 sec 7.00 MBytes 58.7 Mbits/sec
[ 3] 107.0-108.0 sec 6.75 MBytes 56.6 Mbits/sec
Whenever traffic stops I can see that the station's TX bitrate reported by the router goes down from 72.2 Mbps to 6.5 Mbps.
Switching from HT20 to NOHT results in the rate being limited to 54 Mbps (it varies between 54 Mbps and 48 Mbps, sometimes 36 Mbps). I ran iperf for an hour without a single stall/hang. The average iperf speed was 19.2 Mbps and it slowed from 20 Mbps down to 9-10 Mbps a few times but never stalled/hung completely.
It seems that all those slowdowns/stalls/hangs happen with high traffic only. Slowing Wi-Fi traffic down (by disabling HT or AMPDU) seems to mitigate them.
It's in sync with what I observed back in 2021 when I tried limiting iperf traffic by using -b 20M and -b 10M.
I was wondering if the hardware still generates any IRQs during those stalls/hangs/slowdowns. I cooked up a very trivial & dirty patch: dbg-rx-irqs.txt. It's terrible quality but maybe it shows something interesting? The following is the synced output of the client's iperf and the router's kernel:
[ 3] 5.0- 6.0 sec 6.38 MBytes 53.5 Mbits/sec [ 1045.490814] mt7603e 0000:02:00.0: [mt7603_dbg_watchdog] 4681 0
[ 3] 6.0- 7.0 sec 6.50 MBytes 54.5 Mbits/sec [ 1046.530712] mt7603e 0000:02:00.0: [mt7603_dbg_watchdog] 4878 0
[ 3] 7.0- 8.0 sec 6.25 MBytes 52.4 Mbits/sec [ 1047.570853] mt7603e 0000:02:00.0: [mt7603_dbg_watchdog] 4832 0
[ 3] 8.0- 9.0 sec 7.50 MBytes 62.9 Mbits/sec [ 1048.610675] mt7603e 0000:02:00.0: [mt7603_dbg_watchdog] 4854 0
[ 3] 9.0-10.0 sec 6.25 MBytes 52.4 Mbits/sec [ 1049.650727] mt7603e 0000:02:00.0: [mt7603_dbg_watchdog] 4911 0
[ 3] 10.0-11.0 sec 6.50 MBytes 54.5 Mbits/sec [ 1050.690658] mt7603e 0000:02:00.0: [mt7603_dbg_watchdog] 4198 2
[ 3] 11.0-12.0 sec 3.88 MBytes 32.5 Mbits/sec [ 1051.730982] mt7603e 0000:02:00.0: [mt7603_dbg_watchdog] 2251 19
[ 3] 12.0-13.0 sec 2.88 MBytes 24.1 Mbits/sec [ 1052.770621] mt7603e 0000:02:00.0: [mt7603_dbg_watchdog] 1606 17
[ 3] 13.0-14.0 sec 1.12 MBytes 9.44 Mbits/sec [ 1053.810652] mt7603e 0000:02:00.0: [mt7603_dbg_watchdog] 1991 0
[ 3] 14.0-15.0 sec 3.50 MBytes 29.4 Mbits/sec [ 1054.850595] mt7603e 0000:02:00.0: [mt7603_dbg_watchdog] 1669 0
[ 3] 15.0-16.0 sec 1.88 MBytes 15.7 Mbits/sec [ 1055.890584] mt7603e 0000:02:00.0: [mt7603_dbg_watchdog] 1676 0
[ 3] 16.0-17.0 sec 1.88 MBytes 15.7 Mbits/sec [ 1056.930789] mt7603e 0000:02:00.0: [mt7603_dbg_watchdog] 1052 0
[ 3] 17.0-18.0 sec 896 KBytes 7.34 Mbits/sec [ 1057.970606] mt7603e 0000:02:00.0: [mt7603_dbg_watchdog] 1086 0
[ 3] 18.0-19.0 sec 1.75 MBytes 14.7 Mbits/sec [ 1059.010619] mt7603e 0000:02:00.0: [mt7603_dbg_watchdog] 1297 0
[ 3] 19.0-20.0 sec 2.62 MBytes 22.0 Mbits/sec [ 1060.050603] mt7603e 0000:02:00.0: [mt7603_dbg_watchdog] 4879 0
[ 3] 20.0-21.0 sec 6.50 MBytes 54.5 Mbits/sec [ 1061.090740] mt7603e 0000:02:00.0: [mt7603_dbg_watchdog] 4821 0
[ 3] 21.0-22.0 sec 6.25 MBytes 52.4 Mbits/sec [ 1062.130816] mt7603e 0000:02:00.0: [mt7603_dbg_watchdog] 4682 0
It turns out that whenever slowdowns happen there are some MT_INT_RX_DONE(1) interrupts (MCU interrupts).
I added some debugging to the *_mcu_*_send.*() functions and none of them is called after the early init phase on MT7603EN. So it seems the MCU is generating IRQs and sending those packets on its own (those are not replies to MCU requests).
Some debugging in mt7603_queue_rx_skb() revealed that those are type 0 (PKT_TYPE_TXS?) IRQs and tdx[1] is 0x0080cd00 (which means idx 0) in my case. They refer to my only station connected to the router and result in mt76 calling ieee80211_sta_set_buffered().
Is that any good hint on what may be happening? My STA keeps sending traffic with iperf so it clearly doesn't go to sleep. Can that be some queuing issue?
Hi @rmilecki, I have checked MediaTek's proprietary driver and it seems that in that driver idx 0 is not used for a client STA; the idx used for STAs starts from index 1. Maybe there is some reason behind that?
FWIW using Netgear OEM firmware with their wireless driver seems to make MT7603EN stable. There are slowdowns (I'm wondering if those are also MCU / TX status related) but traffic never stalls/hangs. iperf-netgear-r6220-oem.txt
> I have checked MediaTek's proprietary driver and it seems that in that driver idx 0 is not used for a client STA; the idx used for STAs starts from index 1. Maybe there is some reason behind that?
As a very quick test I connected another device (a smartphone) to the R6220's MT7603EN and then my ThinkPad notebook with the iperf client. I experience the same stability issues, only this time tdx[1] is 0x0080cd01 (which means idx 1). I didn't attempt modifying the driver code to avoid idx 0 in general.
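For anyone comparing their own dumps, here is a tiny sketch of how the idx can be read out of those two words. The assumption that the index lives in the low 8 bits is inferred only from the two values quoted in this thread (0x0080cd00 and 0x0080cd01), not from the mt76 source or a datasheet; IDX_MASK and sta_idx are made-up names.

```python
# Hypothetical mask: assumes the station index occupies the low 8 bits
# of the logged word. This is inferred from the two values in this thread.
IDX_MASK = 0xFF


def sta_idx(word):
    """Extract the assumed station index from a logged 32-bit word."""
    return word & IDX_MASK


if __name__ == "__main__":
    print(sta_idx(0x0080CD00))  # 0
    print(sta_idx(0x0080CD01))  # 1
```

If the real field is wider or sits elsewhere, only the mask (and possibly a shift) would need to change.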
What might be interesting too is that the MT_HIGH_PRIORITY_1 register value is different from the proprietary driver. mt76 has this value set to 0x55555553 in https://github.com/openwrt/mt76/blob/master/mt7603/init.c#L64, while the proprietary driver has this value set to 0x55555555. It would also be interesting to know the reason behind that.
I started wondering if MT7603EN may actually stop/slow down on sending packets. I developed another ugly debugging patch for printing the max queue lengths over 1 second: mt76_max_queued.txt
Here is my manually (accuracy < 1 s) synced output of the station (iperf output) and the AP (kernel output):
[ 3] 0.0- 1.0 sec 7.75 MBytes 65.0 Mbits/sec [ 303.736585] [mt7603_dbg_watchdog] [IRQs] tx:2166 rx0:4725 rx1:0 [QUEUES] 0 0 17 0 0 1 0
[ 3] 1.0- 2.0 sec 6.38 MBytes 53.5 Mbits/sec [ 304.776652] [mt7603_dbg_watchdog] [IRQs] tx:2125 rx0:4708 rx1:0 [QUEUES] 0 0 6 0 0 1 0
[ 3] 2.0- 3.0 sec 7.12 MBytes 59.8 Mbits/sec [ 305.816434] [mt7603_dbg_watchdog] [IRQs] tx:2059 rx0:4472 rx1:0 [QUEUES] 0 0 4 0 0 1 0
[ 3] 3.0- 4.0 sec 6.50 MBytes 54.5 Mbits/sec [ 306.856286] [mt7603_dbg_watchdog] [IRQs] tx:2181 rx0:4735 rx1:0 [QUEUES] 1 0 10 0 0 1 0
[ 3] 4.0- 5.0 sec 6.50 MBytes 54.5 Mbits/sec [ 307.896319] [mt7603_dbg_watchdog] [IRQs] tx:2180 rx0:4784 rx1:0 [QUEUES] 0 0 4 0 0 1 0
[ 3] 5.0- 6.0 sec 6.50 MBytes 54.5 Mbits/sec [ 308.936232] [mt7603_dbg_watchdog] [IRQs] tx:1764 rx0:4029 rx1:3 [QUEUES] 0 0 22 0 0 1 0
[ 3] 6.0- 7.0 sec 6.62 MBytes 55.6 Mbits/sec [ 309.976276] [mt7603_dbg_watchdog] [IRQs] tx:1330 rx0:3281 rx1:12 [QUEUES] 1 0 63 0 0 1 0 ← q_tx[2] gets longer = traffic slow downs
[ 3] 7.0- 8.0 sec 4.00 MBytes 33.6 Mbits/sec [ 311.016331] [mt7603_dbg_watchdog] [IRQs] tx:1371 rx0:3269 rx1:13 [QUEUES] 0 0 57 0 0 1 0
[ 3] 8.0- 9.0 sec 4.75 MBytes 39.8 Mbits/sec [ 312.056313] [mt7603_dbg_watchdog] [IRQs] tx:640 rx0:1492 rx1:6 [QUEUES] 0 0 63 0 0 1 0
[ 3] 9.0-10.0 sec 1.75 MBytes 14.7 Mbits/sec [ 313.096662] [mt7603_dbg_watchdog] [IRQs] tx:1648 rx0:3656 rx1:0 [QUEUES] 0 0 5 0 0 1 0
[ 3] 10.0-11.0 sec 3.75 MBytes 31.5 Mbits/sec [ 314.136306] [mt7603_dbg_watchdog] [IRQs] tx:2151 rx0:4775 rx1:0 [QUEUES] 0 0 8 0 0 1 0
[ 3] 11.0-12.0 sec 6.62 MBytes 55.6 Mbits/sec [ 315.176833] [mt7603_dbg_watchdog] [IRQs] tx:2114 rx0:4678 rx1:0 [QUEUES] 0 0 10 0 0 1 0
[ 3] 12.0-13.0 sec 6.50 MBytes 54.5 Mbits/sec [ 316.216236] [mt7603_dbg_watchdog] [IRQs] tx:2179 rx0:4758 rx1:0 [QUEUES] 0 0 5 0 0 1 0
[ 3] 13.0-14.0 sec 6.50 MBytes 54.5 Mbits/sec [ 317.256330] [mt7603_dbg_watchdog] [IRQs] tx:2132 rx0:4745 rx1:0 [QUEUES] 0 0 7 0 0 1 0
[ 3] 14.0-15.0 sec 6.62 MBytes 55.6 Mbits/sec [ 318.296688] [mt7603_dbg_watchdog] [IRQs] tx:2168 rx0:4808 rx1:0 [QUEUES] 0 0 6 0 0 1 0
[ 3] 15.0-16.0 sec 6.50 MBytes 54.5 Mbits/sec [ 319.336478] [mt7603_dbg_watchdog] [IRQs] tx:2156 rx0:4788 rx1:0 [QUEUES] 0 0 6 0 0 1 0
[ 3] 16.0-17.0 sec 7.38 MBytes 61.9 Mbits/sec [ 320.376263] [mt7603_dbg_watchdog] [IRQs] tx:2236 rx0:4777 rx1:0 [QUEUES] 0 0 7 0 0 1 0
[ 3] 17.0-18.0 sec 6.62 MBytes 55.6 Mbits/sec [ 321.416134] [mt7603_dbg_watchdog] [IRQs] tx:1345 rx0:2985 rx1:0 [QUEUES] 0 0 3 0 0 1 0
[ 3] 0.0- 1.0 sec 8.38 MBytes 70.3 Mbits/sec [ 439.977705] [mt7603_dbg_watchdog] [IRQs] tx:1608 rx0:4945 rx1:0 [QUEUES] 0 0 22 0 0 1 0
[ 3] 1.0- 2.0 sec 7.00 MBytes 58.7 Mbits/sec [ 441.018929] [mt7603_dbg_watchdog] [IRQs] tx:1541 rx0:4946 rx1:0 [QUEUES] 0 0 35 0 0 1 0
[ 3] 2.0- 3.0 sec 6.62 MBytes 55.6 Mbits/sec [ 442.057523] [mt7603_dbg_watchdog] [IRQs] tx:1705 rx0:4996 rx1:0 [QUEUES] 0 0 14 0 0 1 0
[ 3] 3.0- 4.0 sec 6.62 MBytes 55.6 Mbits/sec [ 443.097840] [mt7603_dbg_watchdog] [IRQs] tx:1577 rx0:4980 rx1:0 [QUEUES] 0 0 31 0 0 1 0
[ 3] 4.0- 5.0 sec 6.75 MBytes 56.6 Mbits/sec [ 444.137589] [mt7603_dbg_watchdog] [IRQs] tx:1651 rx0:5030 rx1:0 [QUEUES] 0 0 33 0 0 1 0
[ 3] 5.0- 6.0 sec 6.62 MBytes 55.6 Mbits/sec [ 445.178764] [mt7603_dbg_watchdog] [IRQs] tx:1656 rx0:4940 rx1:0 [QUEUES] 0 0 33 0 0 1 0
[ 3] 6.0- 7.0 sec 6.62 MBytes 55.6 Mbits/sec [ 446.219212] [mt7603_dbg_watchdog] [IRQs] tx:1650 rx0:4953 rx1:0 [QUEUES] 0 0 31 0 0 1 0
[ 3] 7.0- 8.0 sec 6.62 MBytes 55.6 Mbits/sec [ 447.259172] [mt7603_dbg_watchdog] [IRQs] tx:1698 rx0:5006 rx1:0 [QUEUES] 0 0 37 0 0 1 0
[ 3] 8.0- 9.0 sec 5.75 MBytes 48.2 Mbits/sec [ 448.297197] [mt7603_dbg_watchdog] [IRQs] tx:1168 rx0:3675 rx1:13 [QUEUES] 1 0 63 0 0 1 0 ← q_tx[2] gets longer = traffic slow downs
[ 3] 9.0-10.0 sec 3.00 MBytes 25.2 Mbits/sec [ 449.337174] [mt7603_dbg_watchdog] [IRQs] tx:644 rx0:1982 rx1:16 [QUEUES] 0 0 42 0 0 1 0
[ 3] 10.0-11.0 sec 2.12 MBytes 17.8 Mbits/sec [ 450.377171] [mt7603_dbg_watchdog] [IRQs] tx:571 rx0:1793 rx1:6 [QUEUES] 0 0 41 0 0 1 0
[ 3] 11.0-12.0 sec 2.12 MBytes 17.8 Mbits/sec [ 451.417169] [mt7603_dbg_watchdog] [IRQs] tx:432 rx0:1613 rx1:9 [QUEUES] 0 0 89 0 0 1 0
[ 3] 12.0-13.0 sec 1.00 MBytes 8.39 Mbits/sec [ 452.457172] [mt7603_dbg_watchdog] [IRQs] tx:177 rx0:1102 rx1:5 [QUEUES] 0 0 71 0 0 1 0
[ 3] 13.0-14.0 sec 2.12 MBytes 17.8 Mbits/sec [ 453.497707] [mt7603_dbg_watchdog] [IRQs] tx:211 rx0:1008 rx1:7 [QUEUES] 0 0 86 0 0 1 0
[ 3] 14.0-15.0 sec 1.00 MBytes 8.39 Mbits/sec [ 454.537117] [mt7603_dbg_watchdog] [IRQs] tx:223 rx0:1191 rx1:6 [QUEUES] 0 0 119 0 0 1 0
[ 3] 15.0-16.0 sec 1.88 MBytes 15.7 Mbits/sec [ 455.577135] [mt7603_dbg_watchdog] [IRQs] tx:200 rx0:1194 rx1:3 [QUEUES] 0 0 98 0 0 1 0
[ 3] 16.0-17.0 sec 0.00 Bytes 0.00 bits/sec [ 456.617086] [mt7603_dbg_watchdog] [IRQs] tx:116 rx0:233 rx1:4 [QUEUES] 0 0 108 0 0 1 0
[ 3] 17.0-18.0 sec 0.00 Bytes 0.00 bits/sec [ 457.657417] [mt7603_dbg_watchdog] [IRQs] tx:40 rx0:294 rx1:1 [QUEUES] 0 0 18 0 0 1 0
[ 3] 18.0-19.0 sec 1.25 MBytes 10.5 Mbits/sec [ 458.697068] [mt7603_dbg_watchdog] [IRQs] tx:579 rx0:1806 rx1:9 [QUEUES] 0 0 44 0 0 1 0
[ 3] 19.0-20.0 sec 1.88 MBytes 15.7 Mbits/sec [ 459.737067] [mt7603_dbg_watchdog] [IRQs] tx:352 rx0:1874 rx1:6 [QUEUES] 0 0 101 0 0 1 0
[ 3] 20.0-21.0 sec 2.00 MBytes 16.8 Mbits/sec [ 460.777078] [mt7603_dbg_watchdog] [IRQs] tx:111 rx0:230 rx1:4 [QUEUES] 0 0 81 0 0 1 0
(scroll those above horizontally for my comments)
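To pick out the interesting seconds from a longer capture, the watchdog lines can be parsed mechanically. The sketch below assumes the exact line layout printed by my debugging patch as pasted above; the function names and the queue-length limit of 40 are arbitrary illustrative choices, and the "rx1 > 0 or long q_tx[2]" rule is just the correlation visible in these logs, not a proven diagnosis.

```python
import re

# Matches the debug output shown above, e.g.:
#   [mt7603_dbg_watchdog] [IRQs] tx:1330 rx0:3281 rx1:12 [QUEUES] 1 0 63 0 0 1 0
WD_RE = re.compile(
    r"\[mt7603_dbg_watchdog\]\s+\[IRQs\]\s+tx:(\d+)\s+rx0:(\d+)\s+rx1:(\d+)"
    r"\s+\[QUEUES\]((?:\s+\d+)+)"
)


def parse_watchdog(line):
    """Return a dict with IRQ counters and queue lengths, or None if no match."""
    m = WD_RE.search(line)
    if m is None:
        return None
    tx, rx0, rx1 = (int(g) for g in m.group(1, 2, 3))
    queues = [int(q) for q in m.group(4).split()]
    return {"tx": tx, "rx0": rx0, "rx1": rx1, "queues": queues}


def suspicious(sample, queue_index=2, queue_limit=40):
    """Flag a second with MCU RX interrupts or a long TX queue (limit is arbitrary)."""
    queues = sample["queues"]
    long_queue = len(queues) > queue_index and queues[queue_index] >= queue_limit
    return sample["rx1"] > 0 or long_queue


if __name__ == "__main__":
    line = ("[ 309.976276] [mt7603_dbg_watchdog] [IRQs] "
            "tx:1330 rx0:3281 rx1:12 [QUEUES] 1 0 63 0 0 1 0")
    sample = parse_watchdog(line)
    print(sample, suspicious(sample))
```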
> proprietary driver has this value set to 0x55555555.
I changed mt76 to use 0x55555555 but that doesn't help.
> Here is my manually (accuracy < 1 s) synced output of station (iperf output) and AP (kernel output): (...)
And it seems that MT_RXQ_MCU interrupts arrive at the time of the slowdown; the client device goes to powersave mode and TX packets are looped back?
> And it seems that MT_RXQ_MCU interrupts arrive at the time of the slowdown; the client device goes to powersave mode and TX packets are looped back?
Yeah, that's in sync with what I observed and described earlier in https://github.com/openwrt/mt76/issues/865#issuecomment-1980806588
Hi @rmilecki, thank you for researching this. Also please take a look at https://github.com/openwrt/mt76/commit/a8d9553d8fc4db9c12022451ca1d2368e796c591#commitcomment-130672442. Watchdog functionality was broken. Rolling back this commit restores it.
@Linaro1985: FWIW I tried reverting that commit but it didn't help my case:
[ 3] 50.0-51.0 sec 4.75 MBytes 39.8 Mbits/sec
[ 3] 51.0-52.0 sec 4.62 MBytes 38.8 Mbits/sec
[ 3] 52.0-53.0 sec 6.50 MBytes 54.5 Mbits/sec
[ 3] 53.0-54.0 sec 3.62 MBytes 30.4 Mbits/sec
[ 3] 54.0-55.0 sec 5.50 MBytes 46.1 Mbits/sec
[ 3] 55.0-56.0 sec 4.62 MBytes 38.8 Mbits/sec
[ 3] 56.0-57.0 sec 4.62 MBytes 38.8 Mbits/sec
[ 3] 57.0-58.0 sec 5.75 MBytes 48.2 Mbits/sec
[ 3] 58.0-59.0 sec 2.88 MBytes 24.1 Mbits/sec
[ 3] 59.0-60.0 sec 3.00 MBytes 25.2 Mbits/sec
[ 3] 60.0-61.0 sec 1.88 MBytes 15.7 Mbits/sec
[ 3] 61.0-62.0 sec 2.00 MBytes 16.8 Mbits/sec
[ 3] 62.0-63.0 sec 0.00 Bytes 0.00 bits/sec
[ 3] 63.0-64.0 sec 0.00 Bytes 0.00 bits/sec
[ 3] 64.0-65.0 sec 0.00 Bytes 0.00 bits/sec
[ 3] 65.0-66.0 sec 0.00 Bytes 0.00 bits/sec
[ 3] 66.0-67.0 sec 1.75 MBytes 14.7 Mbits/sec
[ 3] 67.0-68.0 sec 7.50 MBytes 62.9 Mbits/sec
[ 3] 68.0-69.0 sec 7.25 MBytes 60.8 Mbits/sec
[ 3] 69.0-70.0 sec 6.50 MBytes 54.5 Mbits/sec
(...)
[ 3] 150.0-151.0 sec 6.50 MBytes 54.5 Mbits/sec
[ 3] 151.0-152.0 sec 3.00 MBytes 25.2 Mbits/sec
[ 3] 152.0-153.0 sec 1.88 MBytes 15.7 Mbits/sec
[ 3] 153.0-154.0 sec 1.25 MBytes 10.5 Mbits/sec
[ 3] 154.0-155.0 sec 1.88 MBytes 15.7 Mbits/sec
[ 3] 155.0-156.0 sec 2.75 MBytes 23.1 Mbits/sec
[ 3] 156.0-157.0 sec 1.00 MBytes 8.39 Mbits/sec
[ 3] 157.0-158.0 sec 0.00 Bytes 0.00 bits/sec
[ 3] 158.0-159.0 sec 0.00 Bytes 0.00 bits/sec
[ 3] 159.0-160.0 sec 7.38 MBytes 61.9 Mbits/sec
[ 3] 160.0-161.0 sec 6.62 MBytes 55.6 Mbits/sec
[ 3] 161.0-162.0 sec 6.38 MBytes 53.5 Mbits/sec
[ 3] 162.0-163.0 sec 7.62 MBytes 64.0 Mbits/sec
[ 3] 163.0-164.0 sec 6.50 MBytes 54.5 Mbits/sec
[ 3] 164.0-165.0 sec 6.50 MBytes 54.5 Mbits/sec
(...)
[ 3] 175.0-176.0 sec 6.50 MBytes 54.5 Mbits/sec
[ 3] 176.0-177.0 sec 7.62 MBytes 64.0 Mbits/sec
[ 3] 177.0-178.0 sec 6.50 MBytes 54.5 Mbits/sec
[ 3] 178.0-179.0 sec 7.25 MBytes 60.8 Mbits/sec
[ 3] 179.0-180.0 sec 5.62 MBytes 47.2 Mbits/sec
[ 3] 180.0-181.0 sec 1.88 MBytes 15.7 Mbits/sec
[ 3] 181.0-182.0 sec 2.00 MBytes 16.8 Mbits/sec
[ 3] 182.0-183.0 sec 1.00 MBytes 8.39 Mbits/sec
[ 3] 183.0-184.0 sec 0.00 Bytes 0.00 bits/sec
[ 3] 184.0-185.0 sec 0.00 Bytes 0.00 bits/sec
[ 3] 185.0-186.0 sec 0.00 Bytes 0.00 bits/sec
[ 3] 186.0-187.0 sec 0.00 Bytes 0.00 bits/sec
[ 3] 187.0-188.0 sec 0.00 Bytes 0.00 bits/sec
[ 3] 188.0-189.0 sec 5.50 MBytes 46.1 Mbits/sec
[ 3] 189.0-190.0 sec 4.50 MBytes 37.7 Mbits/sec
[ 3] 190.0-191.0 sec 3.75 MBytes 31.5 Mbits/sec
[ 3] 191.0-192.0 sec 3.75 MBytes 31.5 Mbits/sec
[ 3] 192.0-193.0 sec 4.88 MBytes 40.9 Mbits/sec
[ 3] 193.0-194.0 sec 4.88 MBytes 40.9 Mbits/sec
[ 3] 194.0-195.0 sec 5.50 MBytes 46.1 Mbits/sec
Please note that my issue goes back to 2021 at least. I guess those are just 2 different problems.
I developed a simple workaround that seems to fix stability for me with MT7603 and MT7628: [PATCH] wifi: mt76: mt7603: add debugfs attr for disabling frames buffering
I just pushed that under-review patch to OpenWrt, see commit 7236d4f82b57 ("mt76: add mt7603 possible workaround for MT7603EN / MT7628AN stability")
Both of my devices seem really stable with mt7603 as soon as I do:
echo N > /sys/kernel/debug/ieee80211/phy0/mt76/frames_buffering
The patch for disabling frames buffering seems to have been deleted a few hours ago in https://github.com/openwrt/openwrt/commit/a10a6fbac794b30885d65ec817ebdcfe9f94d78a
Besides that, a new version of mt76 arrived and two commits from there are fixes for mt7603. Would that be the final solution?
@enmaskarado I think it would be the final solution because of https://github.com/openwrt/mt76/commit/e4de3592c4e3baa82142eff583cb5a761f790709 (see commit description)
By the way, I'm already testing the fixes and so far everything is fine.
I have not experienced any stability issues after https://github.com/openwrt/mt76/commit/e4de3592c4e3baa82142eff583cb5a761f790709 . For me, mt76 is more stable than proprietary driver now. There are some slow downs for a MI 4C MT7628 router using proprietary drivers but mt76 doesn't suffer from that now.
@everything411 Which version of OpenWrt are you on? Do you also get this mt76_wmac "MCU timed out" problem? https://github.com/openwrt/mt76/issues/628 I can't fix it yet. Did you change the eth driver as well?
On OpenWrt 23, I patched the mt76 driver with the two commits you mentioned, and I still have the problem.
@biboc I'm on OpenWrt master. Did you backport b14c235? This commit is not in 23.05.
@everything411 I built OpenWrt and I upgraded the Makefile https://github.com/openwrt/openwrt/blob/main/package/kernel/mt76/Makefile to 2024-04-03, which includes https://github.com/openwrt/mt76/commit/b14c2351ddb8601c322576d84029e463d456caef. Doesn't it?
I got multiple MCU hangs like those described here: https://github.com/openwrt/mt76/issues/628 It may be the cause of my problem.
OK, the MCU hang comes from another program that restarted Wi-Fi. Now I only have beacon stuck and TX hang:
# cat /sys/kernel/debug/ieee80211/phy0/mt76/reset
TX hang: 88
TX DMA busy stuck: 0
RX DMA busy stuck: 0
Beacon stuck: 9172
RX PSE busy stuck: 0
MCU hang: 0
PSE reset failed: 0
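When comparing this debugfs output between reboots or devices, it can help to turn it into numbers. The following is a small sketch that assumes the "name: value" layout pasted above; parse_reset_counters and nonzero are hypothetical helper names, and the sample text is copied from this comment.

```python
def parse_reset_counters(text):
    """Parse 'Name: <integer>' lines into a dict, ignoring anything else."""
    counters = {}
    for line in text.splitlines():
        name, sep, value = line.rpartition(":")
        if sep and value.strip().isdigit():
            counters[name.strip()] = int(value)
    return counters


def nonzero(counters):
    """Keep only the counters that have fired at least once."""
    return {name: count for name, count in counters.items() if count > 0}


RESET_SAMPLE = """\
# cat /sys/kernel/debug/ieee80211/phy0/mt76/reset
TX hang: 88
TX DMA busy stuck: 0
RX DMA busy stuck: 0
Beacon stuck: 9172
RX PSE busy stuck: 0
MCU hang: 0
PSE reset failed: 0
"""

if __name__ == "__main__":
    print(nonzero(parse_reset_counters(RESET_SAMPLE)))
```

Taking two snapshots a known time apart and subtracting them would give a rate per counter, which is easier to compare than the raw totals.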
And ping round-trip times are very long, 4 to 15 seconds!
# ping 10.201.21.88
PING 10.201.21.88 (10.201.21.88): 56 data bytes
64 bytes from 10.201.21.88: seq=0 ttl=64 time=12517.019 ms
64 bytes from 10.201.21.88: seq=1 ttl=64 time=11659.209 ms
64 bytes from 10.201.21.88: seq=2 ttl=64 time=14620.829 ms
64 bytes from 10.201.21.88: seq=3 ttl=64 time=14952.260 ms
64 bytes from 10.201.21.88: seq=4 ttl=64 time=15381.312 ms
64 bytes from 10.201.21.88: seq=5 ttl=64 time=15897.280 ms
64 bytes from 10.201.21.88: seq=6 ttl=64 time=14896.970 ms
64 bytes from 10.201.21.88: seq=7 ttl=64 time=13912.369 ms
64 bytes from 10.201.21.88: seq=8 ttl=64 time=13224.867 ms
64 bytes from 10.201.21.88: seq=16 ttl=64 time=8219.529 ms
64 bytes from 10.201.21.88: seq=17 ttl=64 time=7528.824 ms
64 bytes from 10.201.21.88: seq=18 ttl=64 time=8228.625 ms
64 bytes from 10.201.21.88: seq=21 ttl=64 time=6560.212 ms
64 bytes from 10.201.21.88: seq=22 ttl=64 time=6444.579 ms
64 bytes from 10.201.21.88: seq=23 ttl=64 time=7337.334 ms
64 bytes from 10.201.21.88: seq=24 ttl=64 time=6492.642 ms
64 bytes from 10.201.21.88: seq=25 ttl=64 time=6079.157 ms
64 bytes from 10.201.21.88: seq=26 ttl=64 time=5514.735 ms
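Besides the latency, some sequence numbers are absent from the pasted output (for example it jumps from seq=8 to seq=16). Below is a minimal sketch for listing such gaps, assuming BusyBox-style ping lines containing "seq=N" as above; missing_seqs is a hypothetical helper name and the sample is a shortened excerpt of this output.

```python
import re


def missing_seqs(text):
    """Return sequence numbers absent between the first and last seq= seen."""
    seen = sorted(int(s) for s in re.findall(r"\bseq=(\d+)", text))
    if not seen:
        return []
    present = set(seen)
    return [n for n in range(seen[0], seen[-1] + 1) if n not in present]


PING_SAMPLE = """\
64 bytes from 10.201.21.88: seq=7 ttl=64 time=13912.369 ms
64 bytes from 10.201.21.88: seq=8 ttl=64 time=13224.867 ms
64 bytes from 10.201.21.88: seq=16 ttl=64 time=8219.529 ms
64 bytes from 10.201.21.88: seq=17 ttl=64 time=7528.824 ms
64 bytes from 10.201.21.88: seq=18 ttl=64 time=8228.625 ms
64 bytes from 10.201.21.88: seq=21 ttl=64 time=6560.212 ms
"""

if __name__ == "__main__":
    print(missing_seqs(PING_SAMPLE))  # [9, 10, 11, 12, 13, 14, 15, 19, 20]
```

A missing number only means no reply line was printed for it in the capture; it does not say whether the request or the reply was lost.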
The station is close to this one, with an OK metric:
Station --:--:--:--:--:-- (on mesh0)
signal: -49 [-49, -87] dBm
signal avg: -48 [-48, -86] dBm
mesh plink: ESTAB
DEST ADDR NEXT HOP IFACE SN METRIC QLEN EXPTIME DTIM DRET FLAGS HOP_COUNT PATH_CHANGE
--:--:--:--:--:-- --:--:--:--:--:-- mesh0 1 1366 0 2900 1600 4 0x15 1 1
@everything411 I'm on Openwrt 23.05.3 + mt76 PKG_SOURCE_DATE:=2024-04-03 PKG_SOURCE_VERSION:=1e336a8582dce2ef32ddd440d423e9afef961e71
@nbd168 Any idea about my problem? Why are ping and the connection to other nodes so slow? And what is the cause of beacon stuck and TX hang? Thanks,
I no longer have problems with hangs, but sometimes when I reboot the device I get this:
[ 10.722558] pci 0000:00:00.0: enabling device (0000 -> 0003)
[ 10.733956] mt7603e 0000:01:00.0: can't change power state from D3cold to D0 (config space inaccessible)
[ 10.753158] mt7603e 0000:01:00.0: ASIC revision: 0000
[ 10.764095] ------------[ cut here ]------------
[ 10.773293] WARNING: CPU: 3 PID: 535 at target-mipsel_24kc_musl/linux-ramips_mt7621/mt76-2024-04-03-1e336a85/mt7603/eeprom.c:27 0x823a7f00 [mt7603e@(ptrval)+0x9980]
[ 10.802587] Modules linked in: mt7603e(+) mt76_connac_lib mt76 mac80211 libchacha20poly1305 ipt_REJECT cfg80211 xt_time xt_tcpudp xt_policy xt_multiport xt_mark xt_mac xt_limit xt_esp xt_comment xt_TCPMSS xt_LOG xfrm_interface ts_kmp ts_fsm ts_bm slhc poly1305_mips nfnetlink nf_reject_ipv4 nf_log_syslog nf_defrag_ipv6 nf_defrag_ipv4 libcurve25519_generic libcrc32c iptable_mangle iptable_filter ipt_ah ip_tables hwmon crc_ccitt compat chacha_mips asn1_decoder ip6table_mangle ip6table_filter ip6_tables ip6t_REJECT x_tables nf_reject_ipv6 ip6_gre ip_gre gre l2tp_netlink l2tp_core udp_tunnel ip6_udp_tunnel ipcomp6 xfrm6_tunnel esp6 ah6 xfrm4_tunnel ipcomp esp4 ah4 ip6_tunnel tunnel6 tunnel4 ip_tunnel xfrm_user xfrm_ipcomp af_key xfrm_algo crypto_user algif_skcipher algif_rng algif_hash algif_aead af_alg sha512_generic sha256_generic libsha256 sha1_generic seqiv jitterentropy_rng drbg md5 kpp crypto_hw_eip93 hmac echainiv ecb des_generic libdes cmac cbc authencesn authenc arc4 leds_gpio
[ 10.803332] gpio_button_hotplug crc32c_generic
[ 10.985342] CPU: 3 PID: 535 Comm: kmodloader Not tainted 5.15.150 #0
[ 10.997999] Stack : 000f0000 823b0000 00000001 80083bf0 00000000 00000000 00000000 00000000
[ 11.014687] 00000000 00000000 00000000 00000000 00000000 00000001 82057ad0 80c7f460
[ 11.031371] 82057b68 00000000 00000000 82057978 00000038 8039f0e4 ffffffea 00000000
[ 11.048056] 82057984 000000f0 8081cab0 ffffffff 8073ae10 82057ab0 00000000 823a7f00
[ 11.064744] 00000009 82860220 000f0000 823b0000 00000018 80411304 0000000c 809d000c
[ 11.081431] ...
[ 11.086301] Call Trace:
[ 11.086355] [<80083bf0>] 0x80083bf0
[ 11.098168] [<8039f0e4>] 0x8039f0e4
[ 11.105129] [<823a7f00>] 0x823a7f00 [mt7603e@(ptrval)+0x9980]
[ 11.116590] [<80411304>] 0x80411304
[ 11.123544] [<80007908>] 0x80007908
[ 11.130484] [<80007910>] 0x80007910
[ 11.137428] [<823a7f00>] 0x823a7f00 [mt7603e@(ptrval)+0x9980]
[ 11.148873] [<803831c4>] 0x803831c4
[ 11.155829] [<80720000>] 0x80720000
[ 11.162771] [<8002df2c>] 0x8002df2c
[ 11.169712] [<823a7f00>] 0x823a7f00 [mt7603e@(ptrval)+0x9980]
[ 11.181159] [<8002e010>] 0x8002e010
[ 11.188117] [<823a7dd0>] 0x823a7dd0 [mt7603e@(ptrval)+0x9980]
[ 11.197962] urngd: v1.0.2 started.
[ 11.199609] [<823a7f00>] 0x823a7f00 [mt7603e@(ptrval)+0x9980]
[ 11.217800] [<823a1fa8>] 0x823a1fa8 [mt7603e@(ptrval)+0x9980]
[ 11.229277] [<8008c864>] 0x8008c864
[ 11.236234] [<8041d844>] 0x8041d844
[ 11.243193] [<823a0168>] 0x823a0168 [mt7603e@(ptrval)+0x9980]
[ 11.254633] [<803d3c98>] 0x803d3c98
[ 11.261588] [<803cacd0>] 0x803cacd0
[ 11.268527] [<803ca2b8>] 0x803ca2b8
[ 11.275475] [<80424474>] 0x80424474
[ 11.282416] [<802521dc>] 0x802521dc
[ 11.289367] [<804249a8>] 0x804249a8
[ 11.296330] [<80425138>] 0x80425138
[ 11.303279] [<8042508c>] 0x8042508c
[ 11.310217] [<80421e68>] 0x80421e68
[ 11.317179] [<80423648>] 0x80423648
[ 11.321222] irq 26: nobody cared (try booting with the "irqpoll" option)
[ 11.324144] [<80425aa0>] 0x80425aa0
[ 11.344367] [<8018dcf0>] 0x8018dcf0
[ 11.351321] [<823af048>] 0x823af048 [mt7603e@(ptrval)+0x9980]
[ 11.362769] [<823af000>] 0x823af000 [mt7603e@(ptrval)+0x9980]
[ 11.374214] [<8000157c>] 0x8000157c
[ 11.381180] [<800c5664>] 0x800c5664
[ 11.388117] [<802b5d0c>] 0x802b5d0c
[ 11.395078] [<800c350c>] 0x800c350c
[ 11.402023] [<800c5738>] 0x800c5738
[ 11.408984] [<80014550>] 0x80014550
Another reboot fixes mt7603 initialization.
I'll open a new issue