mt76
MT7603E 2.4GHz interface stability issues
I would like to bring up the topic of MT7603E stability in the latest mt76 versions again. It's inspired by https://github.com/openwrt/openwrt/pull/11220 and the upcoming switch of the ramips target to the 5.15 kernel.
For a while I have been testing a WF3526P / ZBT WE1326 device, equipped with MT7603E+MT7612E, on OpenWrt master with "kernel-5.15", and I am experiencing stability issues with the 2.4 GHz (MT7603E) interface.
The problem is that the WiFi connection gets stuck at some point, most likely when the load increases and a lot of data is transferred over the wireless connection. The issue can be easily reproduced within a couple of minutes of running iperf3.
When the issue occurs, the client stays connected to the AP (no visible changes), but no wireless traffic can pass over the connection. There are no errors in the logs on either the client or the AP.
The workaround is to reconnect to the AP, after which it works until the next connection hang.
There are several tickets on MT7603E: #669, #576, #419, #411, #391, #390, #375, etc. They're probably related but show different symptoms. Also worth mentioning:
- This is not a hardware issue. The device worked fine on openwrt-19.07 with an uptime of more than a year, and no issues were observed;
- The issue is already present in openwrt-22.03, which is why I tried looking for fixes in master, but to no avail;
- The interface is much more stable in legacy mode (802.11g), so the issue is probably related to 802.11n support. I initially suspected #576, but SMPS should already be disabled for MT7603E.
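For anyone who wants to reproduce the hang in a repeatable way, here is a minimal sketch of the load-then-probe loop described above. The function names and the addresses in the usage line are placeholders, not values from this report, and `ping -W` is assumed to take seconds (as on Linux/BusyBox).

```shell
# Sketch only: helper names and addresses are hypothetical.

# Print "link ok" if the AP answers 3 pings, "link stalled" otherwise.
check_link() {
    ap="$1"
    if ping -c 3 -W 2 "$ap" >/dev/null 2>&1; then
        echo "link ok"
    else
        echo "link stalled"
    fi
}

# Push traffic through the AP for $3 seconds (default 120), then probe the link.
load_and_check() {
    server="$1"
    ap="$2"
    seconds="${3:-120}"
    iperf3 -c "$server" -t "$seconds" >/dev/null 2>&1
    check_link "$ap"
}
```

Run from a wireless client, for example `load_and_check 192.0.2.10 192.168.1.1 300`, with an iperf3 server listening on the first address. Since the client still shows as associated when the hang occurs, the ping result is the only signal this sketch relies on.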
how about 21.02 branch?
I am afraid you need to do bisection on your own.
how about 21.02 branch?
I haven't tested 21.02, since after the 23.xx release it will be out of support (the same situation as with 19.xx). I probably need to gather more details.
I am afraid you need to do bisection on your own.
Probably, but I hope there is still some interest from developers and the community in these devices. I can test any improvements on this hardware, but I am mostly oriented towards fixes for master/23.xx, since the (much) older versions already work for us.
MT7603E sometimes stalls (randomly?) with my MacBook Air 2020 Intel without giving any errors, as described. Using MT76x0E (5 GHz) on the same device is fine. I am using the latest snapshot with the 5.10 kernel.
He asked you to test/bisect with 21.02 to find out which OpenWrt version introduced the bug. That might reveal which change actually caused the bug and allow the devs to fix it in 22 and/or 23 :)
He didn't ask you to test 21.02 so that it could be fixed there 😅
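In case it helps, a bisect session can be sketched roughly like this. The helper names are made up for illustration, and the good/bad refs are whatever commits you have actually verified on the hardware (they are not given in this thread).

```shell
# Sketch only: helper names are hypothetical; refs are supplied by the caller.

# Start a bisect session between a known-good and a known-bad ref.
start_bisect() {
    good="$1"
    bad="$2"
    git bisect start "$bad" "$good"
}

# Record the verdict for the currently checked-out commit: "good" or "bad".
mark_build() {
    case "$1" in
        good|bad) git bisect "$1" ;;
        *) echo "usage: mark_build good|bad" >&2; return 1 ;;
    esac
}
```

Each step means building and flashing the checked-out revision, running the load test, and then calling `mark_build good` or `mark_build bad`. `git bisect reset` ends the session.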
For information, ever since I started using OpenWrt on my router I have had huge instabilities with my b/g/n WiFi. I started 3 or 4 years ago, so it was version 19.x as far as I remember...
In AP mode it randomly disappears, crashes, sometimes disconnects devices, etc. What I noticed is that when I enable an interface in client mode (with mobile phone connection sharing, for example), it magically becomes a lot more stable...
There is a regression between openwrt-19.07 and openwrt-21.02 with regard to MT7603E support in mt76.
I can reproduce connection issues on an openwrt-21.02 snapshot after massive testing, but they are rare. With newer OpenWrt versions, it's more unstable. I will probably stay with openwrt-21.02 on MT7603E.
Edit: This didn't work; random 2.4 GHz disconnects continue.
I use a snapshot image from 9 January 2023: OpenWrt SNAPSHOT r21728-fc33c41c21 / LuCI Master git-22.361.69865-deed682
And I have the same problem (on this setup: 2.4 GHz, 802.11n, HT20, channel 6).
I made the two config changes below and now, 12 hours later, I haven't had any disconnection (will continue testing).
1 - Disable multicast on the phy0-ap0 interface. To do this, go to LuCI and Interfaces > Devices > phy0-ap0
2 - Change the WPA2 WiFi encryption cipher from [auto] to [aes]
PS: This bug doesn't show any warning, notice, or information in the system log. I don't know if it is an encryption key problem (TKIP or auto switch) or a multicast flood on wireless clients.
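For reference, the cipher change (step 2) can also be made from the shell. This is a minimal sketch: the function name is made up, and `default_radio0` is only the usual name of the first wifi-iface section on a fresh install, so check `/etc/config/wireless` for the real one. The multicast toggle from step 1 is left to LuCI here.

```shell
# Sketch only: function name is hypothetical; section name is an assumption.
# Force WPA2-PSK with CCMP (AES) only instead of the automatic cipher choice.
force_ccmp() {
    iface="${1:-default_radio0}"
    uci set "wireless.${iface}.encryption=psk2+ccmp"
    uci commit wireless
    wifi reload
}
```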
Update: This didn't work; random 2.4 GHz disconnects continue.
What I discovered:
My disconnections are not related to the number of 2.4 GHz clients/STAs connected to the router (I have only one 2.4 GHz device in my house).
I noticed that the random disconnections are related to weak/poor signal conditions. But these disconnections did not occur with the official TP-Link firmware on the Archer C6 v3.
My theory (I will test for 48 hours and report here): the "Time interval for rekeying GTK" option is too short in the OpenWrt default (set by the driver). The field value is set to only 300 seconds (for example, on DD-WRT this default is 3600 seconds).
In bad signal conditions, the excessive key renewals can cause hangs and disconnections on 2.4 GHz wireless clients.
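If someone wants to test the same theory, the GTK rekey interval is exposed as the `wpa_group_rekey` option of the wifi-iface section. A minimal sketch, with a made-up function name and the section name left to the caller:

```shell
# Sketch only: function name is hypothetical; pass your own wifi-iface section.
# Set the GTK rekey interval (in seconds) and reload the wireless config.
set_group_rekey() {
    iface="$1"
    seconds="$2"
    uci set "wireless.${iface}.wpa_group_rekey=${seconds}"
    uci commit wireless
    wifi reload
}
```

For example, `set_group_rekey default_radio0 3600` would match the DD-WRT default mentioned above, assuming `default_radio0` is the 2.4 GHz interface section on your device.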
How I fixed it in my case:
- Disable software and hardware offloading
- Disable WMM (which disables QoS)
- Select a country code
Now it is stable across all clients.
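The three settings above map to UCI options roughly as follows. This is only a sketch: the function name is made up, and the radio and interface section names and the country code are arguments you have to fill in from your own `/etc/config/wireless`.

```shell
# Sketch only: function name is hypothetical; sections and country are caller-supplied.
apply_stability_settings() {
    radio="$1"      # wifi-device section, e.g. radio0 (assumption)
    iface="$2"      # wifi-iface section, e.g. default_radio0 (assumption)
    country="$3"    # two-letter country code
    # Turn off software and hardware flow offloading in the firewall defaults.
    uci set "firewall.@defaults[0].flow_offloading=0"
    uci set "firewall.@defaults[0].flow_offloading_hw=0"
    # Turn off WMM on the AP interface and set the regulatory country.
    uci set "wireless.${iface}.wmm=0"
    uci set "wireless.${radio}.country=${country}"
    uci commit firewall
    uci commit wireless
}
```

The firewall needs a restart and the wireless config a `wifi reload` afterwards for the changes to apply. Keep in mind that 802.11n rates require WMM, so with WMM off clients fall back to legacy rates, which may be why this looks stable.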
MT7603e does not handle SMPS well, making 2.4GHz WiFi disappear or system crash or connection unstable or lost #576, but SMPS should be already disabled for #MT7603E
I don't have deeper knowledge of the system, but what if, after all this time, the solution is to enable SMPS but with code refined to work with MT7603E? I don't know if it's worth a try, though; like I said, my knowledge is limited.
looking at the codebase SMPS is supported and enabled as far as I can tell. The code was added in Jan 2019 so it might be part of openwrt-19.07, but I haven't checked. https://github.com/openwrt/mt76/commit/fc31457cd99cb85c8cea9329eedc5edd80038f29
@dfateyev what made you think it was disabled? easyteacher closed their PRs before they were ever merged
@Djfe Hello, I think he meant this?
if so, mine is disabled too, device is Newifi D2
makes me curious whether it is also disabled on openwrt-19.07 (I don't own an affected device)
makes me curious whether it is also disabled on openwrt-19.07 (I don't own an affected device)
By the way, I'm on the latest snapshot build now, and this is also disabled in the stable release v22.03.5, but I'm not sure whether it was also disabled in older versions down to v19.07. I might take a look at it when I have more spare time again.
makes me curious whether it is also disabled on openwrt-19.07 (I don't own an affected device)
I don't have deeper knowledge about this, but when I checked the code based on easyteacher's info (since I'm also learning how to compile), I could actually see that the code block for SMPS is enabled. Unfortunately, after flashing, it isn't enabled and I don't know why. Maybe there's a conditional statement that disables it on some mt76 devices? Sorry, my knowledge is limited.
what made you think it was disabled? easyteacher closed their PRs before they were ever merged makes me curious whether it is also disabled on openwrt-19.07 (I don't own an affected device)
Besides the SM power save being in the "disabled" state, I didn't manage to trigger any SMPS-related events while testing this board last year. The SMPS option is already present in v19.07: SM Power Save disabled. I still have one MT7603E under v19.07.
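For others who want to compare across versions, the "SM Power Save" line comes from the HT capabilities printed by `iw`. A small sketch (the function name and the `phy0` default are assumptions; use whichever phy is the MT7603E on your device):

```shell
# Sketch only: function name is hypothetical; phy0 is an assumed default.
# Print the SM Power Save capability line reported for the given phy.
smps_state() {
    phy="${1:-phy0}"
    iw phy "$phy" info | grep -i 'SM Power Save' | sed 's/^[[:space:]]*//'
}
```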
@dfateyev hi, may I know what specific v19.07 of openwrt you're using?
may I know what specific v19.07 of openwrt you're using?
OpenWrt 19.07.10, r11427-9ce6aa9d8d, device ZBT WE1326 / WE3526.
This also applies to me. The 2.4GHz is very unstable causing connection crash and reconnect attempts. I am using a TP Link Archer C6 V3 (EU) running OpenWRT 22.03.5 (DISTRIB_DESCRIPTION: OpenWrt 22.03.5 r20134-5f15225c1e)
Same. Xiaomi Router 4A (R4AC) OpenWrt SNAPSHOT r23454-01885bc6a3 / LuCI Master git-23.158.78004-23a246e
Aren't there any alternative drivers?
There are, but they are incompatible with the LuCI installed by default (there is a MediaTek module for LuCI with which it works). Also, uci2dat is needed to sync the config with UCI.
I see. Too bad.
Please try latest OpenWrt master or 23.05 branch
@nbd168 I have been testing 23.05 branch on MT7603E for a week (commit c697057b from Aug 05, 2023).
The issue with 2.4 GHz stability is still present: with 802.11n, 20 MHz band, and WPA2 on the AP, running iperf3 from an AP client to a DMZ host leads to a load average (LA) of 0.8-0.9 on the AP and a stuck WLAN connection. I also disabled NAT and MSS clamping on the AP, but it didn't improve the situation with LA and stability. There are no relevant logs on either the AP or the client side.
The good news is that legacy 802.11g mode is now fully stable: I hammered it with iperf3 for days without a drop. The downside is low bandwidth (LA on the AP doesn't go beyond 0.4), which in general makes the AP much less useful.
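To make the LA numbers comparable between runs, the 1-minute load average can be sampled on the AP while iperf3 is running. A minimal sketch with a made-up function name (the third argument exists only so the source file can be swapped out):

```shell
# Sketch only: function name is hypothetical.
# Print the 1-minute load average $1 times, sleeping $2 seconds between samples.
sample_loadavg() {
    count="$1"
    interval="$2"
    src="${3:-/proc/loadavg}"
    i=0
    while [ "$i" -lt "$count" ]; do
        # The first space-separated field of /proc/loadavg is the 1-minute average.
        cut -d' ' -f1 "$src"
        i=$((i + 1))
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}
```

For example, `sample_loadavg 60 10` records ten minutes of samples at ten-second intervals.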
Please try this patch on top of current mt76: https://nbd.name/p/762e9946
Please try this patch on top of current mt76: https://nbd.name/p/762e9946
I applied the patch against mt76 master and used it with an "openwrt-23.05" build (commit b59d02be).
I noticed a slightly decreased LA, but while loading the AP with iperf3 from 2 clients, the AP crashed/restarted within 2-3 h. I repeated the same test with BW load, and the AP went unresponsive after 2-3 h again, this time without a reboot: although the LEDs are active, there is no WLAN on the air and no LAN access. It seems I cannot provide a crash log from the AP, sorry.
During the load test, I also saw increasing beacon stuck count, similar to https://github.com/openwrt/mt76/issues/793#issuecomment-1680167853.
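For anyone who wants to watch the same counter, here is a small sketch for reading it. The debugfs path and the "Beacon stuck: N" line format are my assumptions about what the mt7603 driver exposes, and the function name is made up, so adjust both to whatever your build actually shows.

```shell
# Sketch only: function name is hypothetical; path and label format are assumptions.
# Print the beacon-stuck reset counter from the driver's debugfs reset file.
beacon_stuck_count() {
    src="${1:-/sys/kernel/debug/ieee80211/phy0/mt76/reset}"
    awk -F: 'tolower($1) ~ /beacon stuck/ { gsub(/[[:space:]]/, "", $2); print $2 }' "$src"
}
```

Calling it before and after a load run shows whether the counter grows with traffic.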
AP went unresponsive in 2-3h again — this time w/o reboot, although LEDs are active, there is no WLAN in air and no LAN access
Oh, I thought I was the only one. I experienced this too, and it is one of the reasons why I disabled the WiFi and used another access point.
Can you show us the output of dmesg | grep -i mt76
Can you show us the output of dmesg | grep -i mt76
Unfortunately I cannot, since both WLAN and LAN access to the AP are lost. It looks like the AP is alive but unresponsive over the network. I probably need a serial console, but that would require soldering pins, etc.