
MT7603E 2.4GHz interface stability issues

Open dfateyev opened this issue 2 years ago • 39 comments

I would like to bring up the topic of MT7603E stability in the latest mt76 versions again. It was prompted by https://github.com/openwrt/openwrt/pull/11220 and the upcoming switch of the ramips target to the 5.15 kernel.

For a while I have been testing a WF3526P / ZBT WE1326 device, equipped with MT7603E+MT7612E, with a "kernel-5.15" OpenWrt build from master, and I am experiencing stability issues on the 2.4 GHz (MT7603E) interface.

The problem is that the WiFi connection gets stuck at some point, most likely when the load increases and a lot of data is transferred over the wireless connection. The issue can be easily reproduced by running iperf3 for a couple of minutes. When the issue occurs, the client stays connected to the AP (no visible changes), but no wireless traffic passes over the connection. There are no errors in the logs on either the client or the AP. The workaround is to reconnect to the AP, after which it works again until the next connection hang.
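Because the client stays associated during the hang, the only visible symptom is that traffic stops. A minimal sketch in POSIX shell for catching that moment, assuming it runs on the client while iperf3 loads the link; `link_alive` and `watch_for_stall` are illustrative names, not part of any tool, and a single lost ping can of course be a false positive:

```shell
#!/bin/sh
# Illustrative helper (hypothetical name): succeeds while HOST answers
# one ping within 2 seconds.
link_alive() {
    ping -c 1 -W 2 "$1" >/dev/null 2>&1
}

# Poll HOST every INTERVAL seconds (default 5) and report roughly how long
# the link kept passing traffic before the first missed ping.
watch_for_stall() {
    host=$1
    interval=${2:-5}
    elapsed=0
    while link_alive "$host"; do
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
    echo "stall after ${elapsed}s"
}
```

For example, run `iperf3 -c <server> -t 600` in one terminal and `watch_for_stall <server>` in another, where `<server>` is a host behind the AP.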

There are several tickets on MT7603E: #669, #576, #419, #411, #391, #390, #375, etc. They are probably related, but show different symptoms. Also worth mentioning:

  • This is not a hardware issue. The device worked fine on openwrt-19.07 with an uptime of more than a year, and no issues were observed;
  • The issue is already present in openwrt-22.03, which is why I tried looking for fixes in master, but to no avail;
  • The interface is much more stable in legacy mode (802.11g), so the issue is probably related to 802.11n support. I initially suspected #576, but SMPS should already be disabled for MT7603E.

dfateyev avatar Dec 17 '22 20:12 dfateyev

how about 21.02 branch?

I am afraid you need to do bisection on your own.

lukasz1992 avatar Dec 20 '22 07:12 lukasz1992

how about 21.02 branch?

I haven't tested 21.02, since after the 23.xx release it will be out of support (the same situation as with 19.xx). I probably need to gather more details.

I am afraid you need to do bisection on your own.

Probably, but I hope that there is still some interest from developers and the community in these devices. I can test any improvements on this hardware, but I am mostly interested in fixes for master/23.xx, since on (much) older versions we already had it working.

dfateyev avatar Dec 20 '22 11:12 dfateyev

MT7603E sometimes stalls (randomly?) with my MacBook Air 2020 Intel without giving any errors, as described. Using MT76x0E (5 GHz) on the same device is fine. Using the latest snapshot with the 5.10 kernel.

khanjui avatar Dec 24 '22 02:12 khanjui

he asked you to test/bisect it with 21.02 to find out which openwrt version introduced the bug. that might reveal which change actually caused the bug and allow devs to fix it in 22 and/or 23 :)

he didn't ask you to test 21.02 so it could be fixed there 😅

Djfe avatar Jan 03 '23 00:01 Djfe

For information, ever since I started using openwrt on my router I have had huge instabilities with my bgn WiFi. I started 3 or 4 years ago, so it was version 19.x as far as I remember...

In AP mode it disappears randomly, crashes, sometimes disconnects devices, etc. What I noticed is that when I enable an interface in client mode (with mobile phone connection sharing, for example) it magically becomes a lot more stable...

gillg avatar Jan 03 '23 07:01 gillg

There is a regression between openwrt-19.07 and openwrt-21.02 with regard to MT7603E support in mt76. I can reproduce connection issues on an openwrt-21.02 snapshot after extensive testing, but they are rare. With newer OpenWrt versions, it is more unstable. I will probably stay with openwrt-21.02 on MT7603E.

dfateyev avatar Jan 04 '23 19:01 dfateyev

Edit: This did not work; random 2.4 GHz disconnects continue.

I use the snapshot image from 9 January 2023. OpenWrt SNAPSHOT r21728-fc33c41c21 / LuCI Master git-22.361.69865-deed682

And I have the same problem (setup: 2.4 GHz, 802.11n, HT20, channel 6).

I made the two config changes below and now, 12 hours later, I haven't had any disconnection (will continue testing).

1 - Disable multicast on the phy0-ap0 interface. To do this, go to LuCI: Interfaces > Devices > phy0-ap0.

2 - Change the WPA2 WiFi cipher from [auto] to [aes].

PS: This bug doesn't show any warnings, notices, or information in the system log. I don't know whether it is a cryptography key problem (TKIP or auto switch) or a multicast flood on wireless clients.

Update: This did not work; random 2.4 GHz disconnects continue.

choice77 avatar Jan 13 '23 03:01 choice77

What I discovered:

My disconnections are not related to the number of 2.4 GHz clients/STAs connected to the router (I have only one 2.4 GHz device in my house).

I noticed that the random disconnections are related to weak/poor signal conditions. But these disconnections didn't occur with the official TP-Link firmware on the Archer C6 v3.

My theory (I will test for 48 hours and report back here): the "Time interval for rekeying GTK" option is too short in the OpenWrt default (by driver). The field is set to only 300 seconds (for example, on DD-WRT the default is 3600 seconds).

In bad signal conditions, the excessive key renewals can cause hangs and disconnections on 2.4 GHz wireless clients.
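For anyone who wants to test this theory from the command line, a minimal sketch follows. It assumes the LuCI field maps to the `wpa_group_rekey` option of a `wifi-iface` section in `/etc/config/wireless`; the function names and the default section `@wifi-iface[0]` are illustrative, so check `uci show wireless` for the real section name first. At 300 seconds the group key is renewed 12 times per hour, at 3600 seconds only once.

```shell
#!/bin/sh
# Sketch: raise the GTK rekey interval on one wifi-iface section.
# The section name is an assumption; list yours with `uci show wireless`.
set_gtk_rekey() {
    seconds=$1
    section=${2:-@wifi-iface[0]}
    uci set "wireless.${section}.wpa_group_rekey=${seconds}" || return 1
    uci commit wireless || return 1
    wifi reload
}

# Number of group rekeys per hour for a given interval in seconds.
rekeys_per_hour() {
    echo $((3600 / $1))
}
```

Calling `set_gtk_rekey 3600` would apply the DD-WRT-style value mentioned above; whether it actually helps is exactly what the test is meant to show.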

choice77 avatar Jan 13 '23 23:01 choice77

What fixed it in my case:

  • Disable software and hardware offloading
  • Disable WMM (this disables QoS)
  • Select a country code

Now stable across all clients.
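The same three changes can be sketched as UCI commands. This is only an illustration under assumptions: the section names (`radio0`, `@wifi-iface[0]`, `@defaults[0]`) and the function name `apply_workaround` are hypothetical and must be adapted to the actual config, and the country code is passed in by the caller.

```shell
#!/bin/sh
# Sketch of the three changes as UCI commands.
# Section names (radio0, @wifi-iface[0], @defaults[0]) are assumptions.
apply_workaround() {
    country=$1
    radio=${2:-radio0}
    iface=${3:-@wifi-iface[0]}
    # 1. Disable software and hardware flow offloading.
    uci set "firewall.@defaults[0].flow_offloading=0" || return 1
    uci set "firewall.@defaults[0].flow_offloading_hw=0" || return 1
    # 2. Disable WMM on the access point interface.
    uci set "wireless.${iface}.wmm=0" || return 1
    # 3. Set an explicit regulatory country code on the radio.
    uci set "wireless.${radio}.country=${country}" || return 1
    uci commit firewall || return 1
    uci commit wireless
}
```

After something like `apply_workaround DE`, the firewall and WiFi still need to be restarted (for example `/etc/init.d/firewall restart` and `wifi reload`) for the changes to take effect.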

webysther avatar Apr 15 '23 12:04 webysther

MT7603e does not handle SMPS well, making 2.4GHz WiFi disappear or system crash or connection unstable or lost #576, but SMPS should be already disabled for #MT7603E

I don't have deeper knowledge of the system, but what if, after all this time, the solution is to enable SMPS, with the code refined to work with MT7603E? I don't know if it's worth a try, though; like I said, my knowledge is limited.

shown19 avatar Jun 21 '23 05:06 shown19

looking at the codebase SMPS is supported and enabled as far as I can tell. The code was added in Jan 2019 so it might be part of openwrt-19.07, but I haven't checked. https://github.com/openwrt/mt76/commit/fc31457cd99cb85c8cea9329eedc5edd80038f29

@dfateyev what made you think it was disabled? easyteacher closed their PRs before they were ever merged

Djfe avatar Jun 21 '23 21:06 Djfe

@Djfe Hello, I think he meant this?

sm disabled

if so, mine is disabled too, device is Newifi D2

shown19 avatar Jun 22 '23 00:06 shown19

makes me curious whether it is also disabled on openwrt-19.07 (I don't own an affected device)

Djfe avatar Jun 22 '23 02:06 Djfe

makes me curious whether it is also disabled on openwrt-19.07 (I don't own an affected device)

By the way, I'm on the latest snapshot build now, and this is also disabled in the stable release v22.03.5, but I'm not sure whether it was already disabled in older versions down to v19.07. I might take a look at it when I have more spare time.

shown19 avatar Jun 22 '23 02:06 shown19

makes me curious whether it is also disabled on openwrt-19.07 (I don't own an affected device)

I don't have deeper knowledge about this, but since I'm also learning how to compile, I checked the code based on easyteacher's info. I can actually see that the SMPS code block is enabled, but after flashing, for some reason it isn't enabled. Maybe there is a conditional statement that disables it on some mt76 devices? Sorry, my knowledge is limited.

shown19 avatar Jun 22 '23 03:06 shown19

what made you think it was disabled? easyteacher closed their PRs before they were ever merged

makes me curious whether it is also disabled on openwrt-19.07 (I don't own an affected device)

Besides SM Power Save being in the "disabled" state, I didn't manage to trigger any SMPS-related events while testing this board last year. The SMPS option is already present in v19.07: SM Power Save disabled. I still have one MT7603E under v19.07.
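To compare this state across releases, the capability line can be read from `iw`. A minimal sketch, assuming the 2.4 GHz radio is `phy0` (radio numbering differs between devices) and using an illustrative function name:

```shell
#!/bin/sh
# Print the SM Power Save capability line(s) reported for a phy.
# PHY defaults to phy0, which is an assumption about radio numbering.
smps_state() {
    iw phy "${1:-phy0}" info | grep -i 'SM Power Save'
}
```

Running `smps_state phy0` on both an affected and an unaffected release would show whether the advertised capability differs between them.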

dfateyev avatar Jun 22 '23 09:06 dfateyev

@dfateyev hi, may I know what specific v19.07 of openwrt you're using?

shown19 avatar Jun 22 '23 10:06 shown19

may I know what specific v19.07 of openwrt you're using?

OpenWrt 19.07.10, r11427-9ce6aa9d8d, device ZBT WE1326 / WE3526.

dfateyev avatar Jun 22 '23 22:06 dfateyev

This also applies to me. The 2.4GHz is very unstable causing connection crash and reconnect attempts. I am using a TP Link Archer C6 V3 (EU) running OpenWRT 22.03.5 (DISTRIB_DESCRIPTION: OpenWrt 22.03.5 r20134-5f15225c1e)

malekairmaroc7 avatar Jun 29 '23 09:06 malekairmaroc7

Same. Xiaomi Router 4A (R4AC) OpenWrt SNAPSHOT r23454-01885bc6a3 / LuCI Master git-23.158.78004-23a246e

ShredRum avatar Jun 29 '23 19:06 ShredRum

Aren't there any alternative drivers?

malekairmaroc7 avatar Jun 29 '23 23:06 malekairmaroc7

There are, but they are incompatible with the LuCI installed by default (there is a MediaTek module for LuCI that works with them). Also, uci2dat is needed to sync the config with UCI.

lukasz1992 avatar Jun 30 '23 09:06 lukasz1992

I see. Too bad.

malekairmaroc7 avatar Jul 01 '23 22:07 malekairmaroc7

Please try latest OpenWrt master or 23.05 branch

nbd168 avatar Jul 27 '23 08:07 nbd168

@nbd168 I have been testing 23.05 branch on MT7603E for a week (commit c697057b from Aug 05, 2023).

The 2.4 GHz stability issue is still present: with 802.11n, 20 MHz bandwidth, and WPA2 on the AP, running iperf3 from an AP client to a DMZ host leads to a load average (LA) of 0.8-0.9 on the AP and a stuck WLAN connection. I also disabled NAT and MSS clamping on the AP, but it didn't improve the LA or the stability. There are no relevant logs on either the AP or the client side.

The good news is that legacy 802.11g mode is now fully stable: I hammered it with iperf3 for days without a drop. However, it offers low bandwidth (LA on the AP doesn't go beyond 0.4) and, in general, makes the AP much less useful.
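For reference, switching the radio to legacy mode can be sketched with UCI. This assumes that setting `htmode` to `NOHT` on the radio section disables 802.11n, which should be verified against the OpenWrt wireless documentation or by selecting the legacy mode in LuCI; `radio0` and the function name are illustrative.

```shell
#!/bin/sh
# Sketch: force a radio into legacy (non-HT) mode.
# Assumptions: the radio section is radio0 and htmode=NOHT disables 802.11n.
set_legacy_mode() {
    radio=${1:-radio0}
    uci set "wireless.${radio}.htmode=NOHT" || return 1
    uci commit wireless || return 1
    wifi reload
}
```

Setting `htmode` back to `HT20` the same way restores the 802.11n configuration described above.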

dfateyev avatar Aug 13 '23 16:08 dfateyev

Please try this patch on top of current mt76: https://nbd.name/p/762e9946

nbd168 avatar Aug 14 '23 12:08 nbd168

Please try this patch on top of current mt76: https://nbd.name/p/762e9946

I applied the patch against mt76 master and used it with an "openwrt-23.05" build (commit b59d02be). I noticed a slightly decreased LA, but while loading the AP with iperf3 from 2 clients, the AP crashed/restarted after 2-3h. I repeated the same test with BW load, and the AP went unresponsive after 2-3h again, this time without a reboot: although the LEDs are active, there is no WLAN on air and no LAN access. It seems I cannot provide a crash log from the AP, sorry. During the load test, I also saw an increasing beacon stuck count, similar to https://github.com/openwrt/mt76/issues/793#issuecomment-1680167853.

dfateyev avatar Aug 16 '23 22:08 dfateyev

AP went unresponsive in 2-3h again — this time w/o reboot, although LEDs are active, there is no WLAN in air and no LAN access

Oh, I thought I was the only one. I also experienced this, and it is one of the reasons why I disabled the WiFi and used another access point.

shown19 avatar Aug 17 '23 02:08 shown19

Can you show us the output of dmesg | grep -i mt76

DragonBluep avatar Aug 17 '23 02:08 DragonBluep

Can you show us the output of dmesg | grep -i mt76

Unfortunately I cannot, since WLAN and LAN access to the AP is lost. It looks like AP is alive but unresponsive via network. I probably need a serial console, but it would require pin soldering, etc.

dfateyev avatar Aug 17 '23 11:08 dfateyev