itlwm icon indicating copy to clipboard operation
itlwm copied to clipboard

WiFi disconnects on sustained upload or download

Open mikebeaton opened this issue 1 year ago • 45 comments

Have You Read Our Docs Yes

Are You Reporting A Bug Yes

Environment

  • Kext Version: AirportItlwm-Monterey-v2.3.0-DEBUG-alpha-ff1138b
  • WiFi Card Model: Intel(R) Wireless-AC 9560 160MHz
  • PCI Product ID: 8086:A370, subsystem 8086:0030
  • macOS Version: Monterey 13.6 (also happens on Ventura and Sonoma)

Description For general browsing WiFi works, but on sustained traffic up or down, WiFi fairly quickly hangs (actually disconnects) and transfer hangs. (Note the same network is fine on other machines, and the same WiFi card and network is fine on Linux on the same machine.)

Bug Report Archive Log_2023-11-09_07-54-24.log

Showing the disconnect which occurs on sustained throughput (will sometimes go to reconnecting to WiFi (as shown); will sometimes eventually go to exclamation mark permanent error signal on WiFi icon (not shown here)): foo

Example of download hanging (seems to hang at the same point on multiple attempts?): Screenshot 2023-11-09 at 07 54 00

Error message on cancel, may have useful info?: Screenshot 2023-11-09 at 07 54 13

Kext Download Source Downloaded from alpha releases from this site.

mikebeaton avatar Nov 09 '23 08:11 mikebeaton

Have you tried V2.2.0? Or the Preview05 version https://github.com/OpenIntelWireless/itlwm/issues/883#issuecomment-1670749680 ?

zxystd avatar Nov 10 '23 01:11 zxystd

Thank you for the suggestions.

I have tested these and yes v2.2.0 works (tested on Ventura) and Preview05 works (tested on Sonoma), so it is just v2.3.0-DEBUG-alpha-ff1138b which shows the problem (on all tested macOS).

I have realised that the issue is easily replicable and seems to be related to something specific which git is doing.

That specific command git clone [email protected]:acidanthera/OcBuild.git (at least on the current state of that repo) pauses at 26%, 34% and 98%, even on macOS with a working or wired connection, and even on Linux. But it passes those pauses and finishes when working normally, however hangs and/or disconnects as described on the problem release. (I don't mean that just this repo has issues, rather that it is easy to show the issue, e.g. with this repo.)

The 'sustained upload' with which I had problems was actually a large git commit, too. (I don't know if whatever git is doing is related to the 'sideband' error in one of the images above?)

mikebeaton avatar Nov 11 '23 19:11 mikebeaton

Thank you for the report!

[   94.049483]: itlwm: __iwl_mvm_rs_tx_status new rate: 0x4c00f017: VHT | ANT: AB BW: 80Mhz MCS: 7 NSS: 2 SGI LDPC 
[   94.095533]: itlwm: __iwl_mvm_rs_tx_status new rate: 0x4c00f016: VHT | ANT: AB BW: 80Mhz MCS: 6 NSS: 2 SGI LDPC 
[   94.103896]: itlwm: __iwl_mvm_rs_tx_status new rate: 0x4c00f017: VHT | ANT: AB BW: 80Mhz MCS: 7 NSS: 2 SGI LDPC 
[   94.282187]: itlwm: __iwl_mvm_rs_tx_status new rate: 0x4c00f016: VHT | ANT: AB BW: 80Mhz MCS: 6 NSS: 2 SGI LDPC 
[   94.287920]: itlwm: __iwl_mvm_rs_tx_status new rate: 0x4c00f017: VHT | ANT: AB BW: 80Mhz MCS: 7 NSS: 2 SGI LDPC 
[   94.387648]: itlwm: __iwl_mvm_rs_tx_status new rate: 0x4c00f016: VHT | ANT: AB BW: 80Mhz MCS: 6 NSS: 2 SGI LDPC 
[   94.563766]: itlwm: __iwl_mvm_rs_tx_status new rate: 0x4c00f017: VHT | ANT: AB BW: 80Mhz MCS: 7 NSS: 2 SGI LDPC 

[  102.998809]: itlwm: itlwm: Start UMAC Error Log Dump:
[  102.998811]: itlwm: itlwm: Status: 0x39, count: 7
[  102.998814]: itlwm: itlwm: 0x2010071F | ADVANCED_SYSASSERT

From the log, the driver firmware crash when doing large traffic, the most suspicious difference is 2637e9014b7af098974c8d6db69230f75fb20a90 which implement new rate scaling control algorithm(ported from linux iwlwifi driver), and enable STBC/LDPC/BEFER features, you can revert this commit and can confirm that. The crash happend after configured new rate, not during configuring new rate, and the error code 0x2010071F is only readable by Intel engineers, so I can‘t accurately locate the rate problem.

[  106.741138]: itlwm: LEGACY=FF0 SISO=7FD0 MIMO2=7FD0 VHT=1 LDPC=1 STBC=0 BFER=1
[  106.741141]: itlwm: MAX RATE: LEGACY=11 SISO=14 MIMO2=14

BFER=1 means Beamforming feature on the router side is ON, I haven't test if the driver is really satisified with that, can you try disable it? And can you try if the driver can work on 11n mode? currently it is on 11ac.

zxystd avatar Nov 13 '23 02:11 zxystd

Here's some initial new data, v2.3.0-DEBUG-alpha-ff1138b on Sonoma with 802.11n (obtained via -novht boot arg) gives the same problem. Here is a log:

Log_2023-11-13_17-54-11.log

I don't think I have control over beamforming in my router (it has a somewhat limited end-user interface ... the service is fast, at least ... ;-) ).

Also, I hope I haven't misunderstood, but in the pasted log in this stack exchange question https://askubuntu.com/q/1474713 there is 0x2010071F | ADVANCED_SYSASSERT following what seems to be a completely different error, in case this is relevant?

mikebeaton avatar Nov 13 '23 18:11 mikebeaton

To test beamforming, I used a different network. These all have BFER=0, the error still occurs.

Different WiFi (bridged through to the same original Wifi):

Log_2023-11-13_18-27-01.log

Phone hotspot (direct to mobile network, not to another WiFi):

1/2: Log_2023-11-13_18-44-55.log

2/2: Log_2023-11-13_18-49-13.log

Note that in 1/2 above, I observed the network disconnect and re-connect during the attempt to git clone, presumably indicating the some part of the error had already happened, but I guess I did not leave it running long enough before pressing CTRL+C then generating the log, and the log file does not contain 0x2010071F | ADVANCED_SYSASSERT (even though generated after seeing the WiFi disconnect then re-connect). (EDIT: Possibly this is a red herring, and the disconnect and reconnect is unrelated to the main issue here? Not sure. Log 2/2 definitely shows the issue.)

mikebeaton avatar Nov 13 '23 18:11 mikebeaton

Finally, I can confirm that reverting the merge commit 2637e90 as suggested does indeed resolve the issue (tested working on Monterey and Sonoma).

mikebeaton avatar Nov 14 '23 00:11 mikebeaton

If it adds anything, I wonder if there may be some kind of bug in how my home router handles ... something about the traffic which git is generating. From a simple internet speed test, my work network is very similar in speed to my new home network, but my work PC can clone git repos far faster. Of course, if this is correct and there is something faulty about the network traffic, it is still not crashing and causing disconnects except in this one version of your (very useful!) driver - so I guess this would still make it some kind of bug in your driver, being shown up by the (possibly?) faulty network traffic in this case. Let me know if I can run any more tests to help with this.

mikebeaton avatar Dec 04 '23 21:12 mikebeaton

One more thought, and I'll leave it in your hands, but I'm still struggling here to checkout submodules (very slow, git fairly often aborts the process) - I'm just wondering if maybe what is happening here is that there's a real error in the traffic, but that the new code in the itlwm driver is escalating it too far? (i.e. it's being treated like a really bad error which makes the Wi-Fi disconnect, but really it's only a 'normally' bad error)

mikebeaton avatar Dec 08 '23 22:12 mikebeaton

Yes same bug here. Browsing somehow works, but as long as you start anything slightly network-intensive (rsync, scp, ...) the transfer hangs very shortly.

LubosKolouch avatar Jan 30 '24 13:01 LubosKolouch

The problem persists with AirportItlwm-Sonoma-v2.3.0-DEBUG-alpha-3e1624d.zip

LubosKolouch avatar Feb 05 '24 17:02 LubosKolouch

The same problem in 14.4

LubosKolouch avatar Mar 09 '24 13:03 LubosKolouch

The same problem in 14.4

What's your wifi card model? and please upload the running log after drop and reconnect.

zxystd avatar Mar 09 '24 13:03 zxystd

Intel Corporation Wireless 8265 / 8275 (rev 78)

How to get the log?

Also, the Wifi connection seems to be connected and working all the time, just the impacted transfers stall.

When it is stalled, I can for example go to browser and browse pages no problem, even play a movie over internet.

Same situation with AirportItlwm and itlwm, AirportItlwm-Sonoma14.4- v2.3.0-DEBUG-alpha-42dc7bc.zip

On Sat, 2024-03-09 at 05:53 -0800, 钟先耀 wrote:

The same problem in 14.4 What's your wifi card model? and please upload the running log after drop and reconnect. — Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you commented.Message ID: @.***>

LubosKolouch avatar Mar 09 '24 18:03 LubosKolouch

…ok, I cannot really.

During the stalled transfer:

$ ping www.google.com PING www.google.com (142.251.36.132): 56 data bytes Request timeout for icmp_seq 0 Request timeout for icmp_seq 1 Request timeout for icmp_seq 2 Request timeout for icmp_seq 3 Request timeout for icmp_seq 4 Request timeout for icmp_seq 5 Request timeout for icmp_seq 6 64 bytes from 142.251.36.132: icmp_seq=0 ttl=115 time=8015.897 ms 64 bytes from 142.251.36.132: icmp_seq=1 ttl=115 time=7011.728 ms 64 bytes from 142.251.36.132: icmp_seq=2 ttl=115 time=6009.137 ms 64 bytes from 142.251.36.132: icmp_seq=3 ttl=115 time=5014.392 ms 64 bytes from 142.251.36.132: icmp_seq=4 ttl=115 time=4009.650 ms 64 bytes from 142.251.36.132: icmp_seq=5 ttl=115 time=3007.820 ms 64 bytes from 142.251.36.132: icmp_seq=6 ttl=115 time=2004.144 ms 64 bytes from 142.251.36.132: icmp_seq=7 ttl=115 time=1003.956 ms 64 bytes from 142.251.36.132: icmp_seq=8 ttl=115 time=3624.621 ms 64 bytes from 142.251.36.132: icmp_seq=9 ttl=115 time=2622.468 ms 64 bytes from 142.251.36.132: icmp_seq=10 ttl=115 time=1619.776 ms 64 bytes from 142.251.36.132: icmp_seq=11 ttl=115 time=626.808 ms 64 bytes from 142.251.36.132: icmp_seq=12 ttl=115 time=477.406 ms 64 bytes from 142.251.36.132: icmp_seq=13 ttl=115 time=385.568 ms 64 bytes from 142.251.36.132: icmp_seq=14 ttl=115 time=7346.008 ms 64 bytes from 142.251.36.132: icmp_seq=15 ttl=115 time=6341.893 ms 64 bytes from 142.251.36.132: icmp_seq=16 ttl=115 time=5339.970 ms 64 bytes from 142.251.36.132: icmp_seq=17 ttl=115 time=4337.356 ms 64 bytes from 142.251.36.132: icmp_seq=18 ttl=115 time=3334.400 ms 64 bytes from 142.251.36.132: icmp_seq=19 ttl=115 time=2330.019 ms 64 bytes from 142.251.36.132: icmp_seq=20 ttl=115 time=1326.930 ms 64 bytes from 142.251.36.132: icmp_seq=21 ttl=115 time=329.333 ms 64 bytes from 142.251.36.132: icmp_seq=22 ttl=115 time=1773.373 ms

On 9. 3. 2024, at 19:48, Lubos Kolouch @.***> wrote:

Intel Corporation Wireless 8265 / 8275 (rev 78)

How to get the log?

Also, the Wifi connection seems to be connected and working all the time, just the impacted transfers stall.

When it is stalled, I can for example go to browser and browse pages no problem, even play a movie over internet.

Same situation with AirportItlwm and itlwm, AirportItlwm-Sonoma14.4- v2.3.0-DEBUG-alpha-42dc7bc.zip

On Sat, 2024-03-09 at 05:53 -0800, 钟先耀 wrote:

The same problem in 14.4 What's your wifi card model? and please upload the running log after drop and reconnect. — Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you commented.Message ID: @.***>

LubosKolouch avatar Mar 09 '24 19:03 LubosKolouch

… when I kill the transfer:

$ ping www.google.com PING www.google.com (142.251.36.132): 56 data bytes 64 bytes from 142.251.36.132: icmp_seq=0 ttl=115 time=8.383 ms 64 bytes from 142.251.36.132: icmp_seq=1 ttl=115 time=7.790 ms 64 bytes from 142.251.36.132: icmp_seq=2 ttl=115 time=10.877 ms 64 bytes from 142.251.36.132: icmp_seq=3 ttl=115 time=8.629 ms 64 bytes from 142.251.36.132: icmp_seq=4 ttl=115 time=41.523 ms 64 bytes from 142.251.36.132: icmp_seq=5 ttl=115 time=9.450 ms 64 bytes from 142.251.36.132: icmp_seq=6 ttl=115 time=8.463 ms 64 bytes from 142.251.36.132: icmp_seq=7 ttl=115 time=7.175 ms 64 bytes from 142.251.36.132: icmp_seq=8 ttl=115 time=7.982 ms 64 bytes from 142.251.36.132: icmp_seq=9 ttl=115 time=8.705 ms

On 9. 3. 2024, at 20:10, kolcon @.***> wrote:

…ok, I cannot really.

During the stalled transfer:

$ ping www.google.com PING www.google.com (142.251.36.132): 56 data bytes Request timeout for icmp_seq 0 Request timeout for icmp_seq 1 Request timeout for icmp_seq 2 Request timeout for icmp_seq 3 Request timeout for icmp_seq 4 Request timeout for icmp_seq 5 Request timeout for icmp_seq 6 64 bytes from 142.251.36.132: icmp_seq=0 ttl=115 time=8015.897 ms 64 bytes from 142.251.36.132: icmp_seq=1 ttl=115 time=7011.728 ms 64 bytes from 142.251.36.132: icmp_seq=2 ttl=115 time=6009.137 ms 64 bytes from 142.251.36.132: icmp_seq=3 ttl=115 time=5014.392 ms 64 bytes from 142.251.36.132: icmp_seq=4 ttl=115 time=4009.650 ms 64 bytes from 142.251.36.132: icmp_seq=5 ttl=115 time=3007.820 ms 64 bytes from 142.251.36.132: icmp_seq=6 ttl=115 time=2004.144 ms 64 bytes from 142.251.36.132: icmp_seq=7 ttl=115 time=1003.956 ms 64 bytes from 142.251.36.132: icmp_seq=8 ttl=115 time=3624.621 ms 64 bytes from 142.251.36.132: icmp_seq=9 ttl=115 time=2622.468 ms 64 bytes from 142.251.36.132: icmp_seq=10 ttl=115 time=1619.776 ms 64 bytes from 142.251.36.132: icmp_seq=11 ttl=115 time=626.808 ms 64 bytes from 142.251.36.132: icmp_seq=12 ttl=115 time=477.406 ms 64 bytes from 142.251.36.132: icmp_seq=13 ttl=115 time=385.568 ms 64 bytes from 142.251.36.132: icmp_seq=14 ttl=115 time=7346.008 ms 64 bytes from 142.251.36.132: icmp_seq=15 ttl=115 time=6341.893 ms 64 bytes from 142.251.36.132: icmp_seq=16 ttl=115 time=5339.970 ms 64 bytes from 142.251.36.132: icmp_seq=17 ttl=115 time=4337.356 ms 64 bytes from 142.251.36.132: icmp_seq=18 ttl=115 time=3334.400 ms 64 bytes from 142.251.36.132: icmp_seq=19 ttl=115 time=2330.019 ms 64 bytes from 142.251.36.132: icmp_seq=20 ttl=115 time=1326.930 ms 64 bytes from 142.251.36.132: icmp_seq=21 ttl=115 time=329.333 ms 64 bytes from 142.251.36.132: icmp_seq=22 ttl=115 time=1773.373 ms

On 9. 3. 2024, at 19:48, Lubos Kolouch @.***> wrote:

Intel Corporation Wireless 8265 / 8275 (rev 78)

How to get the log?

Also, the Wifi connection seems to be connected and working all the time, just the impacted transfers stall.

When it is stalled, I can for example go to browser and browse pages no problem, even play a movie over internet.

Same situation with AirportItlwm and itlwm, AirportItlwm-Sonoma14.4- v2.3.0-DEBUG-alpha-42dc7bc.zip

On Sat, 2024-03-09 at 05:53 -0800, 钟先耀 wrote:

The same problem in 14.4 What's your wifi card model? and please upload the running log after drop and reconnect. — Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you commented.Message ID: @.***>

LubosKolouch avatar Mar 09 '24 19:03 LubosKolouch

How to get the log?

https://openintelwireless.github.io/itlwm/Troubleshooting.html#runtime-logs

collect the log after the issue happened. @LubosKolouch

zxystd avatar Mar 10 '24 02:03 zxystd

Thanks, attached both for AirportItlwm and itlwm, same behavior.

On 10. 3. 2024, at 3:43, 钟先耀 @.***> wrote:

How to get the log?

https://openintelwireless.github.io/itlwm/Troubleshooting.html#runtime-logs

collect the log after the issue happened. @LubosKolouch https://github.com/LubosKolouch — Reply to this email directly, view it on GitHub https://github.com/OpenIntelWireless/itlwm/issues/934#issuecomment-1987048200, or unsubscribe https://github.com/notifications/unsubscribe-auth/AA7UARYL6J73VNMVDBEIJLLYXPCF7AVCNFSM6AAAAAA7ED4FWGVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSOBXGA2DQMRQGA. You are receiving this because you were mentioned.

LubosKolouch avatar Mar 10 '24 10:03 LubosKolouch

@LubosKolouch I don't see the files...

zxystd avatar Mar 11 '24 00:03 zxystd

looks like they did not go through as email attachments, reuploaded them...

LubosKolouch avatar Mar 11 '24 05:03 LubosKolouch

Problem is still present in AirportItlwm-Sonoma14.4-v2.3.0-DEBUG-alpha-6dc344d.zip

LubosKolouch avatar Mar 14 '24 16:03 LubosKolouch

@LubosKolouch Recent commits doesn't do any help on this issue, because I don't locate reason of why the firmware crash randomly on yours case.

I have added more debug info on this kext, please trigger the disconnection again and then collect the log, thanks. Sonoma14.4.zip

zxystd avatar Mar 14 '24 16:03 zxystd

attached... bugreport_38362.zip

LubosKolouch avatar Mar 14 '24 16:03 LubosKolouch

@LubosKolouch I don't see disconnection from the log, please collect the log after auto reconnect.

zxystd avatar Mar 14 '24 17:03 zxystd

@zxystd is this log better? If not, I will collect more... bugreport_26665.zip

LubosKolouch avatar Mar 14 '24 17:03 LubosKolouch

@LubosKolouch No disconnection or firmware crash there in the two logs, the driver is working fine all the 232 seconds. the only problem is that it is always doing roam scan to find a better WiFi, you are too far away from the router? scanning can affect the performance definitely.

If you reproduce the disconnect and reconnect issue, please collect the log and upload here.

zxystd avatar Mar 14 '24 17:03 zxystd

@zxystd I hear what you say, I try with one more log. The pings are falling out, jumping to 4000ms and the transfers stops after transferring 255kB.

Some more comments:

  1. other machines on Linux (next to this one) are working smoothly, no problem with wifi/transfers
  2. if I reboot THIS machine to Linux, also wifi working fine
  3. it was working fine with previous Airportilwm releases up to approximately November. For example with the Prelease05 I never had any issues.

bugreport_23996.zip

LubosKolouch avatar Mar 14 '24 17:03 LubosKolouch

  1. When I use wired connection, the transfers work ok => it is not issue of my scp, rsync or some general network stack

LubosKolouch avatar Mar 14 '24 20:03 LubosKolouch

  1. it was working fine with previous Airportilwm releases up to approximately November. For example with the Prelease05 I never had any issues.

@LubosKolouch I believe what you said, so we should find out why it is stalled. here I add more log and reset the aggregation trigger condition to preview05, please check if it would help? Sonoma14.4.zip

zxystd avatar Mar 15 '24 00:03 zxystd