firmware
firmware copied to clipboard
Suboptimal uplink performance with weaker signal links
This issue is to note an observation seen while testing another issue, and possibly highlights that there is an issue (or at the least, an optimization opportunity) with the WILC1000 module.
Using iperf to send traffic from a WILC1000 (station mode) to a distant TPLink AP, on the air packet captures show a large amount of re-transmissions. Further investigation has shown the the WILC module appears to consistently tx the large QOS-Data frames at an initial rate that is much higher data rate than the weak link is capable of. Most often, these packets are re-transmitted 4 times before successfully being ack'ed, each at a successively lower rate. This results in much lower wilc - > AP performance than would otherwise be seen if the initial tx was closer to the rate the link is actually capable of.
Some anecdotal examples:- On the same link:
- iperf from AP-> WILC shows initial tx and consistent first packet ack at 39Mbps rate
- iperf from WILC ->AP shows packets sent most frequently at 65 initially, then retries at 58, 52, and 39.
With a weaker link, the WILC will transmit using mid g rates, but the same sequence occurs, with the packet being re-transmitted several times. Comparing to the AP-> WILC iperf the AP again sends a much higher number of packets without re-transmit (and at a lower initial rate)
Comparing the data throughput, for the same link conditions, AP->wilc iperf throughput is much higher than the wilc->AP throughput.
We believe that if the WILC module was more accurately maintaining the current rate the link is capable of, and re-transmitted fewer packets, then the difference between up and down link speeds would be less. It appears that the automatic rate control in the WILC100 module needs to be investigated.
During these tests, the RF environment was kept as quiet as possible, however other WIFI AP's are operating on the same channel, and some other WILC stations were connected to this AP, however no active data transmission was taking place with those other modules. (This is unlikely to be a contention, or packet overlap issue, causing the re-transmissions)
Using the WILC module in AP mode has not been investigated to see if the same behaviour occurs in this reversed configuration, but this would be an interesting experiment.
While this is not a critical issue (the packets do successfully get through after several retries), it would be good to get some feedback on these observations, and have some Microchip engineering investigations performed to see if improvements can be made to how the WILC module handles selecting the initial rate used for tx.
It should also be noted that as a result of this behaviour, the effective throughput difference between modules that are located very close the the AP, compared to those located much further away is much more that it should be, after taking into account the lower data rate needed for the distant link. This difference was easily noticeable, which was the trigger for this deeper investigation.
I have reproduced this with the combinations I posted here: https://github.com/linux4wilc/firmware/issues/6#issuecomment-472846718
In both cases, the clients still show a good signal quality but data transfer comes to a halt at a distance of ~10-15m. There are no bandwidth issues when standing right next to the AP.
Unfortunately we want to have the client devices to be able to maitain a HD video stream for telemedicine applications, where we cannot guarantee immediate proximity to the AP.
I'll have to make a proper setup for that to be able to quantify the difference, because the difference is quite hard to tell with well over 50 access points active nearby.
+1 month since I opened this issue. @AdhamAbozaeid can I get a your input on this. Should I open a Salesforce ticket on this one? Are you aware of any issue with autorate / rate control / speed with weaker links?
Yes please open a salesforce ticket. Salesforce is microchip's official bug report interface where issues get prioritized and planned for support.
OK. I will do that today. Has anything been seen like this in testing so far? @Mateusz-Gwara 's observations mirror mine. My current workaround is to move to AP close.
Salesforce case number 00404542
+2 months since I opened this issue. No action in Salesforce. @AdhamAbozaeid can I get an update?
@tsifb , please follow up with the team on Salesforce
@AdhamAbozaeid done. I think the pcap file attached to the last saleforce response is not the correct one.... I couldn't match the packets to the description in the text.
I also added some more clarification comments from my own test observations, and also some comments based on a brief look at the pcap file that was attached.
@Mateusz-Gwara update on this for you: We continue to work with MCHP via salesforce on this issue, however last from them was to refer us to our local Sales/FAE team to request a modification to the way the WILC module performs the data rate control as a new feature. (rather than considering this as a maintenance item or bug fix)
My current understanding is that the WILC (station) will use an initial TX data rate for any frame based on the last RSSI from the AP it is connected to. When the AP is not another WILC module, and if the TX power of the AP is much more than the WILC, then I believe this method is flawed. The WILC will always choose a rate that is too high, and frames will need to be transmitted.
My understanding of how the WILC firmware operates may still be incorrect, however it is clear that there is a high level of reluctance from MCHP to look at this in more depth, or to make any changes to reduce the higher-than-necessary number of re-transmissions that are easily observable in our setup.
thanks for the update @tsifb
My problem with the behaviour of the WILC driver is that it eventually leads to a system freeze (requiring a HW-watchdog to get out of). Any application using the linux network stack seems to hang then (ppp, hostapd even a simple ping or ifconfig hangs).
Sometimes if we catch the errors produced in the syslog: [ 6588.250412] WILC_SPI spi0.2: Failed cmd response, cmd (c4), resp (00) [ 6588.257161] WILC_SPI spi0.2: Failed cmd, read reg (00000001)... [ 6588.269069] WILC_SPI spi0.2: Failed cmd response, cmd (cf), resp (00) [ 6588.275817] WILC_SPI spi0.2: Failed cmd reset [ 6588.280476] WILC_SPI spi0.2: Reset and retry 1 1 [ 6588.295384] WILC_SPI spi0.2 wlan0: ERR [wilc_wlan_initialize:905] Initializing WILC_Wlan FAILED [ 6588.304600] WILC_SPI spi0.2 wlan0: ERR [wilc_mac_open:1029] Failed to initialize wilc [ 6588.312867] WILC_SPI spi0.2 wlan0: WRN [debug_thread: 93]Couldn't restart ifc 1 [ 6711.094427] WILC_SPI spi0.2 wlan0: INFO [get_station]Getting station parameters
there is a chance to get out of it by rmmod wilc-spi and reloading everything again...
I also opened a new salesforce ticket adressing this issue: 00473411
^^This can be reproduced with any driver version (master or latest dev) from: https://github.com/linux4wilc/driver
We also tested on debian 7 (and 4.19 kernel) as well as on a completely clean debian 10 (5.3.7 kernel) bootstrap install with nothing besides a hostapd running.
Simply make an AP (WPA enabled) running on wilc as described in: http://ww1.microchip.com/downloads/en/DeviceDoc/ATWILC1000-ATWILC3000-Wi-Fi-Link-Controller-Linux-User-Guide-DS70005328C.pdf
Connect any kind of client to it (like a tablet for instance) and stream for a couple of hours.
The error posted above will come up and eventually lead to a system freeze.
Hi @Mateusz-Gwara Can you try disabling powersave and see if you are still geting SPI errors?
@Mateusz-Gwara Can you also try the attached patch with powersave enabled and attach the logs? I'm not able to see the same problem on my host, but I'm suspecting that this might fix the issue. 0001-staging-wilc1000-avoid-reading-registers-before-waki.zip
Hello @AdhamAbozaeid !
Thank you for your fast response and effort!
We do not use powersave for a while now.
Your patch has a typo in line 38:
pr_err("Bus error %d while writing %x\n" ret, from_host_to_fw_reg);
should be:
pr_err("Bus error %d while writing %x\n", ret, from_host_to_fw_reg);
^^missing a comma before "ret"
I've still encountered a freeze after issuing an ifconfig command (with cpu load going up until the hw-watchdog triggers a reboot).
For now I haven't seen the error message posted above.
This is what happens after a client streams a youtube video for ~2h:
[ 6538.089913] WILC_SPI spi0.2 wlan0: INFO [get_station]Getting station parameters [ 6538.125324] WILC_SPI spi0.2 wlan0: INFO [wilc_get_inactive_time]Getting inactive time : 0 [ 6538.125353] WILC_SPI spi0.2 wlan0: INFO [get_station]Inactive time 0 [ 6538.741147] WILC_SPI spi0.2 wlan0: INFO [get_station]Getting station parameters [ 6538.771992] WILC_SPI spi0.2 wlan0: INFO [wilc_get_inactive_time]Getting inactive time : 0 [ 6538.772019] WILC_SPI spi0.2 wlan0: INFO [get_station]Inactive time 0 [ 6567.007092] WILC_SPI spi0.2: Failed data response read, c3 0 0 [ 6567.007161] WILC_SPI spi0.2: Failed block data write... [ 6567.011321] WILC_SPI spi0.2: Reset and retry 10 0 8192 [ 6567.082891] WILC_SPI spi0.2: Failed data response read, c3 0 0 [ 6567.082915] WILC_SPI spi0.2: Failed block data write... [ 6567.084222] WILC_SPI spi0.2: Reset and retry 9 0 8192 [ 6567.123488] WILC_SPI spi0.2: Failed data response read, c3 0 0 [ 6567.123511] WILC_SPI spi0.2: Failed block data write... [ 6567.124811] WILC_SPI spi0.2: Reset and retry 8 0 8192 [ 6567.160072] WILC_SPI spi0.2: Failed data response read, c3 0 0 [ 6567.160094] WILC_SPI spi0.2: Failed block data write... [ 6567.161527] WILC_SPI spi0.2: Reset and retry 7 0 8192 [ 6567.197192] WILC_SPI spi0.2: Failed data response read, c3 0 0 [ 6567.197213] WILC_SPI spi0.2: Failed block data write... [ 6567.199003] WILC_SPI spi0.2: Reset and retry 6 0 8192 [ 6567.234751] WILC_SPI spi0.2: Failed data response read, c3 0 0 [ 6567.234773] WILC_SPI spi0.2: Failed block data write... [ 6567.236176] WILC_SPI spi0.2: Reset and retry 5 0 8192 [ 6567.270106] WILC_SPI spi0.2: Failed data response read, c3 0 0 [ 6567.270125] WILC_SPI spi0.2: Failed block data write... [ 6567.271437] WILC_SPI spi0.2: Reset and retry 4 0 8192 [ 6567.306098] WILC_SPI spi0.2: Failed data response read, c3 0 0 [ 6567.306119] WILC_SPI spi0.2: Failed block data write... [ 6567.307423] WILC_SPI spi0.2: Reset and retry 3 0 8192 [ 6567.344211] WILC_SPI spi0.2: Failed data response read, c3 0 0 [ 6567.344233] WILC_SPI spi0.2: Failed block data write... [ 6567.345525] WILC_SPI spi0.2: Reset and retry 2 0 8192 [ 6567.379555] WILC_SPI spi0.2: Failed data response read, c3 0 0 [ 6567.379574] WILC_SPI spi0.2: Failed block data write... [ 6567.380876] WILC_SPI spi0.2: Reset and retry 1 0 8192 [ 6567.382035] WILC_SPI spi0.2 wlan0: ERR [wilc_wlan_handle_txq:1249] fail block tx ext... [ 6567.383307] wilc_wlan_handle_rx_buff: Data corrupted 0, 0 [ 6567.384376] wilc_wlan_handle_rx_buff: Data corrupted 0, 0
All checks we do to see if our AP is still up do NOT fail but the wifi signal is gone:
sub check_ifconfig_wlan {
my $db_handler = shift;
my $response1 = `ifconfig | grep mon.wlan0`;
my $response2 = `ifconfig wlan0 | grep addr`;
my $response3 = `service hostapd status`;
my $response4 = `netstat -tulpen | grep dnsmasq | wc -l`;
my $response5 = `netstat -tulpen | grep busybox | wc -l`;
my $response6 = `dmesg | grep 'WILC_SPI spi0.2: Failed cmd, read reg' | wc -l`;
my $response7 = `cat /sys/class/net/wlan0/operstate`;
chomp($response7);
print "r7: ".$response7;
if ((!$response1)||(!$response2)||(!$response3)||($response4<=0)||($response5<=0)||($response6>=1)||!($response7 eq "up") || ( -e "/tablet_connection_lost" )) {
Hi @Mateusz-Gwara
This looks different from the previous log where it was failing to read register 1
[ 6588.257161] WILC_SPI spi0.2: Failed cmd, read reg (00000001).
Was this error happening when you stream youtube video as well? or was it another test case ( or another step of same test), and it doesn't happen after you apply the patch?
@AdhamAbozaeid I've tested this while running an youtube stream from an client connected to a WILC3000 AP running the latest dev branch driver&firmware with your patch posted here applied.
All you have to do is to start a hostapd running the latest patched driver/fw combo and have a client stream for a couple of hours
Like I said, the main problem is that no system process seems to be affected and keeps on running, but clients lose their connection and the wifi ssid disappears.
Hi,
I was hoping, that the 15.3 release might solve some of the known issues, as they are mentioned as fixed in the changelog.
I am still experiencing the same throughput issues as before and the same instability after having downloaded something over a longer period of time.
I'm running:
root@casenio:~# ls -al /lib/firmware/mchp/
total 404
drwxr-xr-x 2 root root 4096 Dec 9 13:11 .
drwxr-xr-x 5 root root 4096 Sep 16 12:34 ..
-rwxr-xr-x 1 root root 2356 Dec 2 12:15 LICENSE.wilc_fw
-rwxr-xr-x 1 root root 1822 Dec 2 12:15 README.md
-rwxr-xr-x 1 root root 135324 Dec 2 12:15 wilc1000_wifi_firmware.bin
-rwxr-xr-x 1 root root 58700 Dec 2 12:15 wilc3000_ble_firmware.bin
-rwxr-xr-x 1 root root 57232 Dec 2 12:15 wilc3000_ble_firmware_no_rtc.bin
-rwxr-xr-x 1 root root 136336 Dec 2 12:15 wilc3000_wifi_firmware.bin
root@89490200001095385996:~# modinfo wilc-spi
filename: /lib/modules/4.19.38casenio-g4dae378bbe/kernel/drivers/staging/wilc/wilc-spi.ko
license: GPL
version: 15.3
license: GPL
srcversion: B03D1C84BC285004DA5844A
alias: of:N*T*Cmicrochip,wilc3000C*
alias: of:N*T*Cmicrochip,wilc3000
alias: of:N*T*Cmicrochip,wilc1000C*
alias: of:N*T*Cmicrochip,wilc1000
depends: cfg80211
staging: Y
intree: Y
name: wilc_spi
vermagic: 4.19.38casenio-g4dae378bbe mod_unload modversions ARMv6 p2v8
My salesforce ticket still didn't receive any response since over a month now, that's why I'm asking here if there is any effort being made to make further improvements to this hardware(-driver)?
Besides that I also observed the following error:
[76324.698266] TCP: wlan0: Driver has suspect GRO implementation, TCP performance may be compromised.
What i found here https://bugzilla.kernel.org/show_bug.cgi?id=194569 and here: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=dcb17d22e1c2cd72e72190c736349a675362b3bc
sounds like it might be an effect in the 4.19 kernel?
@Mateusz-Gwara , Please explain more on what was mentioned as fixed in the change log, but wasn't actually fixed.
@Mateusz-Gwara I think you should create another GitHub issue for your instability problems.
Microchip have indicated they won't fix their flawed method of determining transmit power without a business case as they are "mature" products even though until relatively recently there have been significant fundamental issues in both driver and firmware. They claim they ship millions of units per year - hard to believe with all these issues.
For anyone following, Microchip have refused to fix this issue saying because they are focusing on new development and it's in the "backlog" whatever that means. i.e. I'd consider this product semi-abandoned and I am not confident it will ever be fixed.
@ShonkyCH & @tsifb Thanks again for keeping everybody up to date here! Unfortunately I've experienced the same thing with the Salesforce support either not responding at all or just turning the whole thing around in circles without investigating themselves.
The support staff was either unable to help us as they stated that they do not have a setup available to reproduce our error but in another Ticket they responded with this:
Created by : Rakila_parveen Mohamed_ismail (1/22/2020, 13:37) Hi XXXX,
We are sorry for the inconvenience cause due to delay in response as we had caught up with our issues. ATWILC3000 is not an EOL, we are supporting it and if there ia any bug we will report it to the development team. We are sending continuous ping command to ATWILC3000 using below command.
ping 192.168.43.233 -t -l 1200
We are able to see below error after some time and ATWILC3000 gets disconnected from AP. Is this the same issue seen in your side? Does the above payload length sufficient to reproduce the issue?
wilc_sdio mmc0:0001:1: clocks still OFF. Retrying
wilc_sdio mmc0:0001:1 wlan0: INFO [debug_thread]*** Debug Thread Running ***
wilc_sdio mmc0:0001:1: clocks still OFF. Retrying
wilc_sdio mmc0:0001:1 wlan0: INFO [debug_thread]*** Debug Thread Running ***
wilc_sdio mmc0:0001:1: clocks still OFF. Retrying
wilc_sdio mmc0:0001:1 wlan0: INFO [debug_thread]*** Debug Thread Running ***
wilc_sdio mmc0:0001:1: clocks still OFF. Retrying
wilc_sdio mmc0:0001:1 wlan0: INFO [debug_thread]*** Debug Thread Running ***
wilc_sdio mmc0:0001:1: clocks still OFF. Retrying
wilc_sdio mmc0:0001:1: clocks still OFF. Retrying
wilc_sdio mmc0:0001:1 wlan0: INFO [debug_thread]*** Debug Thread Running ***
wilc_sdio mmc0:0001:1: clocks still OFF. Retrying
wilc_sdio mmc0:0001:1 wlan0: INFO [debug_thread]*** Debug Thread Running ***
wilc_sdio mmc0:0001:1: clocks still OFF. Retrying
wilc_sdio mmc0:0001:1 wlan0: INFO [debug_thread]*** Debug Thread Running ***
wilc_sdio mmc0:0001:1: clocks still OFF. Retrying
wilc_sdio mmc0:0001:1 wlan0: INFO [debug_thread]*** Debug Thread Running ***
wilc_sdio mmc0:0001:1: clocks still OFF. Retrying
wilc_sdio mmc0:0001:1 wlan0: ERR [wilc_send_config_pkt:1886] Get Timed out
wilc_sdio mmc0:0001:1 wlan0: ERR [handle_get_statistics:2231] Failed to send scan parameters
wilc_sdio mmc0:0001:1 wlan0: INFO [debug_thread]*** Debug Thread Running ***
wilc_sdio mmc0:0001:1: clocks still OFF. Retrying
Unable to handle kernel NULL pointer dereference at virtual address 00000004
pgd = c0004000
[00000004] *pgd=00000000
Internal error: Oops: 17 [#1] ARM
Modules linked in: wilc_sdio(C)
CPU: 0 PID: 768 Comm: K_TXQ_TASK Tainted: G C 3.18.0-linux4sam_5.0-alpha7 #2
task: de51fb80 ti: ddd9c000 task.ti: ddd9c000
PC is at wilc_wlan_txq_remove_from_head+0x30/0x84 [wilc_sdio]
LR is at wilc_wlan_handle_txq+0xaf0/0xcd4 [wilc_sdio]
pc : [
] lr : [ ] psr: 000b0093 sp : ddd9dd64 ip : 800b0013 fp : bf0201f0
r10: ddd9ddf7 r9 : ddd9ddd9 r8 : ddd59044
r7 : 00000060 r6 : 00000060 r5 : ddd00000 r4 : 000000a9
r3 : ddd5ad0c r2 : ddd59000 r1 : ddd5b380 r0 : 00000000
Flags: nzcv IRQs off FIQs on Mode SVC_32 ISA ARM Segment kernel
Control: 10c53c7d Table: 3dcb0059 DAC: 00000015
Process K_TXQ_TASK (pid: 768, stack limit = 0xddd9c230)
Stack: (0xddd9dd64 to 0xddd9e000)
dd60: 000000a9 ddd00000 bf0132f4 00000100 c00398fc ddd9de18 ddd1c000
dd80: bf018634 ddd59724 00000003 bf01e020 bf01e04c bf01e0d0 ddd59000 ddd59674
dda0: ddd5902c ddd9df44 ddd00000 bf0204b8 00000100 80090060 00000002 57005757
ddc0: 01010101 00000001 00000000 00000000 00000000 00000000 c069a900 c06c5700
dde0: 0000000a c04bb9f8 c05aadac fffff904 c069add0 00208040 de51fb80 c06a0878
de00: 00000000 00000000 00000001 de402200 c06a0878 00000000 00000418 00000000
de20: de402200 c0040868 c0034520 600f0013 ffffffff ddd9de74 c04bbd70 c0011740
de40: ddc474c0 00000002 00000001 c069a498 c069a498 c069a498 ddd9c000 c04bbd70
de60: ffffffff c068c000 00000002 c0031830 c068c01c de51fb80 00000001 c069a498
de80: c069a498 c069a498 ddd9c000 c04bbd70 ffffffff c068c000 00000002 c0031830
dea0: c068c01c de51fb80 c069a498 537e0b22 c04bbd70 c0031900 c0696bf0 c069a498
dec0: ddd9defc c04b6fd4 ddd59074 c05ac020 c06c4b73 00000000 ddd9c000 00000f6d
dee0: c069add0 c069adc8 a00b0013 ddd9df04 c06d72c0 ddd59068 ddd59074 c04b9094
df00: 00000000 00000000 00200200 00000f6d c06d72c0 c0047ed8 de51fb80 ddd59000
df20: fffffffe ddd1c000 ddd9c000 bf0201f0 bf0177fc ddd59068 ddd59074 bf005fb0
df40: c06c4b73 00000001 00000000 00000000 de556f80 ddd1c000 bf005ef8 00000000
df60: 00000000 00000000 00000000 c0030ee4 ddcea080 00000000 ddcea0b4 ddd1c000
df80: 00000000 ddd9df84 ddd9df84 00000000 ddd9df90 ddd9df90 ddd9dfac de556f80
dfa0: c0030e1c 00000000 00000000 c000e2d8 00000000 00000000 00000000 00000000
dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
dfe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
[
] (wilc_wlan_txq_remove_from_head [wilc_sdio]) from [ ] (wilc_wlan_handle_txq+0xaf0/0xcd4 [wilc_sdio]) [
] (wilc_wlan_handle_txq [wilc_sdio]) from [ ] (linux_wlan_txq_task+0xb8/0x1d4 [wilc_sdio]) [
] (linux_wlan_txq_task [wilc_sdio]) from [ ] (kthread+0xc8/0xe0) [
] (kthread) from [ ] (ret_from_fork+0x14/0x3c) Code: e5930674 e2811004 e1500001 0a000012 (e590e004)
---[ end trace 5add3734d3013de0 ]---
wilc_sdio mmc0:0001:1 wlan0: ERR [wilc_send_config_pkt:1886] Get Timed out
wilc_sdio mmc0:0001:1 wlan0: ERR [handle_get_statistics:2231] Failed to send scan parameters
wilc_sdio mmc0:0001:1 wlan0: INFO [debug_thread]*** Debug Thread Running ***
wilc_sdio mmc0:0001:1 wlan0: INFO [debug_thread]*** Debug Thread Running ***
wilc_sdio mmc0:0001:1 wlan0: INFO [debug_thread]*** Debug Thread Running ***
wilc_sdio mmc0:0001:1 wlan0: INFO [debug_thread]*** Debug Thread Running ***
Please share your kernel debug log while issue is happening.
If we are able to see the same issue in our hardware, then we might not need your setup. We can get your board once the issue is not reproduceable in our side.
Thanks,
XXXX
I can't understand that they didn't try to solve an issue they experienced themselves caused by a simple ping...
Right now we've moved to another manufacturer as the progress here seems to have stalled completely.
Our problem remains hundereds of customer installations which cannot withstand a couple of minutes of video streaming resulting in stuff like this out of nowhere:
Message from syslogd@89490200001132907935 at Sep 2 12:20:45 ...
kernel:[ 1692.416292] Internal error: Oops: 80000007 [#1] ARM
Message from syslogd@89490200001132907935 at Sep 2 12:20:45 ...
kernel:[ 1692.528910] Process WILC_DEBUG (pid: 4422, stack limit = 0x6c999b50)
Message from syslogd@89490200001132907935 at Sep 2 12:20:45 ...
kernel:[ 1692.535550] Stack: (0xdd0fff30 to 0xdd100000)
Message from syslogd@89490200001132907935 at Sep 2 12:20:45 ...
kernel:[ 1692.540099] ff20: 00000000 00000000 bf1ba108 00000000
Message from syslogd@89490200001132907935 at Sep 2 12:20:45 ...
kernel:[ 1692.548634] ff40: 00000001 d1780668 d172f5a0 d1780670 00000001 d17806c8 d17808ac d1780640
Message from syslogd@89490200001132907935 at Sep 2 12:20:45 ...
kernel:[ 1692.557168] ff60: ffffe000 c7cba100 dd0fdc40 00000000 dd0fe000 d1780640 bf1ba0a4 dd257cdc
Message from syslogd@89490200001132907935 at Sep 2 12:20:45 ...
kernel:[ 1692.565702] ff80: c7cba138 c0148e2c dd1f23c0 dd0fdc40 c0148d00 00000000 00000000 00000000
Message from syslogd@89490200001132907935 at Sep 2 12:20:45 ...
kernel:[ 1692.574234] ffa0: 00000000 00000000 00000000 c01010b4 00000000 00000000 00000000 00000000
Message from syslogd@89490200001132907935 at Sep 2 12:20:45 ...
kernel:[ 1692.582760] ffc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Message from syslogd@89490200001132907935 at Sep 2 12:20:45 ...
kernel:[ 1692.591293] ffe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
Message from syslogd@89490200001132907935 at Sep 2 12:20:45 ...
kernel:[ 1692.607105] Code: bad PC value
or this:
[27075.911131] ================================
[27075.915589] WARNING: inconsistent lock state
[27075.920044] 4.19.38casenio-g4dae378bbe #1 Tainted: G WC
[27075.926762] --------------------------------
[27075.931211] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
[27075.937479] K_TXQ_TASK/28468 [HC0[0]:SC0[0]:HE1:SE1] takes:
[27075.943293] d3054715 (&(&wl->txq_spinlock)->rlock){+.?.}, at: wilc_wlan_handle_txq+0x154/0xdc4 [wilc_spi]
[27075.953336] {IN-SOFTIRQ-W} state was registered at:
[27075.958422] _raw_spin_lock_irqsave+0x38/0x4c
[27075.963090] txq_add_net_pkt+0xf4/0x6c8 [wilc_spi]
[27075.968213] wilc_mac_xmit+0xb8/0x1e8 [wilc_spi]
[27075.973125] dev_hard_start_xmit+0xb8/0x268
[27075.977600] sch_direct_xmit+0xc8/0x294
[27075.981688] __qdisc_run+0x164/0x5c8
[27075.985509] __dev_queue_xmit+0x5a8/0x9a8
[27075.989783] neigh_update+0x3d0/0x75c
[27075.993690] arp_process+0x304/0x950
[27075.997508] arp_rcv+0x158/0x230
[27076.000964] __netif_receive_skb_one_core+0x40/0x48
[27076.006145] process_backlog+0x90/0x144
[27076.010227] net_rx_action+0xc8/0x360
[27076.014135] __do_softirq+0xd0/0x414
[27076.017953] irq_exit+0x178/0x1a4
[27076.021509] __handle_domain_irq+0x50/0xa8
[27076.025869] __irq_svc+0x70/0x98
[27076.029336] arch_cpu_idle+0x20/0x3c
[27076.033153] arch_cpu_idle+0x20/0x3c
[27076.036977] do_idle+0x110/0x1d8
[27076.040432] cpu_startup_entry+0xc/0x10
[27076.044532] start_kernel+0x378/0x408
[27076.048441] irq event stamp: 5909404
[27076.052172] hardirqs last enabled at (5909401): [<c0675d60>] _raw_spin_unlock_irq+0x24/0x2c
[27076.060984] hardirqs last disabled at (5909402): [<c0675bd8>] _raw_spin_lock_irqsave+0x1c/0x4c
[27076.070046] softirqs last enabled at (5909404): [<bf279ad4>] destroy_conntrack+0x40/0xd8 [nf_conntrack]
[27076.080007] softirqs last disabled at (5909403): [<bf279ad4>] destroy_conntrack+0x40/0xd8 [nf_conntrack]
[27076.089907]
[27076.089907] other info that might help us debug this:
[27076.096718] Possible unsafe locking scenario:
[27076.096718]
[27076.102887] CPU0
[27076.105430] ----
[27076.107983] lock(&(&wl->txq_spinlock)->rlock);
[27076.112804] <Interrupt>
[27076.115531] lock(&(&wl->txq_spinlock)->rlock);
[27076.120528]
[27076.120528] *** DEADLOCK ***
[27076.120528]
[27076.126710] 4 locks held by K_TXQ_TASK/28468:
[27076.131249] #0: 8cd22346 (&wl->txq_add_to_head_cs){+.+.}, at: wilc_wlan_handle_txq+0xc4/0xdc4 [wilc_spi]
[27076.141296] #1: cc0d6c0c (&wl->srcu){....}, at: wilc_wlan_handle_txq+0x5c/0xdc4 [wilc_spi]
[27076.150070] #2: d3054715 (&(&wl->txq_spinlock)->rlock){+.?.}, at: wilc_wlan_handle_txq+0x154/0xdc4 [wilc_spi]
[27076.160562] #3: 474fd12d (rcu_read_lock){....}, at: nf_conntrack_destroy+0x0/0xd0
[27076.168481]
[27076.168481] stack backtrace:
[27076.173027] CPU: 0 PID: 28468 Comm: K_TXQ_TASK Tainted: G WC 4.19.38casenio-g4dae378bbe #1
[27076.182922] Hardware name: Generic AM33XX (Flattened Device Tree)
[27076.189287] [<c010e438>] (unwind_backtrace) from [<c010c130>] (show_stack+0x10/0x14)
[27076.197366] [<c010c130>] (show_stack) from [<c0166048>] (mark_lock+0x5f0/0x6c4)
[27076.204994] [<c0166048>] (mark_lock) from [<c0166168>] (mark_held_locks+0x4c/0x78)
[27076.212896] [<c0166168>] (mark_held_locks) from [<c0168588>] (lockdep_hardirqs_on+0xfc/0x1e8)
[27076.221790] [<c0168588>] (lockdep_hardirqs_on) from [<c012e8ac>] (__local_bh_enable_ip+0xa4/0x16c)
[27076.231207] [<c012e8ac>] (__local_bh_enable_ip) from [<bf279afc>] (destroy_conntrack+0x68/0xd8 [nf_conntrack])
[27076.241712] [<bf279afc>] (destroy_conntrack [nf_conntrack]) from [<c05c530c>] (nf_conntrack_destroy+0x68/0xd0)
[27076.252162] [<c05c530c>] (nf_conntrack_destroy) from [<c05749e0>] (skb_release_all+0xc/0x24)
[27076.260971] [<c05749e0>] (skb_release_all) from [<c0575154>] (consume_skb+0x50/0xd4)
[27076.269100] [<c0575154>] (consume_skb) from [<bf36a018>] (wilc_tx_complete+0x30/0xa8 [wilc_spi])
[27076.278344] [<bf36a018>] (wilc_tx_complete [wilc_spi]) from [<bf374060>] (wilc_wlan_handle_txq+0x1f8/0xdc4 [wilc_spi])
[27076.289588] [<bf374060>] (wilc_wlan_handle_txq [wilc_spi]) from [<bf36a828>] (wilc_txq_task+0x90/0x268 [wilc_spi])
[27076.300416] [<bf36a828>] (wilc_txq_task [wilc_spi]) from [<c0148e2c>] (kthread+0x12c/0x144)
[27076.309137] [<c0148e2c>] (kthread) from [<c01010b4>] (ret_from_fork+0x14/0x20)
[27076.316675] Exception stack(0xdd201fb0 to 0xdd201ff8)
[27076.321944] 1fa0: 00000000 00000000 00000000 00000000
[27076.330479] 1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[27076.339016] 1fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[27222.914144] sh (30511): drop_caches: 3
@Mateusz-Gwara I have had nothing further on this particular original issue (expected because they said they would not fix this)
I have not seen anything similar to your crash, however we don't typically run WILC as an AP and stream for long periods. Up until now, the WILC is configured as a STA in our typical use case, and while we load large files (~100mb) it is infrequent. Otherwise, it is ssh and https API access.
I am currently experimenting with setting up WILC APs and attempting to work around all the WILC limitations to establish a tree topology network, although I would prefer to be able to connect ~50 WILC STA to a single WILC AP. (only ~7 seem to be supported) I will be looking for any similar crash to the one you posted here.
Is there a driver issue open for your crash? It might be worth opening one to track this separately; it is different to the original issue reported in this one.