WiFi connect fails permanently after reboot (IDFGH-12600)
Answers checklist.
- [X] I have read the documentation ESP-IDF Programming Guide and the issue is not addressed there.
- [X] I have updated my IDF branch (master or release) to the latest version and checked that the issue is present there.
- [X] I have searched the issue tracker for a similar issue and not found a similar issue.
IDF version.
v5.2.1
Espressif SoC revision.
ESP32
Operating System used.
Linux
How did you build your project?
Command line with idf.py
If you are using Windows, please specify command line type.
None
Development Kit.
esp32-wroom-32
Power Supply used.
USB
What is the expected behavior?
WiFi connect always works, even if the device was reset with esp_restart() or esp_system_abort("") many times.
What is the actual behavior?
As reported in https://github.com/espressif/esp-idf/issues/11060 the WiFi stack can hang in a state that makes a WiFi connection impossible. I've managed to reproduce the problem using the bleprph_wifi_coex example with the following modifications:
1. Turned on BLE scanning
2. Set CONFIG_ESP_WIFI_TASK_PINNED_TO_CORE_1=y
3. Added an esp_system_abort call 15 seconds after the WiFi connection is established.
Points 1 and 2 are necessary. In the real system, point 3 happens by chance.
After a few hours, or sometimes after a night, the device is no longer able to connect to WiFi. In such cases WiFi reconfiguration always yields: sw txq[0] state(1) is not idle, potential error!.
Steps to reproduce.
Reset the device with esp_restart() or esp_system_abort("") many times, or just wait until WiFi connect hangs by chance.
Debug Logs.
W None | I (8878) wifi_prph_coex: retry to connect to the AP
W None | I (8878) wifi_prph_coex: connect to the AP fail
I 8888 | wifi new:<1,0>, old:<1,0>, ap:<255,255>, sta:<1,0>, prof:1
I 8898 | wifi state: init -> auth (b0)
I 9898 | wifi state: auth -> init (200)
I 9898 | wifi new:<1,0>, old:<1,0>, ap:<255,255>, sta:<1,0>, prof:1
W None | I (9898) wifi_prph_coex: retry to connect to the AP
W None | I (9898) wifi_prph_coex: connect to the AP fail
W 13288 | wifi m f probe req l=0
W None | I (13288) wifi_prph_coex: retry to connect to the AP
W None | I (13288) wifi_prph_coex: connect to the AP fail
I 13298 | wifi new:<1,0>, old:<1,0>, ap:<255,255>, sta:<1,0>, prof:1
I 13298 | wifi state: init -> auth (b0)
I 14298 | wifi state: auth -> init (200)
I 14298 | wifi new:<1,0>, old:<1,0>, ap:<255,255>, sta:<1,0>, prof:1
W None | I (14298) wifi_prph_coex: Retries failed. Reconfiguring wifi
E 14308 | wifi NAN WiFi stop
W 19308 | wifi TX Q not empty: 500, TXQ_BLOCK=17ff
W 19308 | wifi force witi stop
I 19308 | wifi flush txq
I 19308 | wifi stop sw txq
I 19308 | wifi lmac stop hw txq
W 19308 | wifi sw txq[0] state(1) is not idle, potential error!
I 19318 | wifi mode : sta (c8:f0:9e:4e:10:fc)
I 19318 | wifi enable tsf
W None | I (19318) wifi_prph_coex: wifi_configure finished.
W None | I (19328) wifi_prph_coex: connect to the AP fail
More Information.
Calling esp_restart() or esp_system_abort("") doesn't help. A power cycle, a reset with the button, or a hard_reset with esptool.py always helps.
Disabling CONFIG_ESP_PHY_CALIBRATION_AND_DATA_STORAGE or enabling CONFIG_ESP_PHY_RF_CAL_FULL doesn't change the behaviour. The same issue happens with Bluedroid. The issue exists on v4.2.4, v5.1.2 and v5.2.1. Example code and logs are attached. I've removed the ping feature from the code to make it clearer.
Hi @tomasznowik, thanks for your report and project!
We are able to reproduce your issue here and are looking into it. We will keep you updated ASAP.
Hi @tomasznowik, could you please double-check whether the issue exists on v5.2.1 when you call esp_restart instead of esp_system_abort?
I saw you mentioned that it did not help, but on my side the issue definitely occurs if esp_system_abort is called, while it is gone if esp_restart is called instead. I've tested for 3 days and everything seems OK with esp_restart.
Actually, these two APIs behave differently. esp_wifi_stop is called when calling esp_restart, which is essential for a safe reboot.
Hi @Espressif-liuuuu, I started a test yesterday evening and so far it looks good. We used esp_restart on v4.2.4 before switching to esp_system_abort for this very reason.
But note that due to unknown bugs or issues in user code or the framework, an abort may happen anyway from time to time. Is there any way to recover from this error state?
Hi @Espressif-liuuuu, I confirm that calling stop_wifi before esp_system_abort prevents wifi issues in the long term.
But please provide a workaround in case an abort or hard fault happens and it hangs wifi.
I found that calling ble_gap_disc to start BLE scanning when wifi is hung sometimes (or after some number of tries) makes wifi work again.
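The stop-before-abort workaround confirmed above can be sketched roughly as follows. This is only a minimal sketch: abort_with_wifi_stop is a hypothetical helper name that does not appear in the thread, and the #else branch contains host-only stand-ins so the snippet also compiles off-target. It assumes esp_wifi_stop() can be called safely from the context that triggers the abort.

```c
#include <assert.h>
#include <stdio.h>

#ifdef ESP_PLATFORM
#include "esp_err.h"
#include "esp_system.h"
#include "esp_wifi.h"
#else
/* Host-only stand-ins (hypothetical) so the sketch compiles off-target. */
typedef int esp_err_t;
#define ESP_OK 0
static int wifi_stop_calls;
static int abort_calls;
static esp_err_t esp_wifi_stop(void) { wifi_stop_calls++; return ESP_OK; }
static void esp_system_abort(const char *details)
{
    (void)details;
    assert(wifi_stop_calls > 0); /* Wi-Fi must already be stopped here */
    abort_calls++;
}
#endif

/* Hypothetical helper: stop Wi-Fi first, then abort.
 * esp_restart() triggers esp_wifi_stop() on its own, as noted in the thread;
 * esp_system_abort() does not, so the stop is done explicitly here. */
void abort_with_wifi_stop(const char *details)
{
    esp_err_t err = esp_wifi_stop();
    if (err != ESP_OK) {
        /* Best effort only: report the failure and abort anyway. */
        printf("esp_wifi_stop failed: %d\n", (int)err);
    }
    esp_system_abort(details);
}
```

This only covers aborts that the application triggers deliberately through such a helper. Aborts raised elsewhere (failed asserts, hard faults) do not pass through it, so it is not a recovery path for those cases.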
Yes, for sure. We are focusing on whether some registers were not reset in that case.
@Espressif-liuuuu Any update about this issue?
Hi, not yet. It's still under test and discussion. We will keep you updated.
Hi @tomasznowik, we finally found the root cause of the issue. Here is the result.
There are several essential conditions to trigger the issue:
- The reset must be SW_CPU_RESET. Any reset that includes a digital reset won't lead to the issue.
- BLE must be scanning before the reset and must NOT be scanning after the reset. If the BLE scan starts immediately instead of after Wi-Fi has connected, the issue is gone as well.
- Only ESP32 is affected.
The root cause is that there is a pair of digital IO operations during the coexistence switching when Wi-Fi coexists with BLE scan. When the issue happens, the first operation has been done, but the software reset occurs before the second operation (restore) is executed. After the reset, there is no further chance to execute the second operation, which leaves Wi-Fi Tx blocked.
To verify the fix, based on v5.2.1, you can replace the libs in IDF with the ones from fw_update.zip. There should be no issue after SW_CPU_RESET with these libs. You need to
- put fw_update/esp_wifi/esp32 to $IDF_PATH/components/esp_wifi/lib
- put fw_update/esp_coex/esp32 to $IDF_PATH/components/esp_coex/lib
- idf.py fullclean
- rebuild and test
Please check the initialization log and search for the firmware information below to make sure the libs were updated correctly:
- wifi firmware version: 51e0778dc
- coex firmware version: 3b39fc607
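If the boot log is captured as text, the version check can be automated. Below is a minimal sketch in plain C. The function name boot_log_has_patched_libs is hypothetical, and it assumes the two version strings appear in the log exactly as quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Version strings quoted in the thread for the patched libraries. */
#define EXPECTED_WIFI_FW "wifi firmware version: 51e0778dc"
#define EXPECTED_COEX_FW "coex firmware version: 3b39fc607"

/* Hypothetical helper: returns true only if the captured boot log
 * mentions both patched firmware versions. */
bool boot_log_has_patched_libs(const char *boot_log)
{
    assert(boot_log != NULL);
    return strstr(boot_log, EXPECTED_WIFI_FW) != NULL &&
           strstr(boot_log, EXPECTED_COEX_FW) != NULL;
}
```

A substring search is used so that the check does not depend on the log prefix or timestamp format in front of each message.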
Furthermore, we will merge this fix to all branches from master down to v5.0.
@Espressif-liuuuu The fix is not yet available in v5.0 ~ v5.2.
v5.2: 36e0c4898
v5.1: d049d69a
v5.0: a2434a844
Sorry for the missing commit information; it's described in the release notes.