esp32-wifi-manager icon indicating copy to clipboard operation
esp32-wifi-manager copied to clipboard

crash on connect

Open perigoso opened this issue 3 years ago • 15 comments

Prerequisites

  • [x] I have written a descriptive issue title
  • [x] I have verified that I am using the latest version of esp32-wifi-manager
  • [x] I have searched open and closed issues to ensure it has not already been reported

Description

I'm not sure if this is a bug, or just something i did wrong.

When the esp boots it goes to the wi-fi manager just fine, but when i try to add a connection it seems to crash, the culprit for the error seems to be esp_wifi_scan_start(&scan_config, false) at line 1020 (in master branch, 981 on the commit the project is using, but i have tried both versions without success), but I am not sure what to make from it.

Steps to Reproduce

  • clone esp32-nixie-clock
  • compile with latest master branch of esp-idf (ESP-IDF v4.3-dev-1197-g8bc19ba89)
  • erase flash and flash esp
  • connect to AP
  • add new connection

System Configuration

I am trying to use the code from esp32-nixie-clock (I know it's a different project but the problem seems to be the wi-fi manager), with custom hardware I made here, I don't see a reason to suspect hardware.

Log:

I (12) boot: ESP-IDF v4.3-dev-1197-g8bc19ba89 2nd stage bootloader
I (12) boot: compile time 02:08:04
I (12) boot: chip revision: 1
I (16) boot_comm: chip revision: 1, min. bootloader chip revision: 0
I (23) boot.esp32: SPI Speed      : 40MHz
I (27) boot.esp32: SPI Mode       : DIO
I (32) boot.esp32: SPI Flash Size : 2MB
I (36) boot: Enabling RNG early entropy source...
I (42) boot: Partition Table:
I (45) boot: ## Label            Usage          Type ST Offset   Length
I (53) boot:  0 nvs              WiFi data        01 02 00009000 00006000
I (60) boot:  1 phy_init         RF data          01 01 0000f000 00001000
I (68) boot:  2 factory          factory app      00 00 00010000 00100000
I (75) boot: End of partition table
I (79) boot_comm: chip revision: 1, min. application chip revision: 0
I (86) esp_image: segment 0: paddr=0x00010020 vaddr=0x3f400020 size=0x33d1c (212252) map
I (176) esp_image: segment 1: paddr=0x00043d44 vaddr=0x3ffb0000 size=0x03b18 ( 15128) load
I (183) esp_image: segment 2: paddr=0x00047864 vaddr=0x40080000 size=0x00404 (  1028) load
0x40080000: _WindowOverflow4 at /opt/esp-idf-sdk/components/freertos/xtensa/xtensa_vectors.S:1730

I (183) esp_image: segment 3: paddr=0x00047c70 vaddr=0x40080404 size=0x083a8 ( 33704) load
I (207) esp_image: segment 4: paddr=0x00050020 vaddr=0x400d0020 size=0xa12b8 (660152) map
0x400d0020: _stext at ??:?

I (458) esp_image: segment 5: paddr=0x000f12e0 vaddr=0x400887ac size=0x10134 ( 65844) load
0x400887ac: prvCopyItemAllowSplit at /opt/esp-idf-sdk/components/esp_ringbuf/ringbuf.c:436

I (501) boot: Loaded app from partition at offset 0x10000
I (501) boot: Disabling RNG early entropy source...
I (512) cpu_start: cpu freq: 160
I (512) cpu_start: Pro cpu up.
I (512) cpu_start: Starting app cpu, entry point is 0x40081588
0x40081588: call_start_cpu1 at /opt/esp-idf-sdk/components/esp_system/port/cpu_start.c:112

I (0) cpu_start: App cpu up.
I (533) cpu_start: Pro cpu start user code
I (533) cpu_start: Application information:
I (533) cpu_start: Project name:     nixieclock
I (538) cpu_start: App version:      7e67029
I (543) cpu_start: Compile time:     Oct  1 2020 02:07:56
I (549) cpu_start: ELF file SHA256:  46b55632506b1e3a...
I (555) cpu_start: ESP-IDF:          v4.3-dev-1197-g8bc19ba89
I (562) heap_init: Initializing. RAM available for dynamic allocation:
I (569) heap_init: At 3FFAE6E0 len 00001920 (6 KiB): DRAM
I (575) heap_init: At 3FFB83E8 len 00027C18 (159 KiB): DRAM
I (581) heap_init: At 3FFE0440 len 00003AE0 (14 KiB): D/IRAM
I (587) heap_init: At 3FFE4350 len 0001BCB0 (111 KiB): D/IRAM
I (594) heap_init: At 400988E0 len 00007720 (29 KiB): IRAM
I (601) spi_flash: detected chip: generic
I (605) spi_flash: flash io: dio
W (609) spi_flash: Detected size(4096k) larger than the size in the binary image header(2048k). Using the size in the binary image header.
I (623) cpu_start: Starting scheduler on PRO CPU.
I (0) cpu_start: Starting scheduler on APP CPU.
I (656) wifi_manager: Set STA IP String to: 0.0.0.0
I (666) DS3231: READ: YEAR:0 MONTH:0 DAY:1 [WDAY:0] - 0:17:28
I (666) clock: The current RTC time is: Sun Jan  1 00:17:28 1900
I (666) clock: Cannot find clock nvs namespace. Attempt to create it
I (676) system_api: Base MAC address is not set
I (676) system_api: read default base MAC address from EFUSE
I (676) gpio: GPIO[4]| InputEn: 1| OutputEn: 0| OpenDrain: 0| Pullup: 0| Pulldown: 0| Intr:1 
I (696) ws2812: Received R:0 G:0 B:0
I (696) clock: CLOCK_MESSAGE_SLEEPMODE_CONFIG
I (666) clock: clock_save_config_task started
I (736) wifi_init: rx ba win: 6
I (746) wifi_init: tcpip mbox: 32
I (746) wifi_init: udp mbox: 6
I (746) wifi_init: tcp mbox: 6
I (756) wifi_init: tcp tx win: 5744
I (756) wifi_init: tcp rx win: 5744
I (766) wifi_init: tcp mss: 1440
I (766) wifi_init: WiFi IRAM OP enabled
I (766) wifi_init: WiFi RX IRAM OP enabled
W (1836) phy_init: failed to load RF calibration data (0x1102), falling back to full calibration
I (2026) phy: phy_version: 4370, 4e803b3, Aug 11 2020, 14:18:07, 0, 2
I (2046) wifi_manager: 
I (2046) http_server: Registering URI handlers
I (2046) wifi_manager: MESSAGE: ORDER_LOAD_AND_RESTORE_STA
I (2046) wifi_manager: No saved wifi found on startup. Starting access point.
I (2056) wifi_manager: MESSAGE: ORDER_START_AP
I (2066) wifi_manager: WIFI_EVENT_AP_START
I (2166) http_server: Registering URI handlers
I (2166) dns_server: DNS Server listening on 53/udp
I (39216) wifi_manager: WIFI_EVENT_AP_STACONNECTED
I (75276) wifi_manager: WIFI_EVENT_AP_STADISCONNECTED
I (76246) wifi_manager: WIFI_EVENT_AP_STACONNECTED
I (81106) esp_netif_lwip: DHCP server assigned IP to a station, IP is: 10.10.0.2
I (81216) dns_server: Replying to DNS request for connectivitycheck.gstatic.com from 10.10.0.2
I (81216) dns_server: Replying to DNS request for www.google.com from 10.10.0.2
I (81236) dns_server: Replying to DNS request for connectivitycheck.gstatic.com from 10.10.0.2
I (81256) dns_server: Replying to DNS request for www.google.com from 10.10.0.2
I (81316) http_server: GET /generate_204
I (93486) http_server: GET /
I (93636) http_server: GET /style.css
W (93646) httpd_txrx: httpd_sock_err: error in send : 104
W (93646) httpd_txrx: httpd_sock_err: error in recv : 104
I (93656) http_server: GET /code.js
I (93816) http_server: GET /ap.json
I (95156) http_server: GET /status.json
I (95856) http_server: GET /status.json
I (96756) http_server: GET /status.json
I (97736) http_server: GET /ap.json
I (97746) http_server: GET /status.json
I (98926) http_server: GET /status.json
I (100166) http_server: GET /status.json
I (100586) http_server: GET /status.json
I (101566) http_server: GET /ap.json
I (101576) http_server: GET /status.json
I (102756) dns_server: Replying to DNS request for googleads.g.doubleclick.net from 10.10.0.2
I (102766) http_server: GET /status.json
I (102766) dns_server: Replying to DNS request for www.googleadservices.com from 10.10.0.2
I (103606) http_server: GET /status.json
I (104396) http_server: GET /status.json
I (105356) http_server: GET /ap.json
I (105366) http_server: GET /status.json
I (106546) http_server: GET /status.json
I (107786) http_server: GET /status.json
I (108146) http_server: GET /status.json
I (109076) http_server: GET /ap.json
I (109106) http_server: GET /status.json
I (110116) http_server: GET /status.json
I (111516) http_server: GET /status.json
I (111956) http_server: GET /status.json
I (112086) http_server: POST /connect.json
I (112086) http_server: ssid: --------, password: -------- (It did show the correct values, removed for privacy)
I (112086) wifi_manager: MESSAGE: ORDER_CONNECT_STA
I (113176) http_server: GET /status.json
I (114456) http_server: GET /status.json
I (115006) http_server: GET /status.json
I (115966) http_server: GET /ap.json
ESP_ERROR_CHECK failed: esp_err_t 0xffffffff (ESP_FAIL) at 0x40087fc0
0x40087fc0: _esp_error_check_failed at /opt/esp-idf-sdk/components/esp_common/src/esp_err.c:41

file: "../components/esp32-wifi-manager/src/wifi_manager.c" line 981
func: wifi_manager
expression: esp_wifi_scan_start(&scan_config, false)

abort() was called at PC 0x40087fc3 on core 0
0x40087fc3: _esp_error_check_failed at /opt/esp-idf-sdk/components/esp_common/src/esp_err.c:42


Backtrace:0x4008910f:0x3ffbf2d0 0x40089869:0x3ffbf2f0 0x40091fd2:0x3ffbf310 0x40087fc3:0x3ffbf380 0x400dd257:0x3ffbf3a0 0x40089871:0x3ffbf530
0x4008910f: panic_abort at /opt/esp-idf-sdk/components/esp_system/panic.c:349

0x40089869: esp_system_abort at /opt/esp-idf-sdk/components/esp_system/system_api.c:104

0x40091fd2: abort at /opt/esp-idf-sdk/components/newlib/abort.c:46

0x40087fc3: _esp_error_check_failed at /opt/esp-idf-sdk/components/esp_common/src/esp_err.c:42

0x400dd257: wifi_manager at /home/gimbas/Development/esp32-nixie-clock/nixie_clock_code/build/../components/esp32-wifi-manager/src/wifi_manager.c:981 (discriminator 1)

0x40089871: vPortTaskWrapper at /opt/esp-idf-sdk/components/freertos/xtensa/port.c:169



ELF file SHA256: 46b55632506b1e3a

Rebooting...

perigoso avatar Oct 01 '20 01:10 perigoso

Hi,

I had the same issue. Not sure what is causing it but I managed to get round the issue by changing line 1020 to // ESP_ERROR_CHECK(esp_wifi_scan_start(&scan_config, false)); esp_err_t error = esp_wifi_scan_start(&scan_config, false); if (error!=ESP_OK) ESP_LOGI(TAG, "MESSAGE: WIFI SCAN FAILED WITH CODE %d",error);

esp_wifi_scan_start returns -1 or ESP_FAIL so the error check fails. The API does not give any reasons why esp_wifi_scan_stat should return -1.

The code ignores errors which works as a quick fix until something more meaningful comes along.

MPC

MPC-BlackBox avatar Oct 02 '20 13:10 MPC-BlackBox

@MPC-BlackBox I tried that but my esp just started crashing after with some heap error

perigoso avatar Oct 02 '20 22:10 perigoso

The problem is that ESP_FAIL isn't a value that is documented for esp_wifi_scan_start. I'm wondering if this is something going on with 4.3 since noone ever reported this kind of issues and now I have two in a row.

Also is the sourcecode for esp_wifi.h not public? I wanted to look at what esp_wifi_scan_start but I cannot find the definition anywhere!

tonyp7 avatar Oct 04 '20 02:10 tonyp7

Also @perigoso just for context esp32-nixie-clock was last compiled on a 4.2 beta master from esp-idf and I can switch networks without issues.

I'll have to update my repos and try to get this to crash on a dev board.

Everytime esp-idf is updated it seems like something breaks...

tonyp7 avatar Oct 04 '20 02:10 tonyp7

I also tried to look for the function and couldn't find anything, so it's hard to know what is really going for it to return -1.

I know you made that project with 4.2 and intended to test with esp-idf 4.2beta1 and see if it works.

perigoso avatar Oct 04 '20 02:10 perigoso

Edited to remove since unrelated issue.

perigoso avatar Oct 08 '20 22:10 perigoso

@perigoso that's a different issue that was fixed with the latest commit in the master branch ( #108 )

tonyp7 avatar Oct 09 '20 01:10 tonyp7

@tonyp7 ok, the heap crash was indeed fixed on the master branch, but I am still unable to connect even with esp-idf 4.2beta1, esp_wifi_scan_start(&scan_config, false) still fails with -1

perigoso avatar Oct 14 '20 13:10 perigoso

@tonyp7 hey, have you had any time to try and reproduce this issue? I got around it for now by having the credentials hard-coded once, further reprograms work by fetching from memory as long as i don't erase flash. If I can do anything to help I'd be happy to.

perigoso avatar Oct 26 '20 14:10 perigoso

Found the reason for crash: Scan cannot be started when simultaneously trying to connect to station (esp_wifi_scan_start returns ESP_ERR_WIFI_STATE, which ESP_ERROR_CHECK doesn't handle well). I first tried to check additional flags (WIFI_MANAGER_REQUEST_STA_CONNECT_BIT and WIFI_MANAGER_REQUEST_RESTORE_STA_BIT) but I still got occasional race conditions, so I check for ESP_ERR_WIFI_STATE separately anyway and let ESP_ERROR_CHECK() handle the rest.

			case WM_ORDER_START_WIFI_SCAN:
				ESP_LOGD(TAG, "MESSAGE: ORDER_START_WIFI_SCAN");

				/* if a scan is already in progress this message is simply ignored thanks to the WIFI_MANAGER_SCAN_BIT uxBit */
				uxBits = xEventGroupGetBits(wifi_manager_event_group);
				if (! (uxBits & WIFI_MANAGER_SCAN_BIT) ){
					if (! (uxBits & ( WIFI_MANAGER_REQUEST_STA_CONNECT_BIT | WIFI_MANAGER_REQUEST_RESTORE_STA_BIT) ) ) {
						esp_err_t res = esp_wifi_scan_start(&scan_config, false);
						if (res==ESP_ERR_WIFI_STATE) {
							// This might happen when connect retry is attempted while AP scan starts.. Just ignore it
							ESP_LOGE(TAG,"Wifi still in connect mode whan starting scan!");
						} else {
							// handle other errors
							ESP_ERROR_CHECK(res);
							xEventGroupSetBits(wifi_manager_event_group, WIFI_MANAGER_SCAN_BIT);
						}
					} else {
						// Cannot scan while connecting to AP
					}
				}

mjuhanne avatar Oct 26 '20 20:10 mjuhanne

... Wait you mean the -1 reported by ESP_ERROR_CHECK is wrong? That's a great find!

tonyp7 avatar Oct 27 '20 08:10 tonyp7

@mjuhanne @tonyp7 This fix doesn't address the case where ESP_FAIL (-1) is returned by esp_wifi_scan_start(). Is it safe to ignore it along with ESP_ERR_WIFI_STATE?

Edit: I tried it and changed the line to if (res==ESP_ERR_WIFI_STATE || res==ESP_FAIL) { and it seems to be working.

gagank1 avatar Feb 01 '21 07:02 gagank1

Found the reason for crash: Scan cannot be started when simultaneously trying to connect to station (esp_wifi_scan_start returns ESP_ERR_WIFI_STATE, which ESP_ERROR_CHECK doesn't handle well). I first tried to check additional flags (WIFI_MANAGER_REQUEST_STA_CONNECT_BIT and WIFI_MANAGER_REQUEST_RESTORE_STA_BIT) but I still got occasional race conditions, so I check for ESP_ERR_WIFI_STATE separately anyway and let ESP_ERROR_CHECK() handle the rest.

			case WM_ORDER_START_WIFI_SCAN:
				ESP_LOGD(TAG, "MESSAGE: ORDER_START_WIFI_SCAN");

				/* if a scan is already in progress this message is simply ignored thanks to the WIFI_MANAGER_SCAN_BIT uxBit */
				uxBits = xEventGroupGetBits(wifi_manager_event_group);
				if (! (uxBits & WIFI_MANAGER_SCAN_BIT) ){
					if (! (uxBits & ( WIFI_MANAGER_REQUEST_STA_CONNECT_BIT | WIFI_MANAGER_REQUEST_RESTORE_STA_BIT) ) ) {
						esp_err_t res = esp_wifi_scan_start(&scan_config, false);
						if (res==ESP_ERR_WIFI_STATE) {
							// This might happen when connect retry is attempted while AP scan starts.. Just ignore it
							ESP_LOGE(TAG,"Wifi still in connect mode whan starting scan!");
						} else {
							// handle other errors
							ESP_ERROR_CHECK(res);
							xEventGroupSetBits(wifi_manager_event_group, WIFI_MANAGER_SCAN_BIT);
						}
					} else {
						// Cannot scan while connecting to AP
					}
				}

I think I had solved a similar issue a many months ago and I just briefly read through this thread and it seems to be a similar issue. I solved by adjusting based on the below:

			case ORDER_START_WIFI_SCAN:
				ESP_LOGD(TAG, "MESSAGE: ORDER_START_WIFI_SCAN");

				uxBits = xEventGroupGetBits(wifi_manager_event_group);
				// If WIFI_MANAGER_REQUEST_ORDER_CONNECT_STA_BIT is SET, do NOT start scanning function as this will cause problems connecting to the external AP
				if(!(uxBits & WIFI_MANAGER_REQUEST_ORDER_CONNECT_STA_BIT))
				{
					// if a scan is already in progress this message is ignored thanks to the WIFI_MANAGER_SCAN_BIT uxBit
					if(!(uxBits & WIFI_MANAGER_SCAN_BIT))
					{
						xEventGroupSetBits(wifi_manager_event_group, WIFI_MANAGER_SCAN_BIT);
						// Including esp_wifi_disconnect as per Espressif recommendation (solves another scanning issue)
						ESP_ERROR_CHECK(esp_wifi_disconnect());
						ESP_ERROR_CHECK(esp_wifi_scan_start(&scan_config, true));
					}
				}
				break;

There are two important bits there.

First: I added the "WIFI_MANAGER_REQUEST_ORDER_CONNECT_STA_BIT", which I set under the "ORDER_CONNECT_STA" case and clear under the "EVENT_STA_DISCONNECTED" case.

Second: I added ESP_ERROR_CHECK(esp_wifi_disconnect()); as per an Espressif recommendation I found somewhere (I can't remember where, it's been a very long time since solving this).

I hope that relates to your issue and helps out - if not, my mistake!

kevinudemy avatar Aug 08 '21 19:08 kevinudemy

Tried the above changes and the wifi_manager reliability has improved alot. The only issue I get is errno 23. This persists for about 60 seconds then all is okay once errno 113 happens. I think old sockets are not being closed when the wifi disconnects. It takes time for the system to close unused sockets which is the reason for the delay.

Changes which are not shown above are: Around Line 114 wifi_manager.c

// MPC added from https://github.com/tonyp7/esp32-wifi-manager/issues/110
// set under the "WM_ORDER_CONNECT_STA" case and clear under the "EVENT_STA_DISCONNECTED" case.
const int WIFI_MANAGER_REQUEST_ORDER_CONNECT_STA_BIT = BIT9;

wifi_manager.c: case WIFI_EVENT_STA_DISCONNECTED:

// MPC WIFI_MANAGER_REQUEST_ORDER_CONNECT_STA_BIT added from https://github.com/tonyp7/esp32-wifi-manager/issues/110
xEventGroupClearBits(wifi_manager_event_group, WIFI_MANAGER_WIFI_CONNECTED_BIT | WIFI_MANAGER_SCAN_BIT | WIFI_MANAGER_REQUEST_ORDER_CONNECT_STA_BIT);

wifi_manager.c: case WM_ORDER_CONNECT_STA:

if((BaseType_t)msg.param == CONNECTION_REQUEST_USER) {
	xEventGroupSetBits(wifi_manager_event_group, WIFI_MANAGER_REQUEST_STA_CONNECT_BIT | WIFI_MANAGER_REQUEST_ORDER_CONNECT_STA_BIT);		// MPC added WIFI_MANAGER_REQUEST_ORDER_CONNECT_STA_BIT
}
else if((BaseType_t)msg.param == CONNECTION_REQUEST_RESTORE_CONNECTION) {
	xEventGroupSetBits(wifi_manager_event_group, WIFI_MANAGER_REQUEST_RESTORE_STA_BIT | WIFI_MANAGER_REQUEST_ORDER_CONNECT_STA_BIT);		// MPC added WIFI_MANAGER_REQUEST_ORDER_CONNECT_STA_BIT
}

MPC-BlackBox avatar Jan 21 '22 16:01 MPC-BlackBox

Thanks, @MPC-BlackBox. I've incorporated this into a patch at https://github.com/xobs/esp32-wifi-manager/commit/a0bc1bca0dbc0780dbb993806b1db8f35fd430a7 for those of you who are maintaining your own forks now.

xobs avatar Apr 22 '22 03:04 xobs