ESPAsync_WiFiManager

Captive Portal hanging depending on active core for AsyncTCP

Open ZongyiYang opened this issue 1 year ago • 2 comments

So I've encountered a weird issue where the AP does not redirect me to the captive portal. It seems to just hang.

I've partly narrowed down why this happens: it seems to depend on which core AsyncTCP launches its task on with xTaskCreateUniversal.

To reproduce the issue, I have created the following sample and uploaded it to an ESP32Cam module. My Espressif version is 2.0.4. I am using the latest AsyncTCP library with the recommended modified AsyncWebServer library, as indicated in the readme for ESPAsync_WiFiManager.

#include "Arduino.h"
#include <ESPAsync_WiFiManager.h>

// default IP addresses
IPAddress APStaticIP  = IPAddress(192, 168, 100, 1);
IPAddress APStaticGW  = IPAddress(192, 168, 100, 1);
IPAddress APStaticSN  = IPAddress(255, 255, 255, 0);

IPAddress stationIP   = IPAddress(192, 168, 1, 232);
IPAddress gatewayIP   = IPAddress(192, 168, 1, 1);
IPAddress netMask     = IPAddress(255, 255, 255, 0);

IPAddress dns1IP      = gatewayIP;
IPAddress dns2IP      = IPAddress(8, 8, 8, 8);
    
void initAPIPConfigStruct(WiFi_AP_IPConfig &in_WM_AP_IPconfig)
{
  in_WM_AP_IPconfig._ap_static_ip   = APStaticIP;
  in_WM_AP_IPconfig._ap_static_gw   = APStaticGW;
  in_WM_AP_IPconfig._ap_static_sn   = APStaticSN;
}

void initSTAIPConfigStruct(WiFi_STA_IPConfig &in_WM_STA_IPconfig)
{
  in_WM_STA_IPconfig._sta_static_ip   = stationIP;
  in_WM_STA_IPconfig._sta_static_gw   = gatewayIP;
  in_WM_STA_IPconfig._sta_static_sn   = netMask;
  in_WM_STA_IPconfig._sta_static_dns1 = dns1IP;
  in_WM_STA_IPconfig._sta_static_dns2 = dns2IP;
}
    
void startApPortal()
{
  // default IP values
  WiFi_AP_IPConfig  WM_AP_IPconfig;
  WiFi_STA_IPConfig WM_STA_IPconfig;
  initAPIPConfigStruct(WM_AP_IPconfig);
  initSTAIPConfigStruct(WM_STA_IPconfig);

  // construct ESPAsync_wifiManager object
  DNSServer dnsServer;
  AsyncWebServer server(80);
  ESPAsync_WiFiManager ESPAsync_wifiManager(&server, &dnsServer, "AsyncESP32-FSWebServer");
  ESPAsync_wifiManager.setAPStaticIPConfig(WM_AP_IPconfig);
  ESPAsync_wifiManager.setMinimumSignalQuality(-1);
  ESPAsync_wifiManager.setConfigPortalChannel(0);
  ESPAsync_wifiManager.setSTAStaticIPConfig(WM_STA_IPconfig);
  // AP ssid and password
  String apSsid = "ESP_" + String((uint32_t)ESP.getEfuseMac(), HEX);
  const char* apPass = "12345678";
  
  Serial.println("Starting access point on SSID: " + apSsid);
  // Start AP 
  if (!ESPAsync_wifiManager.startConfigPortal(apSsid.c_str(), apPass))
  {
    Serial.println(F("Not connected to WiFi but continuing anyway."));
  }
  else
  {
    Serial.println(F("AP connection successful."));
    Serial.println("  Connected with ip: " + WiFi.localIP().toString());
  }
}

void setup() {
  Serial.begin(115200);
  while (!Serial) {}

  Serial.println("--start--");
  WiFi.mode(WIFI_STA);
  WiFi.begin("", "");
  vTaskDelay(4000 / portTICK_PERIOD_MS);
  WiFi.disconnect(); 
  vTaskDelay(4000 / portTICK_PERIOD_MS);

  // Taking this mutex is completely valid, but it triggers some bug in the captive portal.
  // The bug only shows up if TCP messages are sent on core 1,
  // i.e. with #define CONFIG_ASYNC_TCP_RUNNING_CORE 1 placed at line 34 of AsyncTCP.h.
  // The issue goes away if the running core is defined as 0.
  // Without modifications to AsyncTCP.h the bug is random, since the default
  // setting (-1) lets the AsyncTCP task run on any available core.
  SemaphoreHandle_t sem = xSemaphoreCreateMutex();
  if (sem != NULL)
    xSemaphoreTake(sem, portMAX_DELAY);
  
  startApPortal();

  if (sem != NULL)
    vSemaphoreDelete(sem);
}

void loop() {
}

Now in AsyncTCP.h, I have added the following code in line 34:

// START ADDITIONAL CODE------
#define CONFIG_ASYNC_TCP_RUNNING_CORE 1
// END ADDITIONAL CODE------

#ifndef CONFIG_ASYNC_TCP_RUNNING_CORE
#define CONFIG_ASYNC_TCP_RUNNING_CORE -1 //any available core
#define CONFIG_ASYNC_TCP_USE_WDT 1 //if enabled, adds between 33us and 200us per event
#endif

The expected behavior is that, on running, an AP is created that directs a user to a captive portal on joining the AP. Note that there is some semaphore code around the startApPortal() call. It technically does nothing, but it seems to be important in triggering the bug. Perhaps the mutex is interfering with task priorities? I don't think it should. It could also just be timing related.

To force the bug, set CONFIG_ASYNC_TCP_RUNNING_CORE in AsyncTCP.h to 1: the captive portal hangs. If CONFIG_ASYNC_TCP_RUNNING_CORE is set to 0, the bug does not happen and the program functions as expected. With the default AsyncTCP.h, however, the core is not fixed (the default of -1 means any available core), so the core that handles TCP calls varies from run to run. If the task lands on the wrong core due to load, the captive portal might hang.
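The three cases described above (core forced to 1, core forced to 0, and the default of -1) can be pictured with a small host-side model. This is only a sketch under stated assumptions: `Affinity`, `affinityFor`, and `mayHitReportedHang` are hypothetical names that do not exist in AsyncTCP, and the mapping assumes that the task is pinned only when the core value is 0 or 1 and is otherwise left unpinned, so the scheduler may run it on either core.

```cpp
#include <cassert>

// Hypothetical model, not AsyncTCP code: how a CONFIG_ASYNC_TCP_RUNNING_CORE
// value is assumed to translate into task placement on a dual-core ESP32.
enum class Affinity { Core0, Core1, AnyCore };

Affinity affinityFor(int runningCore) {
    if (runningCore == 0) return Affinity::Core0;  // pinned: portal works in the report
    if (runningCore == 1) return Affinity::Core1;  // pinned: portal hangs in the report
    return Affinity::AnyCore;                      // -1 (default): assumed not pinned
}

// True when the configuration can place the TCP task on core 1,
// the placement this report associates with the hang.
bool mayHitReportedHang(int runningCore) {
    return affinityFor(runningCore) != Affinity::Core0;
}
```

Under this model, only a value of 0 rules out the placement that triggers the hang, which would be consistent with the unmodified header failing intermittently rather than every time.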

More generally, this bug causes problems when attempting to use this library from a background loop running on one of the cores.

ZongyiYang, Aug 09 '22 07:08

Also not sure if this is needed, but perhaps the core settings for Arduino and Events are also relevant: [screenshot of the core settings, image not preserved]

ZongyiYang, Aug 09 '22 08:08

As for a "fix": I have a change that makes the bug go away, but since I do not know the root cause of the original issue, I can't really say it is a 100% fix. Perhaps it just masks the root cause through changes in task priority or timing.

Anyway, swapping the DNSServer for this implementation of AsyncDNSServer, so that both the web server and the DNS server are async, seems to resolve the bug. This holds both for the example above and for my more complicated application, where WifiManager runs in a background thread and restores the AP if the WiFi goes down.

An example of the changes needed in WifiManager: https://github.com/ZongyiYang/ESPAsync_WiFiManager/commit/e4a2de14eb791a9a5a1be1790ed47be6d52124cf

Note that this commit only changes the src; the examples would need to be updated too.

ZongyiYang, Aug 09 '22 08:08

Hi @ZongyiYang

Thanks for using the library, for pointing out the issue, and for proposing a good fix.

I believe the best way is to use ESPAsyncDNSServer for this Async library. My mistake was that I forgot to switch from DNSServer to the ESPAsyncDNSServer library at the beginning.

If possible, could you please help by modifying the library and all the related examples to use ESPAsyncDNSServer instead of DNSServer, and then make a Pull Request? I'll recheck and merge it then.

If that's not possible, it's OK; just please let me know so that I can take care of the issue.

Regards,

khoih-prog, Aug 11 '22 02:08

Hello. Sorry for the late reply, I will try to make a pull request tonight for this.

ZongyiYang, Aug 17 '22 15:08

A pull request has been made.

ZongyiYang, Aug 18 '22 05:08