ESPAsyncWebServer
ESPAsyncWebServer copied to clipboard
High load websockets causing crashes + SOLUTION
I'm writing a server that handles websocket connections using ESPAsyncWebserver and ArduinoJSON. As part of that, I was testing to see how many messages it could handle and I was just generally having things crash a lot. Often, I would get a message like this:
++++ Error when running speed test and sending ping messages. Looks like async_tcp just got too busy.
E (19871) task_wdt: Task watchdog got triggered. The following tasks did not reset the watchdog in time:
E (19871) task_wdt: - async_tcp (CPU 0/1)
E (19871) task_wdt: Tasks currently running:
E (19871) task_wdt: CPU 0: IDLE
E (19871) task_wdt: CPU 1: IDLE
E (19871) task_wdt: Aborting.
abort() was called at PC 0x40106b70 on core 0
After digging around online, I found this good suggestion on Stackoverflow.
Based on that, I implemented something in my code sort of like below. After that, the crashes are pretty much gone unless I try to push it past 150 messages per second. Its more than happy to chug along at 100 messages per second.
When you do a lot of work in the callback, you run the risk of a watchdog timer reset. If you change the callback to basically just take the message and mark it for processing, then that callback becomes very fast. You then process the message and respond in your main loop which can take as long as you like.
I haven't seen anything in the documentation, so I thought I'd share this with you all as it might save you some headache.
As a side note, running the ESPAsyncWebserver + websockets + ArduinoJSON + serving static html+js+css through SPIFFS is such a killer combo. You can build really nice, responsive websites. I absolutely cannot believe that this thing is running on a little microcontroller. So good. If you want to see how I'm doing it, check out Yarrboard.
CircularBuffer<WebsocketRequest*, YB_RECEIVE_BUFFER_COUNT> wsRequests;
void loop()
{
//sometimes websocket clients die badly.
ws.cleanupClients();
//process our websockets outside the callback.
if (!wsRequests.isEmpty())
handleWebsocketMessageLoop(wsRequests.shift());
}
void handleWebSocketMessage(void *arg, uint8_t *data, size_t len, AsyncWebSocketClient *client)
{
AwsFrameInfo *info = (AwsFrameInfo *)arg;
if (info->final && info->index == 0 && info->len == len && info->opcode == WS_TEXT)
{
if (!wsRequests.isFull())
{
WebsocketRequest* wr = new WebsocketRequest;
wr->client_id = client->id();
strlcpy(wr->buffer, (char*)data, YB_RECEIVE_BUFFER_LENGTH);
wsRequests.push(wr);
}
}
}
void handleWebsocketMessageLoop(WebsocketRequest* request)
{
char jsonBuffer[MAX_JSON_LENGTH];
StaticJsonDocument<2048> output;
StaticJsonDocument<1024> input;
DeserializationError err = deserializeJson(input, request->buffer);
String mycmd = input["cmd"] | "???";
if (err)
generateErrorJSON(output, error);
else
handleReceivedJSON(input, output, YBP_MODE_WEBSOCKET, request->client_id);
if (output.size())
{
serializeJson(output, jsonBuffer);
if (ws.availableForWrite(request->client_id))
ws.text(request->client_id, jsonBuffer);
else
Serial.println("[socket] client full");
}
delete request;
}
That's an interesting solution, I like it, thanks for posting! I agree with your comment about how a really nice app is possible with the combination of libraries you mention.
My web socket app does not have a high message rate, just some knobs and buttons on a browser for human interaction. At first I had similar watchdog timer crashes and learned (after days of debug) that doing almost anything in the message handler causes the watchdog to bite.
My solution is maybe not so fast as yours but maybe more flexible. The message handler triggers a timer Ticker (with 1millisec delay) that calls a function and passes a JSON argument. The function asynchronously processes the arg, then calls a client notifier to socket.textAll to update all clients. It requires some care to prevent successive timer calls from short-circuiting previous calls but it's not possible to hang the main loop.
It has not rebooted itself in nearly a year of adding code. Here is a sample message handler, where the incoming message has an action and optional value
Ticker socketActionTicker;
void handleSocketMsg(void *arg, uint8_t *data, size_t len)
{
AwsFrameInfo *info = (AwsFrameInfo *)arg;
if (info->final && info->index == 0 && info->len == len && info->opcode == WS_TEXT)
{
const uint8_t size = JSON_OBJECT_SIZE(8);
StaticJsonDocument<size> json;
DeserializationError err = deserializeJson(json, data);
if (err)
{
#if DEBUG_OUTPUT
Serial.printf("\ndeserializeJson() failed with code %s", err.c_str());
void * dataStr[256];
memset(dataStr, 0, 256);
memcpy(dataStr, data, 255);
Serial.printf("\nSocket message from client:\n%s\n", dataStr);
serializeJsonPretty(json, Serial);
#endif
return;
}
#if DEBUG_OUTPUT
Serial.println("\nSocket message from client:\n");
serializeJsonPretty(json, Serial);
#endif
const char *action = json["action"];
if (strcmp(action, "setInput") == 0)
{
uint_fast8_t newIn = json["value"];
if (newIn != input)
socketActionTicker.once_ms(1, setInput, newIn);
}
else if (strcmp(action, "setPwrDn") == 0)
{
socketActionTicker.once_ms(1, powerDn);
}
else if (strcmp(action, "setPwrUp") == 0)
{
socketActionTicker.once_ms(1, powerUp);
}
else if (strcmp(action, "getInput") == 0)
{
socketActionTicker.once_ms(1, sendInput);
}
else if (strcmp(action, "getPwrState") == 0)
{
uint_fast8_t powerState = digitalRead(POWR); // state is read from hardware
socketActionTicker.once_ms(1, sendPwrSts);
}
}
}
// and a client notification
void notifyClients(JsonDocument& jsonDoc)
{
#if MONITOR_STATUS
Serial.printf("\nENTERING notifyClients(jsonDoc)");
#endif
#if DEBUG_OUTPUT
Serial.printf("\n");
serializeJsonPretty(jsonDoc, Serial);
#endif
char buffer[512];
size_t len = serializeJson(jsonDoc, buffer);
socket.textAll(buffer, len);
}
So, quick update here.
At high message rates, it was still occasionally crashing (like every 12h or so, at 100 messages/sec)
I changed to this fork (https://github.com/nthd/ESPAsyncWebServer) that has some stability fixes applied and that helped quite a bit. It still crashes occasionally under higher load, but not nearly as much. Under light load I haven't seen any crashes in a while.
see #1363 to see the code that causes mentioned crash
looks like the fork also suffers from this bug
after closer look, the fork will crash but after much higer load as you said...it might be related to active number of connections. maybe in between calls ws.cleanupClients which I do only once per second. unfortunately, platformio esp exception decoder does not seem to be able to decode the crash even with debug build
I believe this will be fixed after applying this set of patches https://github.com/Aircoookie/WLED/issues/3382#issuecomment-1751363805, which I plan to do soon on top of nthd fork
I've done some more digging, which might be helpful to someone. After getting the exception decoder working, abort during stress tests was always coming down to lack of memory so allocation crash.
There doesn't seem to be any leak when connection is open and websockets data is streamed, but there is a per-connection leak after connection is closed. I am not able to trace it, and I haven't found any information on it in in any of the forks.
So a simple while true; do curl esp32/anyurl; sleep 0.2; done
will always result in an out-of-memory crash
I will be looking for more luck with Espressif native websockets server....
@ninja- funny you say that.... I've been working on a new project that is a wrapper in the style of ESPAsyncWebServer but uses the native ESP-IDF server calls under the hood. literally just wrote most of it in the last week but its functional and almost feature complete.
Here's the link: https://github.com/hoeken/PsychicHTTP/
@hoeken that's some really nice code - looks like we've been duplicating effort of porting the websockets demo code to c++, a quite slow and painful process :)
@ninja- thanks man. yeah the esp-idf library is great, but its not exactly user friendly. it does make a fantastic base for a proper HTTP server though.
I don't know where you are in your process, but maybe we can collaborate. I just have a few things I'd like to sort out (uploads) before I make a v1.0.0 release of my library. My personal project Yarrboard has now been running on it for a couple days flawlessly, and I also did a load test on the example code which went well . There are definitely some performance gains to be made, but its definitely acceptable for now, and more importantly rock solid.
I've got caught up in a couple issues today:
- Arduino library shipped suffers from bug where c_str() can return NULLPTR under memory pressure. Took me a while to debug weird crashes and at least add a clear
assert
. That was supposedly fixed in one of the forks, where it at least falls back to empty string or ignores theconcat
operation. - Fighting general WiFi/BT instability and crashes after ~1-2h of light load. Not sure if there's a memleak somewhere or just general instability.
- Trying to update espressif from 5.1.1 to 5.1.2, which is only possible with fork https://github.com/tasmota/platform-espressif32/tree/Arduino/IDF5. Took a ton of trying to get it working, and fighting with sdkconfig generator.
- Turns out enabling both espressif and arduino frameworks silently downgrades espressif to some crappy old version that's not even 5.1.1, but something ancient. I could have taken the route of removing all Arduino code entirely, but rewrite of BLE code is not worth the time right now. That should be worked around by using this fork
- huge_app.csv should be replaced with partitions_singleapp_large.csv for new espressif
- fork author removed support for BLE for time being, seems like copy-pasting the latest Arduino BLE library works fine though
- Espressif upgrade results in newer toolchain which detected a couple of issues like improper logger format strings in Psychic :) passing String instead of str.c_str() => crash
Sooo only now I got it to compile and upload with espressif 1.5.2, so I am looking forward to testing the stability with WiFI, BLE and WebSockets enabled...
@ninja- have you seen the new v3.0 beta of the official esp-idf/arduino framework? looks like its using latest v5.1.x branch of esp-idf. https://blog.espressif.com/announcing-the-arduino-esp32-core-version-3-0-0-3bf9f24e20d4
What platform are you using? Currently I'm only targeting Arduino, but it would be cool to have pure esp-idf support as well. This is my setup in platformio:
platform = espressif32 board = esp32dev framework = arduino
If you want, lets move this thread to the psychichttp github issues tracker
I do yes, but it's not compatible with platformio without using a fork.
platform = https://github.com/tasmota/platform-espressif32.git#Arduino/IDF5
framework =
arduino
espidf
If anyone is following this thread, PsychicHttp v1.0 has been released. Feel free to give it a try and let me know your feedback. https://github.com/hoeken/PsychicHTTP/