Soft WDT reset on MQTT connect
My platform is NodeMCU 1.0 (ESP-12E)
I have read similar posts here and there that blame FastLED or other libraries, but I am not using those libraries.
Almost 60% of the time the connection attempt causes a Soft WDT reset: the attempt blocks for too long and apparently I have no control over it. Setting the socket timeout to 1 second doesn't help either.
There is of course a slight chance that something else is causing the issue, but I doubt it, since the watchdog reset always appears when MQTT tries to connect. Please let me know how I can find out whether something else is blocking.
I also tried adding the line below, in vain:
mqtt_client.setSocketTimeout(1);
The relevant part of the code:
#define MQTT_SOCKET_TIMEOUT 1
// ...
bool mqtt_connected = false;
while (!mqtt_client.connected() && retries < 10)
{
time_t now = time(NULL);
SerialPrint("-> MQTT connecting ... ");
// PubSubClient::connect(const char *id, const char *user, const char *pass, const char* willTopic, uint8_t willQos, boolean willRetain, const char* willMessage, boolean cleanSession
if (!CleanSession) mqtt_connected = mqtt_client.connect(mqtt_client_id, mqtt_username, sas_token, 0, 0, 0, 0, 0);
else mqtt_connected = mqtt_client.connect(mqtt_client_id, mqtt_username, sas_token);
if (mqtt_connected)
{
char connMsg[60];
snprintf(connMsg, sizeof(connMsg), "connected in %lu ms", (unsigned long)(millis() - connectStart)); // millis() is unsigned long
SerialPrintln(connMsg);
}
else
{
int c = mqtt_client.state();
char errMsg[64] = ""; // start empty so an unmatched state code never prints garbage
if (c == MQTT_CONNECTION_TIMEOUT) snprintf(errMsg, sizeof(errMsg), " connection timeout, status code = %d", c);
if (c == MQTT_CONNECTION_LOST) snprintf(errMsg, sizeof(errMsg), " connection lost, status code = %d", c);
if (c == MQTT_CONNECT_FAILED) snprintf(errMsg, sizeof(errMsg), " connection failed, status code = %d", c);
if (c == MQTT_DISCONNECTED) snprintf(errMsg, sizeof(errMsg), " disconnected, status code = %d", c);
if (c == MQTT_CONNECT_BAD_PROTOCOL) snprintf(errMsg, sizeof(errMsg), " bad protocol, status code = %d", c);
if (c == MQTT_CONNECT_BAD_CLIENT_ID) snprintf(errMsg, sizeof(errMsg), " bad client id, status code = %d", c);
if (c == MQTT_CONNECT_UNAVAILABLE) snprintf(errMsg, sizeof(errMsg), " unavailable, status code = %d", c);
if (c == MQTT_CONNECT_BAD_CREDENTIALS) snprintf(errMsg, sizeof(errMsg), " bad credentials, status code = %d", c);
if (c == MQTT_CONNECT_UNAUTHORIZED) snprintf(errMsg, sizeof(errMsg), " unauthorized, status code = %d", c);
SerialPrint(errMsg);
char sslErr[100];
if (wifi_client.getLastSSLError(sslErr, sizeof(sslErr)) != 0) // BearSSL reports most errors as positive codes
{
char sslMsg[128];
snprintf(sslMsg, sizeof(sslMsg), " , SSL error : %s", sslErr);
SerialPrintln(sslMsg);
}
else SerialPrintln();
SerialPrintln();
SerialPrintln("-> Retrying MQTT connection in 3 seconds...");
delay(3000);
}
retries++;
}
Debug print:
-> MQTT connecting ...
--------------- CUT HERE FOR EXCEPTION DECODER ---------------
Soft WDT reset
You might try pubsubclient3. This lib here is missing a yield() call in connect().
Thanks @hmueller01 for pointing that out. I tried this library, but the issue is the same. I don't understand why connect() never returns with a socket timeout or some other error, and a WDT reset occurs instead. It might be that some underlying component is blocking. The MQTT client is instantiated on top of a BearSSL secure WiFi client (TLS is required by Azure IoT Hub), so the block could be happening there. What makes it really weird is that 3 times out of 10 the same code works and the connection is made. I have been investigating this for weeks now and I am running out of ideas...
I see, that is sad. Does your net client connect? I need to set the fingerprint, so I can't use PubSubClient's internal net connect; I connect to the broker beforehand ...
WiFiClientSecure m_net_client;
PubSubClient m_mqtt_client(m_net_client);
INFOF_P("%s: Attempting connection to server %s:%s" LF, __func__, mqtt_host, mqtt_port);
m_net_client.setFingerprint(mqtt_fingerprint);
if (m_net_client.connect(mqtt_host, atoi(mqtt_port)) == 0) {
INFOF_P("%s: Connection failed!" LF, __func__);
} else {
INFOF_P("%s: Successfully connected." LF, __func__);
}
and later ...
if (m_mqtt_client.connect(host_name, mqtt_user, mqtt_pass, topic, 1, true, "offline", true)) {
INFOF_P("%s: connected using PubSubClient" LF, __func__);
} else {
// m_mqtt_client.connect() failed
int mqtt_state = m_mqtt_client.state();
ERRORF_P("%s: mqttReconnect failed, rc=%d" LF, __func__, mqtt_state);
}
I managed to narrow it down: wifi_client.connect() is causing the WDT reset.
Although I am using Azure IoT Hub, which supports TLS Maximum Fragment Length Negotiation (MFLN), and I therefore set the client buffer size to 512 bytes to reduce memory consumption, it looks unpredictable whether the BearSSL library manages to complete the call or not.
See https://forum.arduino.cc/t/soft-wdt-reset-on-second-call-to-wificlientsecure-connect-help/694398
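For reference, the MFLN setup described above can be sketched as follows. It assumes the ESP8266 core's BearSSL `WiFiClientSecure`, which provides `probeMaxFragmentLength()`, `setBufferSizes()` and `connect()`. The function is written as a template so the logic can also be exercised off-device with a stand-in client; `connectWithMfln` and `TlsConnectResult` are hypothetical names.

```cpp
#include <stdint.h>

// Outcome of one attempt, so the caller can log which path was taken.
enum class TlsConnectResult {
    ConnectedSmallBuffers,    // server accepted MFLN, reduced buffers in use
    ConnectedDefaultBuffers,  // server refused MFLN, buffer sizes left untouched
    Failed                    // connect() reported failure
};

// Sketch: probe MFLN first, shrink the buffers only if the server agrees,
// then connect. Client is any type exposing the three calls used below;
// on the ESP8266 that would be BearSSL::WiFiClientSecure.
template <typename Client>
TlsConnectResult connectWithMfln(Client& client, const char* host,
                                 uint16_t port, uint16_t fragmentLen) {
    // The probe opens its own short-lived connection to ask the server.
    const bool mfln = client.probeMaxFragmentLength(host, port, fragmentLen);
    if (mfln) {
        // Receive and transmit buffers both shrink to the negotiated size.
        client.setBufferSizes(fragmentLen, fragmentLen);
    }
    if (!client.connect(host, port)) {
        return TlsConnectResult::Failed;
    }
    return mfln ? TlsConnectResult::ConnectedSmallBuffers
                : TlsConnectResult::ConnectedDefaultBuffers;
}
```

On the device the call would look like `connectWithMfln(wifi_client, host, 8883, 512)`, where `host` and the port are placeholders for the real broker settings. Probing before shrinking the buffers matters because a server that ignores the request may still send full-size TLS records, which a 512-byte receive buffer cannot hold.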
It looks like my code has grown to the point where the free heap size (~20k) makes BearSSL crash in an unpredictable way. I either need to simplify the code to get a larger free heap, or switch to ESP32, but the latter is a real pain in my case since I am using my purpose-built PCB with the ESP-12E chip :-(
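One way to make a low-heap situation visible, rather than letting it end in a reset, is to check the heap before each attempt and skip the connect when it looks too tight. The sketch below is only an illustration: `heapAllowsTlsConnect` is a hypothetical helper, and the `overhead` argument (what BearSSL needs beyond the two I/O buffers) is an assumption that has to be found by measurement, not a documented figure.

```cpp
#include <stdint.h>

// Hypothetical guard: returns true when the heap figures suggest a TLS
// connect can allocate its buffers.
//   freeHeap     - total free heap in bytes
//   maxFreeBlock - largest contiguous free block in bytes
//   rxBuf, txBuf - TLS receive / transmit buffer sizes in bytes
//   overhead     - ASSUMED extra bytes for TLS contexts; tune empirically
bool heapAllowsTlsConnect(uint32_t freeHeap, uint32_t maxFreeBlock,
                          uint32_t rxBuf, uint32_t txBuf, uint32_t overhead) {
    const uint32_t total = rxBuf + txBuf + overhead;
    const uint32_t largestBuf = (rxBuf > txBuf) ? rxBuf : txBuf;
    // Everything must fit in the free heap overall, and each buffer must
    // fit in one contiguous block, since fragmentation can hide free space.
    return freeHeap >= total && maxFreeBlock >= largestBuf;
}
```

On the ESP8266 the first two arguments could come from `ESP.getFreeHeap()` and `ESP.getMaxFreeBlockSize()`. Logging both values right before `wifi_client.connect()` on failing and succeeding runs would also show whether the resets really correlate with low or fragmented heap.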
So I would say this is not an issue with the PubSubClient library. The Soft WDT reset is a bit deceiving here, and it can't be fixed with carefully placed yield() calls. It appears to be memory corruption happening within BearSSL.
I will update this thread if I get any tangible results.