AI-on-the-edge-device icon indicating copy to clipboard operation
AI-on-the-edge-device copied to clipboard

Sporadic crashes related to MQTT publishing

Open caco3 opened this issue 2 years ago • 6 comments

Issue

We still see sporadic crashes reloated to MQTT publishing. They mostly, but not always occur when MQTT init failed:

␛[0;31mE (24715) MQTT_CLIENT: Error create mqtt task␛[0m
␛[0;31mE (24715) MQTT IF: Could not start client (ret=-4)!␛[0m
␛[0;31mE (24735) MQTT IF: Init failed, skipping all MQTT publishings in this round!␛[0m
Guru Meditation Error: Core  1 panic'ed (LoadProhibited). Exception was unhandled.
␛[0;31mE (22635) MQTT_CLIENT: Error create mqtt task␛[0m
␛[0;31mE (22645) MQTT IF: Could not start client (ret=-4)!␛[0m
␛[0;31mE (22675) MQTT IF: Init failed, skipping all MQTT publishings in this round!␛[0m

assert failed: spinlock_acquire spinlock.h:123 ((result == SPINLOCK_FREE) == (lock->count == 0))
[      18] 2022-12-09T22:47:53 <ERR> [MQTT IF] Could not start client (ret=-4)!
[      18] 2022-12-09T22:47:53 <ERR> [MQTT IF] Init failed, skipping all MQTT publishings in this round!
[      18] 2022-12-09T22:47:53 <DBG> [MQTT SERVER] Publishing static MQTT topics...
[      18] 2022-12-09T22:47:53 <DBG> [MQTT SERVER] Publishing system MQTT topics...
   <= CRASH, although all publishing got skipped for this round... =>
[       4] 2022-12-09T21:48:01 <INF> [MAIN] =================================================
[       4] 2022-12-09T21:48:01 <INF> [MAIN] ==================== Startup ====================
[       4] 2022-12-09T21:48:01 <INF> [MAIN] =================================================
[       4] 2022-12-09T21:48:01 <INF> [MAIN] Branch: 'various-corrections', Revision: 5c2ccb2+, Date/Time: 2022-12-08 00:14, Web UI: Development-Branch: rolling (Commit: 2997bf0)
[       4] 2022-12-09T21:48:01 <INF> [MAIN] Reset reason: Exception/panic
[      20] 2022-12-09T22:53:12 <WRN> [MQTT IF] Could not start client (ret=-4), retrying...
[      20] 2022-12-09T22:53:12 <ERR> [MQTT IF] Could not start client (ret=-4)!
[      20] 2022-12-09T22:53:12 <ERR> [MQTT IF] Init failed, skipping all MQTT publishings in this round!
[      20] 2022-12-09T22:53:12 <DBG> [MQTT SERVER] Publishing static MQTT topics...
[      20] 2022-12-09T22:53:13 <DBG> [MQTT SERVER] Publishing system MQTT topics...
   <= CRASH, although all publishing got skipped for this round... =>
[       4] 2022-12-09T21:53:21 <INF> [MAIN] =================================================
[       4] 2022-12-09T21:53:21 <INF> [MAIN] ==================== Startup ====================
[       4] 2022-12-09T21:53:21 <INF> [MAIN] =================================================
[       4] 2022-12-09T21:53:21 <INF> [MAIN] Branch: 'various-corrections', Revision: 5c2ccb2+, Date/Time: 2022-12-08 00:14, Web UI: Development-Branch: rolling (Commit: 2997bf0)
[       4] 2022-12-09T21:53:21 <INF> [MAIN] Reset reason: Exception/panic
[     321] 2022-12-09T22:58:36 <INF> [MQTT IF] Connected to broker
[     321] 2022-12-09T22:58:36 <DBG> [MQTT IF] Published topic: testzaehler/connection, content: connected (msg_id=6425)
[     321] 2022-12-09T22:58:36 <DBG> [MQTT SERVER] Publishing static MQTT topics...
[     321] 2022-12-09T22:58:36 <DBG> [MQTT IF] Published topic: testzaehler/MAC, content: 30:C6:F7:04:A4:58 (msg_id=58149)
[     321] 2022-12-09T22:58:36 <DBG> [MQTT IF] Published topic: testzaehler/IP, content: 192.168.1.153 (msg_id=20501)
[     321] 2022-12-09T22:58:36 <DBG> [MQTT IF] Published topic: testzaehler/hostname, content: Testzaehler (msg_id=4415)
[     321] 2022-12-09T22:58:36 <DBG> [MQTT IF] Published topic: testzaehler/interval, content: 1.0 (msg_id=310)
[     321] 2022-12-09T22:58:36 <DBG> [MQTT SERVER] Publishing system MQTT topics...
[     321] 2022-12-09T22:58:36 <DBG> [MQTT IF] Published topic: testzaehler/uptime, content: 312 (msg_id=26396)
[     321] 2022-12-09T22:58:36 <DBG> [MQTT IF] Published topic: testzaehler/freeMem, content: 3126883 (msg_id=3879)
[     321] 2022-12-09T22:58:36 <DBG> [MQTT IF] Published topic: testzaehler/wifiRSSI, content: -53 (msg_id=56346)
[     321] 2022-12-09T22:58:36 <DBG> [MQTT IF] Published topic: testzaehler/CPUtemp, content: 63 (msg_id=9048)
[     321] 2022-12-09T22:58:36 <DBG> [MQTT IF] Published topic: testzaehler/connection, content: connected (msg_id=2286)
[     321] 2022-12-09T22:58:36 <DBG> [MQTT IF] Published topic: testzaehler/status, content: Take Image (msg_id=48612)
  <= Why do those topics get sent again? =>
[     322] 2022-12-09T22:58:36 <DBG> [MQTT IF] Published topic: testzaehler/freeMem, content: 3130631 (msg_id=31445)
[     322] 2022-12-09T22:58:37 <DBG> [MQTT IF] Published topic: testzaehler/wifiRSSI, content: -53 (msg_id=10988)
[     322] 2022-12-09T22:58:37 <DBG> [MQTT IF] Published topic: testzaehler/CPUtemp, content: 63 (msg_id=65331)
   <= CRASH =>
[       4] 2022-12-09T21:58:44 <INF> [MAIN] =================================================
[       4] 2022-12-09T21:58:44 <INF> [MAIN] ==================== Startup ====================
[       4] 2022-12-09T21:58:44 <INF> [MAIN] =================================================
[       4] 2022-12-09T21:58:44 <INF> [MAIN] Branch: 'various-corrections', Revision: 5c2ccb2+, Date/Time: 2022-12-08 00:14, Web UI: Development-Branch: rolling (Commit: 2997bf0)
[       4] 2022-12-09T21:58:44 <INF> [MAIN] Reset reason: Exception/panic

caco3 avatar Dec 09 '22 22:12 caco3

@Slider0007, @jomjol FYI

I am not sure yet how we can catch this :(

caco3 avatar Dec 09 '22 22:12 caco3

Today I was able to force the start client error from time to time when wifi signal quality was really bad and connection went on and off many times. I crafted a version where mqtt connection only gets intiated when wifi connection is available. Additionally I increased net timeout to 20s because of this discussion #1517. This version is running for some time Maybe you guys can give this verison a try.

firmware_net_timeout_increased+interlock WIFI.zip

Additionally question: Is it even necessary to recreate the client on every error or is it possible to just leave it running an wait until connection is estalished again (repeatly checked by system service)?

Slider0007 avatar Dec 09 '22 23:12 Slider0007

<= Why do those topics get sent again? => [     322] 2022-12-09T22:58:36 <DBG> [MQTT IF] Published topic: testzaehler/freeMem, content: 3130631 (msg_id=31445) [     322] 2022-12-09T22:58:37 <DBG> [MQTT IF] Published topic: testzaehler/wifiRSSI, content: -53 (msg_id=10988) [     322] 2022-12-09T22:58:37 <DBG> [MQTT IF] Published topic: testzaehler/CPUtemp, content: 63 (msg_id=65331)

Messages get published with QoS1 (devlier at least once) therefore every message has to be acknowleged by broker. If no acknowledge in time, message is going to be resent.

Slider0007 avatar Dec 10 '22 14:12 Slider0007

Messages get published with QoS1 (devlier at least once) therefore every message has to be acknowleged by broker. If no acknowledge in time, message is going to be resent

Ah, ok, that makes sense. How ever i have a local broker, I don't think it fails to acknowledge messages, so the issue more likely is on device side :(

caco3 avatar Dec 10 '22 20:12 caco3

Have problems with MQTT as well (latest version from today). [MQTT IF] Failed to publish topic 'wasserzaehler/status', skipping all MQTT publishings in this round! Wifi access is ok -91dBm Restarting the mosquito-server helps for one round

wettermann32 avatar Dec 10 '22 22:12 wettermann32

grafik

RetCode-4 := Username / Password Problem

In meiner Installation (aktuell 13.0.5) gibt es auch Probleme, die hier bereits gemeldet wurden. Aber mein MQTT Broker wird, unabhängig von den MQTT Log Meldungen, seit Version 12.0.1 mit Daten versorgt. Das ist im Momenrt auch so ... Siehe MQTT Explorer.

Vielleicht sucht ihr an der falschen Stelle?

grafik

TopGoalVl avatar Dec 10 '22 22:12 TopGoalVl

@wettermann32: MQTT stability has still to be improved. Working in progress... Hope next version will be more stable.

Wifi access is ok -91dBm

-91dBm is quite bad. You should make sure you have better coverage.

Slider0007 avatar Dec 11 '22 21:12 Slider0007

@TopGoalVl: Thanks for your comment, but the return value in @caco3 screenshots is missleading because, if I remember right, he modified the return values to have more verbose error feedback. Original library returns on every error -> -1

Slider0007 avatar Dec 11 '22 21:12 Slider0007

Yes, the -4 was a (just locally compiled) modification of the MQTT client. Normelly it just returns -1 on all errors.

caco3 avatar Dec 11 '22 21:12 caco3

Thanks @Slider0007!

It seems to be quite stable now, therefore I close this issue.

caco3 avatar Dec 17 '22 20:12 caco3