thin-edge.io
Cumulocity mapper clearing alarms on its own without a user triggering it
Describe the bug
During the alarm sync that it performs at startup, the Cumulocity mapper sometimes clears alarms that the user never explicitly cleared. This happens when the mapper doesn't receive the retained alarm messages from the broker on startup: it interprets their absence as the alarms having been cleared while it was down, and so it clears them in the cloud.
The alarm sync logic on mapper startup, which detects alarms raised or cleared while the mapper was down, relies on the broker delivering all currently retained alarms to the mapper when it starts. But the retained messages are sometimes not delivered when the mapper disconnects and reconnects too quickly (within the keep-alive timeout of 60s). Since the mapper uses a persistent connection to the broker, the broker treats such a quick disconnection and reconnection as a temporary network blip and doesn't deliver any retained messages on reconnect.
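The failure mode described above can be illustrated with a minimal sketch. This is not the mapper's actual implementation; the function name `alarms_to_clear` and the alarm identifiers are hypothetical, and the sync is reduced to a plain set difference to show why a missing retained-message delivery looks the same as "every alarm was cleared".

```python
def alarms_to_clear(known_before_shutdown, retained_on_startup):
    """Return alarm IDs the startup sync would treat as cleared.

    Any alarm that was known before shutdown but is absent from the retained
    messages received on startup is assumed to have been cleared meanwhile.
    """
    return set(known_before_shutdown) - set(retained_on_startup)


# Hypothetical alarm identifiers, used only for illustration.
known = {"temperature_high", "pressure_low"}

# Normal restart: every retained alarm is redelivered, so nothing is cleared.
print(alarms_to_clear(known, {"temperature_high", "pressure_low"}))

# Quick restart: no retained messages arrive, so every alarm looks cleared.
print(sorted(alarms_to_clear(known, set())))
```

In this simplified model, the sync cannot tell "the alarms were cleared" apart from "the retained messages were never delivered", which is the core of the bug.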
To Reproduce
Raise an alarm, then stop and restart the mapper quickly (within the keep-alive window of 60s).
Expected behavior
An alarm that's not cleared by the user should not be cleared by the mapper on its own.
To fix this, the Cumulocity mapper can use a separate non-persistent MQTT client connection for processing alarm data. Using cleanSession for alarms ensures that the mapper always gets the retained alarm messages whenever it connects to the broker. Since alarms are represented as retained messages, a persistent client connection is not required to guarantee message delivery even when the mapper is down and restarts later. In fact, using a persistent session breaks the alarm sync logic as described above. So a non-persistent connection (with cleanSession true) should be used specifically for alarm data, and a separate persistent connection for all other data such as measurements and operations (as it is today).
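The proposed split might look roughly like the following configuration sketch. It only models the settings of the two connections and does not open any MQTT connection. The `SessionConfig` type, the client IDs, and the topic filters are assumptions made for illustration, not the mapper's real values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionConfig:
    client_id: str       # hypothetical client identifier
    clean_session: bool  # True means a non-persistent session
    topics: tuple        # topic filters subscribed on this connection


def mapper_sessions():
    """Build the two connection configs proposed in this issue."""
    alarms = SessionConfig(
        client_id="c8y-mapper-alarms",  # assumed name
        clean_session=True,             # fresh session on every connect
        topics=("tedge/alarms/#",),     # assumed alarm topic filter
    )
    other = SessionConfig(
        client_id="c8y-mapper",         # assumed name
        clean_session=False,            # persistent session, as it is today
        topics=("tedge/measurements", "c8y/s/ds"),  # assumed examples
    )
    return alarms, other


alarms, other = mapper_sessions()
print(alarms.clean_session, other.clean_session)
```

The two connections need distinct client IDs, since an MQTT broker disconnects an existing client when another one connects with the same ID.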