thin-edge.io
Support Cumulocity Device Availability feature
Is your feature request related to a problem? Please describe.
Support Cumulocity IoT availability monitoring out of the box, to improve usability by showing whether the thin-edge.io agent is available.
Cumulocity References:
- https://cumulocity.com/docs/device-integration/fragment-library/#device-availability
Example (from Cumulocity IoT Device Management Application):
Requirements
- Send a "heartbeat" periodically (generally an empty managed object update request, or an event, measurement or alarm)
- Configurable period at which the heartbeat signal is sent (e.g. defaulting to 30 minutes)
Describe the solution you'd like
- Add two new keys to the tedge config, as below. The values shown are the defaults. (Thanks to @reubenmiller)
[c8y.availability]
period = 60 # Required interval in minutes, and period of the heartbeat (if enable = true)
enable = true # set to false to disable this feature altogether: no 117 message and no heartbeat
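A minimal Python sketch of applying these defaults, assuming the [c8y.availability] table has already been parsed from TOML into a dict. AvailabilityConfig, load_availability_config and the validation rules are hypothetical names and choices for illustration, not the mapper's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AvailabilityConfig:
    period_minutes: int = 60  # default required interval / heartbeat period
    enable: bool = True       # default: feature switched on


def load_availability_config(table: dict) -> AvailabilityConfig:
    """Build the config from a parsed [c8y.availability] table (hypothetical helper).

    Missing keys fall back to the defaults proposed in this issue.
    """
    period = table.get("period", AvailabilityConfig.period_minutes)
    enable = table.get("enable", AvailabilityConfig.enable)
    if not isinstance(enable, bool):
        raise ValueError("enable must be a boolean")
    # bool is a subclass of int, so reject it explicitly
    if not isinstance(period, int) or isinstance(period, bool) or period <= 0:
        raise ValueError("period must be a positive integer (minutes)")
    return AvailabilityConfig(period_minutes=period, enable=enable)
```

For example, an empty table yields period 60 and enable true, matching the defaults above, while {"enable": False} switches the feature off.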
- tedge-mapper-c8y reads the keys on startup and sends a SmartREST 117 message for each device, both the main device and child devices. Note that, as described in the Cumulocity user guide, this SmartREST message doesn't update the value if it's already set.
- If a new child device is registered at runtime, tedge-mapper-c8y publishes a 117 message for that device.
- tedge-mapper-c8y starts an internal timer for each device. The timer for the main device starts on startup; the timer for a child device starts when the device is registered in the entity store.
- When a timer fires, tedge-mapper-c8y checks the service status of the device's tedge-agent by default. If the status is "up", tedge-mapper-c8y publishes an empty inventory update message.
- If the device declares a different lead service in its metadata as @health, tedge-mapper-c8y uses the lead service's status to determine the device's availability. For details of the mechanism, see the description of the PR.
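The startup and timer behaviour in the list above can be modelled roughly as follows. This Python sketch passes time in explicitly (in seconds) to stay deterministic, takes the availability check as a callback, and assumes SmartREST messages go to c8y/s/us for the main device and c8y/s/us/<xid> for child devices. HeartbeatScheduler and its methods are hypothetical, not tedge-mapper-c8y's implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Message = Tuple[str, str]  # (MQTT topic, payload)


@dataclass
class HeartbeatScheduler:
    """Hypothetical model of the per-device heartbeat timers described above."""

    period_minutes: int
    is_available: Callable[[str], bool]  # e.g. "is the lead service up?"
    main_xid: str
    deadlines: Dict[str, float] = field(default_factory=dict)  # xid -> next fire time (s)

    def _smartrest_topic(self, xid: str) -> str:
        # Assumption: main device on c8y/s/us, child devices on c8y/s/us/<xid>
        return "c8y/s/us" if xid == self.main_xid else f"c8y/s/us/{xid}"

    def register(self, xid: str, now: float) -> Message:
        """Start the device's timer and return its SmartREST 117 message."""
        self.deadlines[xid] = now + self.period_minutes * 60
        return (self._smartrest_topic(xid), f"117,{self.period_minutes}")

    def tick(self, now: float) -> List[Message]:
        """Fire every due timer; emit an empty inventory update if available."""
        out: List[Message] = []
        for xid, deadline in list(self.deadlines.items()):
            if now >= deadline:
                # Restart the timer whether or not a heartbeat is sent.
                self.deadlines[xid] = now + self.period_minutes * 60
                if self.is_available(xid):
                    out.append((f"c8y/inventory/managedObjects/update/{xid}", "{}"))
        return out
```

A real mapper would drive tick from an async runtime's timers rather than polling with explicit timestamps.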
Describe alternatives you've considered
Since alarms, events, measurements and inventory updates all count as availability updates, whenever tedge-mapper-c8y sends such a message it resets the timer for sending heartbeat messages.
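A sketch of the timer-reset rule, assuming a single timer per device and a hypothetical Outgoing enum for the kinds of messages the mapper sends. Service status updates are modelled as not resetting the timer, since the issue notes they don't count as availability updates. DeviceTimer is likewise a hypothetical name.

```python
from enum import Enum, auto


class Outgoing(Enum):
    MEASUREMENT = auto()
    EVENT = auto()
    ALARM = auto()
    INVENTORY_UPDATE = auto()
    SERVICE_STATUS = auto()


# Telemetry and inventory updates count as availability updates;
# service status updates do not.
RESETS_TIMER = {
    Outgoing.MEASUREMENT,
    Outgoing.EVENT,
    Outgoing.ALARM,
    Outgoing.INVENTORY_UPDATE,
}


class DeviceTimer:
    """Hypothetical single-device heartbeat timer with reset-on-send."""

    def __init__(self, period_s: float, now: float):
        self.period_s = period_s
        self.deadline = now + period_s

    def on_sent(self, kind: Outgoing, now: float) -> None:
        # Any availability-relevant message pushes the next heartbeat back.
        if kind in RESETS_TIMER:
            self.deadline = now + self.period_s

    def heartbeat_due(self, now: float) -> bool:
        return now >= self.deadline
```

With a 60-minute period, a measurement sent at minute 50 postpones the next heartbeat to minute 110.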
Additional context
Note: updating a service status is NOT considered an availability update event.
Updates to the device itself (with a given ID), in the form of empty PUT requests or requests with an ID only, that is {} or {"id": ... }
For example, an empty inventory update message looks like this:
tedge mqtt pub c8y/inventory/managedObjects/update/{{xid}} '{}'
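To send the same heartbeat manually for several devices (for example from a test script), a small Python helper can build the command with proper shell quoting. heartbeat_command is a hypothetical helper; it only assembles the command string and does not run tedge.

```python
import shlex


def heartbeat_command(xid: str) -> str:
    """Build the manual heartbeat command for one device (illustrative helper).

    Publishes an empty JSON object to the device's inventory update topic.
    shlex.join quotes each argument that needs it, including "{}".
    """
    topic = f"c8y/inventory/managedObjects/update/{xid}"
    return shlex.join(["tedge", "mqtt", "pub", topic, "{}"])
```

heartbeat_command("my-device") returns tedge mqtt pub c8y/inventory/managedObjects/update/my-device '{}', the same form as the command above.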
Discussion: what does "available" mean for main and child devices? For the main device, we should consider tedge-agent the lead service that defines whether the main device is "available".
For child devices, since tedge-agent is recommended to run there, we can also treat tedge-agent as the key to availability. However, since installing tedge-agent on child devices isn't mandatory, we should give users a way to specify which service defines whether the device is available.
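One way to express the "which service defines availability" rule: default to tedge-agent and let the device override it. This sketch assumes the override lives under an "@health" key in the device's metadata (as proposed in the solution above) and treats its value as a plain service name; the real value format may differ. lead_service and is_available are hypothetical helpers.

```python
DEFAULT_LEAD_SERVICE = "tedge-agent"


def lead_service(metadata: dict) -> str:
    """Return the service whose health defines the device's availability.

    Assumption: the override is read from an "@health" metadata key.
    """
    return metadata.get("@health") or DEFAULT_LEAD_SERVICE


def is_available(metadata: dict, service_status: dict) -> bool:
    """A device counts as available only if its lead service reports "up"."""
    return service_status.get(lead_service(metadata)) == "up"
```

A device with no declared lead service and no known status is treated as unavailable, so no heartbeat would be sent for it.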
Insight: c8y_Availability
There are several ways to set c8y_RequiredAvailability: SmartREST, JSON over MQTT, and REST.
The server-side behaviour differs between SmartREST and the other methods.
The SmartREST 117 message only sets the value if none exists yet; it doesn't support updates. The other methods update the inventory whether or not the value was previously set.
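The difference between the two server-side behaviours can be modelled with a dict standing in for the device's managed object. This is an illustrative model of the semantics described above, not Cumulocity code; the function names are hypothetical.

```python
def apply_smartrest_117(inventory: dict, minutes: int) -> None:
    """Model of SmartREST 117: only sets the interval if none exists yet."""
    inventory.setdefault("c8y_RequiredAvailability", {"responseInterval": minutes})


def apply_json_update(inventory: dict, minutes: int) -> None:
    """Model of a JSON over MQTT / REST inventory update: always overwrites."""
    inventory["c8y_RequiredAvailability"] = {"responseInterval": minutes}
```

One consequence: if only 117 is used, changing period in the tedge config after the interval has been set would not change the value in Cumulocity.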
Even for JSON over MQTT, thin-edge offers three options: publishing directly to c8y/inventory/managedObjects/update/{{xid}}, using the twin topic, or using the inventory.json feature.
- Direct JSON over MQTT
  topic: c8y/inventory/managedObjects/update/{{xid}}
  payload: {"c8y_RequiredAvailability":{"responseInterval":45}}
- "twin" topic
  topic: te/device/{{xid}}///twin/c8y_RequiredAvailability
  payload: {"responseInterval":88}
- inventory.json, described here
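A sketch that builds the topic and payload for the first two options, mirroring the layouts listed above. direct_update and twin_update are hypothetical helpers; the compact JSON separators are chosen only to match the example payloads, and the inventory.json option is not modelled.

```python
import json
from typing import Tuple


def direct_update(xid: str, minutes: int) -> Tuple[str, str]:
    """Option 1: publish straight to the Cumulocity inventory update topic."""
    payload = {"c8y_RequiredAvailability": {"responseInterval": minutes}}
    return (
        f"c8y/inventory/managedObjects/update/{xid}",
        json.dumps(payload, separators=(",", ":")),
    )


def twin_update(xid: str, minutes: int) -> Tuple[str, str]:
    """Option 2: publish only the fragment body on the thin-edge twin topic."""
    return (
        f"te/device/{xid}///twin/c8y_RequiredAvailability",
        json.dumps({"responseInterval": minutes}, separators=(",", ":")),
    )
```

Note that the twin option carries the fragment name in the topic, so its payload contains only the fragment's contents.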
Should we also have a config flag to disable the "fake" heartbeat messages auto-generated by the mapper? The purpose of availability monitoring for a device may not just be to check that the device is connected and able to send some data, but could be to check that it's really doing what it is meant to do (generating the real telemetry data), right? In such cases, keeping the availability alive with those fake updates might cause the admin to miss a fatal issue with the core function of the device.
We can add a boolean setting, c8y.availability.enable (see below), to control whether this feature is handled by the device or not. Then the user can control the availability period purely from the cloud, and rely on the published telemetry data to drive the availability status.
However, the availability flag is driven more by the service status than by whether telemetry data (e.g. measurements/events) can be sent. Still, Rina's proposal (along with the enable setting) should cover most use cases.
[c8y.availability]
period = 60 # Required interval in minutes, and period of the heartbeat (if enable = true)
enable = true # set to false to disable this feature altogether: no 117 message and no heartbeat
@reubenmiller
[c8y.availability]
period = 60 # Required interval in minutes, and period of the heartbeat (if enable = true)
enable = true # set to false to disable this feature altogether: no 117 message and no heartbeat
What should be the default values for these? The example above?
Yes, the defaults of 60 minutes and true (enabled) are good. These settings shouldn't put undue strain on the system or send too much additional data over the network.
Resolved by https://github.com/thin-edge/thin-edge.io/pull/2940
All QA tasks were done during development and tested/commented on in the resolving PR, which was thoroughly checked for regressions. Any further increase in test coverage will be done in a separate PR.