Support Cumulocity Device Availability feature

Open rina23q opened this issue 10 months ago • 4 comments

Is your feature request related to a problem? Please describe.

Support the Cumulocity IoT availability monitoring out of the box to improve the usability by showing the thin-edge.io agent.

Cumulocity References:

  • https://cumulocity.com/docs/device-integration/fragment-library/#device-availability

Example (from Cumulocity IoT Device Management Application): (image)

Requirements

  • Send "heartbeat" periodically (e.g. this is generally an empty managed object update request, or sending of an event, measurement or alarm)
  • Configurable period at which the heartbeat signal is sent (e.g. default to 30 minutes)

Describe the solution you'd like

  1. Add two new keys to tedge config, shown below with their default values. (Thanks to @reubenmiller)
[c8y.availability]
period = 60     # Required interval in minutes, and period of the heartbeat (if enable = true)
enable = true   # Set to false to disable this feature altogether: no 117 message is sent and the heartbeat is disabled
  2. tedge-mapper-c8y reads the keys on startup and sends a SmartREST 117 message per device, for both the main device and child devices. Note that this SmartREST message doesn't update the value if it's already set, as described in the c8y user guide.
  3. If a new child device is registered at runtime, tedge-mapper-c8y publishes a 117 message for that device.
  4. tedge-mapper-c8y starts an internal timer for each device. The timer for the main device starts on startup. The timer for a child device starts when the device is registered in the entity store.
  5. When a timer fires, tedge-mapper-c8y checks the service status of the device's tedge-agent by default. If the status is "up", tedge-mapper-c8y publishes an empty inventory update message.
  6. If the device declares a different lead service in its metadata as @health, tedge-mapper-c8y uses that lead service's status to determine the device's availability. For details of the mechanism, see the description of the PR.
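
The steps above can be sketched roughly as follows. This is only an illustration in Python, not the mapper's actual code: the names `AvailabilityConfig`, `required_availability_message` and `heartbeat_message` are hypothetical, and the `117,<interval>` payload layout reflects my understanding of the SmartREST static template and should be checked against the Cumulocity documentation. The inventory update topic is the one quoted later in this issue.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class AvailabilityConfig:
    """Hypothetical mirror of the proposed [c8y.availability] keys."""
    period: int = 60     # minutes (proposed default)
    enable: bool = True  # proposed default


def required_availability_message(config: AvailabilityConfig) -> Optional[str]:
    """Return the SmartREST 117 payload, or None when the feature is disabled.

    Assumption: the template has the form "117,<interval in minutes>".
    """
    if not config.enable:
        return None
    return f"117,{config.period}"


def heartbeat_message(xid: str, lead_service_status: str) -> Optional[Tuple[str, str]]:
    """Return (topic, payload) for an empty inventory update.

    A heartbeat is produced only when the lead service (tedge-agent by
    default) reports the status "up"; otherwise nothing is sent.
    """
    if lead_service_status != "up":
        return None
    return (f"c8y/inventory/managedObjects/update/{xid}", "{}")
```

With `enable = false` the first function returns nothing to publish, which matches the intent that neither the 117 message nor the heartbeat is sent.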

Describe alternatives you've considered

Sending alarms, events, measurements, or inventory updates already counts as an availability update. So whenever tedge-mapper-c8y sends such a message, it resets the timer for sending heartbeat messages.
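
A minimal sketch of that timer reset, assuming a per-device deadline that is pushed back whenever telemetry goes out. The class `HeartbeatTimer` and its methods are hypothetical names for illustration; time is passed in explicitly (in seconds) so the behaviour is easy to follow.

```python
class HeartbeatTimer:
    """Hypothetical per-device timer; the period is given in minutes."""

    def __init__(self, period_minutes: int, now: float = 0.0) -> None:
        self.period_seconds = period_minutes * 60
        self.deadline = now + self.period_seconds

    def reset(self, now: float) -> None:
        # Called after any alarm, event, measurement or inventory update
        # is sent for the device, since those already count as availability updates.
        self.deadline = now + self.period_seconds

    def is_due(self, now: float) -> bool:
        # True once a full period has elapsed without any reset.
        return now >= self.deadline
```

For example, with a 60 minute period, a measurement sent at minute 50 moves the next heartbeat to minute 110 instead of minute 60.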

Additional context

Note: Updating service status is NOT considered an availability update event.

Updates to the device itself (with a given ID), in the form of empty PUT requests or requests with an ID only, that is {} or {"id": ... }

For example, an empty inventory update message should be as below.

tedge mqtt pub c8y/inventory/managedObjects/update/{{xid}} '{}'

Discussion: What does "available" mean for main and child devices?

For the main device, we should consider tedge-agent the lead service that defines whether the main device is "available".

For child devices, since tedge-agent is recommended to run there, we can also consider tedge-agent the key to availability. However, since installing tedge-agent on child devices is not mandatory, we should give users a way to specify which service defines whether the device is available.

Insight: c8y_Availability

There are several ways to set c8y_RequiredAvailability: SmartREST, JSON over MQTT, and REST. The server-side behaviour differs between SmartREST and the other two.

The SmartREST 117 message only sets the value if none exists; it doesn't support updates. The other methods update the inventory whether or not a value was set previously.
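
The difference can be modelled with a toy example. This is a sketch of the semantics described above, not of any Cumulocity API: the managed object is represented as a plain dictionary and both function names are hypothetical.

```python
def apply_smartrest_117(managed_object: dict, interval: int) -> dict:
    """Set-if-absent: an existing c8y_RequiredAvailability is left untouched."""
    updated = dict(managed_object)
    updated.setdefault("c8y_RequiredAvailability", {"responseInterval": interval})
    return updated


def apply_json_update(managed_object: dict, interval: int) -> dict:
    """Overwrite: the fragment is replaced whether or not it existed before."""
    updated = dict(managed_object)
    updated["c8y_RequiredAvailability"] = {"responseInterval": interval}
    return updated
```

Under this model, a value configured from the cloud survives a mapper restart when the 117 message is used, while a JSON update would replace it.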

Even for JSON over MQTT, thin-edge offers three options: publishing directly to c8y/inventory/managedObjects/update/{{xid}}, using our twin topic, or using the inventory.json feature.

  1. Direct JSON over MQTT
     topic: c8y/inventory/managedObjects/update/{{xid}}
     payload: {"c8y_RequiredAvailability":{"responseInterval":45}}

  2. "twin" topic
     topic: te/device/{{xid}}///twin/c8y_RequiredAvailability
     payload: {"responseInterval":88}

  3. inventory.json described here
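
To make the first two options concrete, here is a small sketch that builds the topic and payload for each. The function names are hypothetical, the topic layouts are taken from the list above, and the fragment name is written as c8y_RequiredAvailability, the spelling used elsewhere in this issue. Actually publishing the message (for example with tedge mqtt pub) is left out.

```python
import json
from typing import Tuple


def direct_update(xid: str, interval: int) -> Tuple[str, str]:
    """Option 1: publish the whole fragment on the c8y inventory update topic."""
    topic = f"c8y/inventory/managedObjects/update/{xid}"
    payload = json.dumps({"c8y_RequiredAvailability": {"responseInterval": interval}})
    return topic, payload


def twin_update(xid: str, interval: int) -> Tuple[str, str]:
    """Option 2: publish only the fragment body on the device's twin topic."""
    topic = f"te/device/{xid}///twin/c8y_RequiredAvailability"
    payload = json.dumps({"responseInterval": interval})
    return topic, payload
```

The twin topic carries the fragment name in the topic itself, so its payload holds only the inner object.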

rina23q avatar Apr 24 '24 15:04 rina23q

Should we also have a config flag to disable the "fake" heartbeat messages auto-generated by the mapper? The purpose of the availability monitoring for a device may not just be about monitoring if the device is connected and able to send some data, but could be about monitoring if it's really doing what the device is meant to do (generating the real telemetry data), right? So, in such cases, keeping the availability alive by sending those fake updates might lead to the admin missing any fatal issue with the core function of the device.

albinsuresh avatar May 06 '24 11:05 albinsuresh

We can add a boolean setting to control whether or not this feature is driven by the device: c8y.availability.enable (see below). With it disabled, the user can control the availability period purely from the cloud and rely on the published telemetry data to drive the availability status.

However, the main driver of the availability flag is the service status rather than whether telemetry data (e.g. measurements/events) can be sent, but Rina's proposal (along with the enable setting) should cover most use-cases.

[c8y.availability]
period = 60     # Required interval, and period of the heartbeat (if enable = true)
enable = true   # disable this feature altogether, don't send a 117 message, and disable the heartbeat

reubenmiller avatar May 07 '24 14:05 reubenmiller

@reubenmiller

[c8y.availability]
period = 60     # Required interval, and period of the heartbeat (if enable = true)
enable = true   # disable this feature altogether, don't send a 117 message, and disable the heartbeat

What should be the default values for these? The example above?

rina23q avatar May 08 '24 17:05 rina23q

Yeah, a default of 60 mins and true (enabled) is good. These settings shouldn't put undue strain on a system and shouldn't send too much additional data over the network.

reubenmiller avatar May 08 '24 18:05 reubenmiller

Resolved by https://github.com/thin-edge/thin-edge.io/pull/2940

rina23q avatar Jul 04 '24 15:07 rina23q

All QA tasks were done during development and tested/commented in the resolving PR, and thoroughly checked for regression. Any increase of test coverage will be done in a separate PR.

gligorisaev avatar Jul 05 '24 05:07 gligorisaev