Alarm for notification queue overpassing a given threshold

Open fgalan opened this issue 2 years ago • 2 comments

Is your feature request related to a problem / use case? Please describe.

Orion logs an error when the notification queue is full (in the threadpool notification mode), either for the general queue:

Runtime Error (default notification queue is full)

or for per-service queues, if that functionality is in use:

Runtime Error (serv1 notification queue is full)

Thus, operations teams only learn that the queue is saturated when it is already too late and notifications are being dropped.

Describe the solution you'd like

Implement a new alarm, this way:

| Alarm ID | Severity | Detection strategy | Stop condition | Description | Action |
|:---|:---|:---|:---|:---|:---|
| 8 | WARNING | The following WARN text appears in the 'msg' field: "Raising alarm NotificationQueue <service>: <detail>". | The following WARN text appears in the 'msg' field: "Releasing alarm NotificationQueue <service>", where <service> is the same one that triggered the alarm. Orion prints this trace when the notification queue goes back below the threshold. | The notification queue associated with the service (or <service> "default" for the default queue) has exceeded the alarm threshold. The <detail> text describes the particular threshold. | No specific action has to be performed at the Orion Context Broker service, but the update flow causing the notifications on that service (or default queue) should be lowered in order to reduce pressure on the queue. Another possible cause is malfunctioning notification receivers that are slow to process notifications and respond to Orion. |
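
On the operations side, the detection and stop conditions above could be checked with a small log replay like the following. This is only a sketch under assumptions: it presumes log lines made of ` | `-separated `key=value` fields with `msg` among them, it uses the spelling `NotificationQueue` for the alarm name, and the helper names `extract_msg` and `active_alarms` are hypothetical, not part of Orion.

```python
import re

# Assumed spelling of the proposed alarm name; adjust if the final name differs.
ALARM_NAME = "NotificationQueue"

RAISE_RE = re.compile(
    rf"Raising alarm {ALARM_NAME} (?P<service>[^:\s]+): (?P<detail>.*)"
)
RELEASE_RE = re.compile(rf"Releasing alarm {ALARM_NAME} (?P<service>\S+)")


def extract_msg(line):
    """Return the 'msg' field of a log line, or None if it has no such field.

    Assumes fields are separated by ' | ' and written as key=value.
    """
    for field in line.split(" | "):
        key, sep, value = field.partition("=")
        if sep and key.strip() == "msg":
            return value.strip()
    return None


def active_alarms(lines):
    """Replay log lines in order and return {service: detail} for alarms still raised."""
    active = {}
    for line in lines:
        msg = extract_msg(line)
        if msg is None:
            continue
        raised = RAISE_RE.search(msg)
        if raised:
            active[raised["service"]] = raised["detail"]
            continue
        released = RELEASE_RE.search(msg)
        if released:
            # A release clears the alarm for the same <service> that raised it.
            active.pop(released["service"], None)
    return active
```

A monitoring job could call `active_alarms` on the recent log window and page the operations team whenever the returned mapping is non-empty.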

Things to decide:

  • How many thresholds? For instance, >80% is critical and 50-80% is moderate. However, the current raise-release mechanism gets complicated if more than one level is defined, so probably the simplest approach is just one threshold.
  • How to specify the threshold(s)? Several approaches:
    • Hardwired in Orion code (simpler)
    • In the Orion CLI (a bit more flexible and also pretty simple)
    • Dynamic, through Orion admin API (more flexible, but also more complex)
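
To make the single-threshold option concrete, the raise/release behaviour could look roughly like the sketch below. It is not Orion code: the class name `QueueAlarm`, the 80% default, and the wording of the `<detail>` text are all assumptions for illustration, and how the threshold is supplied (hardwired, CLI, or admin API) is deliberately left as a constructor argument.

```python
class QueueAlarm:
    """Single-threshold raise/release tracker for one notification queue (illustrative)."""

    def __init__(self, service, capacity, threshold=0.8):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        if not 0 < threshold <= 1:
            raise ValueError("threshold must be in (0, 1]")
        self.service = service      # service name, or "default" for the default queue
        self.capacity = capacity    # maximum queue size
        self.threshold = threshold  # fraction of capacity that triggers the alarm
        self.raised = False

    def update(self, queue_size):
        """Return the WARN message to log for this sample, or None if nothing changed."""
        above = queue_size / self.capacity > self.threshold
        if above and not self.raised:
            self.raised = True
            # The detail text after the colon is a made-up format.
            return (
                f"Raising alarm NotificationQueue {self.service}: "
                f"queue size {queue_size}/{self.capacity} "
                f"above {self.threshold:.0%} threshold"
            )
        if not above and self.raised:
            self.raised = False
            return f"Releasing alarm NotificationQueue {self.service}"
        return None
```

With one threshold the alarm has only two states, so each crossing produces exactly one message. Supporting several levels (for example moderate and critical) would need a state per level and rules for moving between them, which is the added complexity noted above.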

Describe alternatives you've considered

None so far

Describe why you need this feature

It would be useful for the operations teams using Orion, so they can define alarms based on Orion logs.

fgalan, May 05 '22 16:05

Hi @fgalan sir,

I would like to work on this issue. As per my understanding, we need to add a "NotificationQueue" alarm that is raised when the notification queue exceeds the threshold.

> How many thresholds? For instance >80% is critical, 50-80% is moderate. However, the current raise-release mechanism gets complicated if more than one level is defined, so probably the simpler approach is just one threshold.

We need to specify only one threshold value. For that, we can either hardwire the threshold value in the Orion code or add a CLI option for it.

Please confirm my understanding.

Anjali-NEC, May 15 '23 13:05

I think your understanding is correct. Thanks!

fgalan, Jun 01 '23 14:06