argocd-notifications
Addition of new trigger: on-health-healthy
Summary
Introducing a new trigger: on-health-healthy
Use Cases
When deploying apps that use an HPA, there is a short period before the HPA collects metrics and the app becomes fully healthy (green).
Meanwhile, notifications about degraded health are sent to the notification channels, and no new notification follows once the HPA is healthy and the whole application is green. This leaves whoever watches the notifications wondering whether the app has been degraded the entire time, when in reality the degraded state lasted only about 10 seconds.
This new trigger should accompany the on-health-degraded trigger and better represent the actual situation in the cluster.
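For illustration, a user-defined version of the proposed trigger might look like the sketch below, placed in the `argocd-notifications-cm` ConfigMap. The template name `app-health-healthy` and its message are hypothetical, chosen to mirror the existing `app-health-degraded` template. The sketch deliberately omits `oncePer`, on the assumption that the controller sends a notification each time the condition turns true again. Under that assumption, a Healthy -> Degraded -> Healthy flap would produce a second "healthy" message, but the trigger would also fire after an ordinary Progressing -> Healthy rollout.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-notifications-cm
data:
  # Hypothetical trigger: fires when the app's health becomes Healthy.
  # No oncePer, so (assumption) it can fire again when the app recovers
  # from Degraded within the same revision.
  trigger.on-health-healthy: |
    - description: Application health is Healthy
      send: [app-health-healthy]
      when: app.status.health.status == 'Healthy'
  # Hypothetical template, mirroring app-health-degraded.
  template.app-health-healthy: |
    message: Application {{.app.metadata.name}} health status is now Healthy.
```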
Message from the maintainers:
Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.
Hello! I see similar behavior: about 10 seconds after the application reaches Healthy status, I get a notification that it changed from Healthy -> Degraded, but no notification when it becomes Healthy again.
In the ArgoCD Application Controller debug logs I found the root cause:
The HPA was unable to compute the replica count: did not receive metrics for any ready pods.
This is logical behavior for an HPA. But I think ArgoCD Notifications does not handle this HPA behavior, which is why we get redundant notifications in this order:
- Progressing -> Healthy: the first notification
- Healthy -> Degraded (approximately 10 seconds later): the second notification
I'm using these triggers to get notifications:
triggers:
  trigger.on-deployed: |
    - when: app.status.operationState.phase in ['Succeeded'] and app.status.health.status == 'Healthy'
      oncePer: app.status.operationState.syncResult.revision
      send: [app-deployed]
  trigger.on-health-degraded: |
    - when: app.status.health.status == 'Degraded'
      send: [app-health-degraded]
I also checked the current behavior on ArgoCD Notifications v1.1.0 and v1.1.1; it is the same in both.
My expectation: when the application uses an HPA, ArgoCD Notifications should wait until the HPA controller has received the necessary metrics from the Metrics API (served by the Metrics Server component), and only then send a notification with a Healthy status. But mostly, I think, these changes must be made on the ArgoCD side, not in ArgoCD Notifications. Here is a related ArgoCD issue with a workaround: https://github.com/argoproj/argo-cd/issues/6287
If my expectations are incorrect, please let me know :)
I'm fine with degraded notifications being sent, but I would want a healthy-status notification as soon as the app is healthy again, however quickly that happens; if it's after 10 seconds, that's fine with me.
BUT, if ArgoCD could somehow poll the HPA status and wait until the app is healthy, that would be even better.
@Zava2012 it wouldn't hurt to add a 👍🏼 to the issue to move it up in prioritization a bit. :)
Would this trigger condition prevent the false alert, i.e. wait 2 minutes before sending degraded notifications?
trigger.on-health-degraded: |
  - description: Application has degraded
    oncePer: app.status.sync.revision
    send:
    - app-health-degraded
    when: app.status.health.status == 'Degraded' and time.Now().Sub(time.Parse(app.status.operationState.startedAt)).Minutes() >= 2
Done, I added the 👍 :) But I am more and more sure that the changes must be made on the ArgoCD side.
The delayed trigger condition looks like a soft workaround that could be used. A delayed notification is better than an incorrect one, I think. I'll try the expression and give feedback on how it works. Thanks!
This event/trigger would also be nice when an application has been unhealthy for a while and has recovered, or should that be a separate trigger (e.g. on-recovered)?
We don't want to spam the developers with "everything is ok" Slack messages, but if we only notify on failures, it would be nice to also notify when an app recovers.
This is similar to Alertmanager, which I am also familiar with and which can send notifications on recovery.
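On the subscription side, a minimal sketch could look like the fragment below. It assumes that a user-defined `on-health-healthy` trigger, like the one proposed in this issue, exists alongside `on-health-degraded`. The Application name and the Slack channel `my-team-alerts` are placeholders. One limitation compared with Alertmanager: as far as I can tell, a trigger expression only sees the application's current state, so a plain "health is Healthy" condition cannot distinguish a recovery from a normal rollout. It would therefore also notify after every successful deploy.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app  # placeholder
  namespace: argocd
  annotations:
    # Notify the channel when the app's health degrades...
    notifications.argoproj.io/subscribe.on-health-degraded.slack: my-team-alerts
    # ...and when it turns Healthy (assumes the user-defined trigger exists).
    notifications.argoproj.io/subscribe.on-health-healthy.slack: my-team-alerts
# spec omitted for brevity
```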
To me personally, sending "deployment successful" messages to Slack is useful because that's how the team knows that their deployment went through successfully instead of them having to watch every single step of the pipeline individually.
Hi there, I would be interested to know whether this issue is planned for a future release and, in general, what its priority is (I noticed that the latest comment is from December 2021). We have recently been discussing the very same situation at my company, and that is the sort of scenario in which we would welcome a resolved/"now healthy" message coming back to us. In a nutshell, when the app recovers, it would be nice to be notified. Thanks
I have the same requirement as @ilacorda and would like to be notified when an Application changes from Degraded to Healthy.