tournesol
[infra] fix: Reduce alert spam from slow request
The goal of an alert is that, when one fires, we are compelled to act on it.
This is not the case for the slow request alert.
Keeping a Grafana tab open continuously generates slow requests, which can trigger the alert every 5 minutes.
Possible improvements:
- [ ] Ignore slow requests from Grafana
- [ ] Improve performance for the requests that Grafana uses
- [ ] Trigger the alert at a higher threshold (maybe 1s?)
- [ ] Trigger the alert at most once per hour
Please propose additional improvements if they are better than those listed above. This ticket can be closed once the minimum work that significantly reduces the alert spam is done.
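To make the options in the checklist concrete, here is a minimal sketch of how three of them (ignoring Grafana requests, a higher threshold, and at most one alert per hour) could combine into a single decision. It is written in plain Python for illustration only; in practice this logic would more likely live in the alerting rules themselves. The `/grafana/` path prefix, the class and field names, and the exact values are all assumptions, not taken from the actual configuration.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical values, chosen only to mirror the checklist above.
SLOW_THRESHOLD_S = 1.0                  # the "maybe 1s" threshold
ALERT_COOLDOWN_S = 3600.0               # at most one alert per hour
IGNORED_PATH_PREFIXES = ("/grafana/",)  # assumed URL prefix for Grafana


@dataclass
class Request:
    path: str
    duration_s: float   # how long the request took
    timestamp_s: float  # when the request completed


class SlowRequestAlerter:
    """Decides whether a slow request should raise an alert."""

    def __init__(self) -> None:
        self.last_alert_s: Optional[float] = None

    def should_alert(self, request: Request) -> bool:
        # 1. Ignore requests made by Grafana itself.
        if request.path.startswith(IGNORED_PATH_PREFIXES):
            return False
        # 2. Only requests at or above the threshold count as slow.
        if request.duration_s < SLOW_THRESHOLD_S:
            return False
        # 3. Stay silent while the cooldown from the last alert is active.
        if (
            self.last_alert_s is not None
            and request.timestamp_s - self.last_alert_s < ALERT_COOLDOWN_S
        ):
            return False
        self.last_alert_s = request.timestamp_s
        return True
```

With these assumed values, a 5-second request to `/grafana/api/ds/query` never alerts, while a 2-second request to any other path alerts once and then suppresses further alerts for the following hour.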
I think that ignoring requests to Grafana for latency alerts would make sense. Alerting should focus primarily on issues that impact real users, and the current behavior could even add noise when investigating unrelated issues. Moreover, if Grafana's response times do become an actual problem, maintainers will probably notice on their own when accessing the dashboards.
This solved itself with ongoing development and increased usage of the platform.