sloth
Support "window" based SLOs
Currently Sloth supports defining SLOs as a percentage of failures over a period, i.e. from the ratio of failed (or, conversely, good) requests to the total number of requests.
It'd be great if Sloth also supported defining alerts/rules out-of-the-box for "window-based SLOs" (a term I've only found coined here: https://cloud.google.com/stackdriver/docs/solutions/slo-monitoring#slo-types).
That is, SLOs that divide the overall window into equal time ranges (I assume with the width of the Prometheus rule evaluation interval) and then mark each time range as a success or failure.
This is currently possible using "Raw" SLOs and Prometheus subqueries, but it would be preferable for Sloth to generate the underlying rules so that subqueries are not needed.
For example, I believe the following would be the correct configuration for the SLI "throughput over 60 requests per second", evaluated for each 1m window by means of a subquery:
error_query: 'avg_over_time((sum(rate(http_requests_total{job="some-job"}[2m])) <= bool 60)[{{.window}}:1m])'
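To make the intent concrete, here is a rough Python sketch of the arithmetic I have in mind for a window-based SLO: classify each 1m time range as good or bad, then take the fraction of bad ranges as the error ratio. The function name `window_error_ratio` and the sample values are made up for illustration, and I'm assuming a time range counts as bad when its throughput is at or below the threshold. This is not how Sloth works internally.

```python
from typing import Sequence


def window_error_ratio(throughput_per_window: Sequence[float],
                       threshold: float = 60.0) -> float:
    """Return the fraction of time ranges that fail the SLI.

    A time range is treated as bad when its throughput (requests per
    second) is at or below the threshold. This is an assumption made for
    illustration, matching "throughput over 60 requests per second".
    """
    if not throughput_per_window:
        raise ValueError("need at least one time range")
    bad = sum(1 for rps in throughput_per_window if rps <= threshold)
    return bad / len(throughput_per_window)


# Hypothetical per-minute throughput values, one per 1m time range.
samples = [75.0, 80.0, 42.0, 61.0, 60.0]

# Two of the five time ranges (42.0 and 60.0) are at or below 60.
print(window_error_ratio(samples))
```

With a 99% objective, for example, the error budget would be spent once more than 1% of the time ranges in the SLO period are marked bad, regardless of how many requests each range served.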
Alternatively if this already fits into the Sloth model without plugins I'd be happy to hear suggestions!
I believe you are describing the timeslices budgeting method in https://github.com/OpenSLO/OpenSLO.