
Support "window" based SLOs

Open · Limess opened this issue 2 years ago • 1 comment

Currently Sloth supports defining SLOs as a percentage of failures over a period, i.e. based on the ratio of good (or failed) requests to the total number of requests.

It'd be great if Sloth also supported defining alerts/rules out-of-the-box for "window-based SLOs" (a term I've only found coined here: https://cloud.google.com/stackdriver/docs/solutions/slo-monitoring#slo-types).

i.e. SLOs which divide the overall window into equal time ranges (I assume with the width of the Prometheus rule evaluation interval), and then mark each time range as a success or failure.

This is currently possible using "Raw" SLOs and Prometheus subqueries, but it would be preferable if Sloth generated the underlying rules itself, removing the need for subqueries.

e.g. right now I believe this would be the correct configuration for the SLI "throughput over 60 requests per second" for each 1m window, making use of a subquery:

error_query: 'avg_over_time((sum(rate(http_requests_total{job="some-job"}[2m])) > bool 60)[{{.window}}:1m])'
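To make the arithmetic behind the query above concrete, here is a minimal Python sketch of a window-based (time-slice) SLI. It is illustrative only and not part of Sloth: the function name `good_slice_ratio` and the sample values are hypothetical, and it assumes one throughput sample per 1m slice.

```python
from typing import Sequence


def good_slice_ratio(throughputs: Sequence[float], threshold: float = 60.0) -> float:
    """Return the fraction of time slices whose throughput is strictly above the threshold."""
    if not throughputs:
        raise ValueError("need at least one time slice")
    # Each slice counts as a success (1) or a failure (0), like `> bool 60` in PromQL.
    good = sum(1 for rps in throughputs if rps > threshold)
    return good / len(throughputs)


# Hypothetical throughput samples in requests per second, one per 1m slice.
samples = [75.0, 62.5, 58.0, 90.0, 60.0]

good_ratio = good_slice_ratio(samples)
error_ratio = 1.0 - good_ratio  # fraction of slices that missed the threshold

print(f"good slices: {good_ratio:.2f}, bad slices: {error_ratio:.2f}")
```

Note that averaging a `> bool 60` comparison gives the fraction of good slices. If the field is expected to hold an error ratio, the complement (for example a `<= bool 60` comparison) is probably what is needed.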

Alternatively, if this already fits into the Sloth model without plugins, I'd be happy to hear suggestions!

Limess, Apr 29 '22 10:04

I believe you are describing the timeslices budgeting method in https://github.com/OpenSLO/OpenSLO.

tokheim, May 12 '22 12:05