tokheim comments

Results: 10 comments of tokheim

Support "window" based SLOs

I believe you are describing the timeslices budgeting method in https://github.com/OpenSLO/OpenSLO.

Current burning budget, Remaining error budget(30d window, Remaining error budget (month)) are NaN

There have been some similar questions in the past. What I suspect happens is that in some 5-minute periods you have no incoming requests at all, so your divisor...

ALB target group tag selections failing due to resource id

MSK seems to have a similar issue with a mismatching ARN ID. As seen in https://docs.aws.amazon.com/msk/latest/developerguide/msk-create-cluster.html, a cluster is given an ARN ID like `"arn:aws:kafka:us-east-1:123456789012:cluster/CustomConfigExampleCluster/abcd1234-abcd-dcba-4321-a1b2abcd9f9f-2"`. The random characters postfixed in the ARN...

Test docker build job

So I hope to experiment a bit on how we could make use of this. But we would likely want to run it in kubernetes and want some way to...

Docker image

Any update on this?

Time window in metric name

Just want to offer you a partial solution that might help. Metric names are actually treated as labels in Prometheus. So you can do an expression like ``` {__name__=~"slo:sli_error:ratio_rate.*",sloth_service="my-service", sloth_slo="my-slo"}...

Help on Latency SLO definition

The normal approach is to consider any request that doesn't meet the latency target as a straight-up error. This would mean you should use the `le` approach, and tailor `le` to...
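
To make the "slow requests count as errors" idea concrete, here is a minimal Python sketch of the arithmetic. It assumes cumulative histogram buckets keyed by their `le` upper bound, as Prometheus histograms expose them. The bucket values, the 0.3s target, and the function name `latency_error_ratio` are hypothetical and only illustrate the calculation; this is not Sloth's implementation.

```python
from typing import Dict


def latency_error_ratio(buckets: Dict[float, int], target_le: float) -> float:
    """Fraction of requests slower than target_le, given cumulative bucket counts."""
    total = buckets[float("inf")]  # the +Inf bucket counts every request
    if total == 0:
        # No traffic in the window: the ratio is undefined.
        return float("nan")
    good = buckets[target_le]  # requests that finished within the latency target
    return (total - good) / total


# Hypothetical cumulative counts: 950 of 1000 requests finished within 0.3s.
example_buckets = {0.1: 800, 0.3: 950, 1.0: 990, float("inf"): 1000}
ratio = latency_error_ratio(example_buckets, target_le=0.3)
print(ratio)  # 0.05
```

The latency target has to match one of the histogram's bucket boundaries, since only those boundaries are recorded.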

NaN in SLO dashboard

So the root issue is probably that with zero traffic, `errorQuery/totalQuery` evaluates to `NaN`, especially for the short 5-minute window size. Unless you use the feature in #241, `slo:period_error_budget_remaining:ratio` is just...
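
The zero-traffic case can be reproduced with plain floating-point arithmetic. The Python sketch below mimics Prometheus-style float division, where 0/0 yields `NaN` rather than an error. The helper `prom_div` and the per-window counts are hypothetical and exist only for illustration.

```python
import math


def prom_div(a: float, b: float) -> float:
    """Divide like IEEE 754 floats do: 0/0 is NaN, x/0 is +/-Inf."""
    try:
        return a / b
    except ZeroDivisionError:
        # Python raises on division by zero, so emulate the float result.
        return math.nan if a == 0 else math.copysign(math.inf, a)


# Hypothetical (errors, total) request counts for three 5-minute windows.
windows = [(1, 200), (0, 0), (3, 150)]
ratios = [prom_div(errors, total) for errors, total in windows]

# NaN propagates through any later arithmetic on the ratio.
derived = [1 - r for r in ratios]
print(ratios)
print(derived)
```

Only the window with no requests produces `NaN`, and anything computed from that ratio is `NaN` as well, which is why a single empty window can surface in a dashboard panel.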

NaN in SLO dashboard

Your updated Sloth spec looks correct, though I don't have access to a Prometheus server to test the query at the moment. The dashboard might still show `NaN` until window period...

How can one add a weekly maintenance window into the calculations for SLO's with sloth?

At the very least, you first need Prometheus to record maintenance windows. Either some system that reports this as a metric, or if it's a fixed time, you could build a recording rule...