variable grouping

Open cznewt opened this issue 2 years ago • 7 comments

Is it possible to aggregate by (var)? For example, for server availability, to have hostname as the grouping key?

That way we could calculate the SLO per server. I have updated the queries; the dashboards do not cope with it well, but I'm curious whether this can work conceptually. I can update the dashboards to handle grouping vars.

cznewt avatar Mar 28 '22 08:03 cznewt

Sorry for the spam, @cznewt. I already reported it to GitHub.

slok avatar Mar 28 '22 10:03 slok

Thank you, @slok. Have you thought about grouping by a variable (nodes/handlers/services)? I'm testing it now; I can let you know how it works with the current setup in a few days.

cznewt avatar Mar 28 '22 13:03 cznewt

Yes, that's something that happens at the Prometheus level. You need to have the correct queries. In the past, I had some custom dashboards for grouping some SLOs that I knew existed.

The problem is the discovery of the unknown grouping labels, and having them on the generic dashboards.
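To illustrate what grouping at the Prometheus level means for an SLO, here is a minimal Python sketch (not Sloth code). It mimics dividing a sum by (label) of error events by a sum by (label) of total events, so each label value gets its own error ratio. The function name ratio_by, the hostname and handler labels, and the sample numbers are all assumptions made up for illustration.

```python
from collections import defaultdict


def ratio_by(samples, label):
    """Return the error ratio per value of `label`.

    Each sample is (labels, error_events, total_events). This mimics
    sum by (label)(errors) / sum by (label)(total) in PromQL.
    """
    errors = defaultdict(float)
    totals = defaultdict(float)
    for labels, error_events, total_events in samples:
        key = labels.get(label, "")  # series without the label share one group
        errors[key] += error_events
        totals[key] += total_events
    # Skip groups with no traffic to avoid dividing by zero.
    return {key: errors[key] / totals[key] for key in totals if totals[key] > 0}


# Hypothetical per-series event counts over one window.
samples = [
    ({"hostname": "web-1", "handler": "/api"}, 1.0, 100.0),
    ({"hostname": "web-1", "handler": "/login"}, 1.0, 100.0),
    ({"hostname": "web-2", "handler": "/api"}, 0.0, 50.0),
]

per_host = ratio_by(samples, "hostname")
```

The result has one entry per hostname, which is also why a generic dashboard cannot know in advance which grouping labels it will have to display.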

Let me know how it goes. Maybe we should add a section to the docs for this purpose or use case.

Thanks, @cznewt!

slok avatar Mar 28 '22 16:03 slok

Guys, I have the same need. For example, I have 100 microservices in 5 environments, which generate roughly 10k rules. I need some feature for generating fewer rules.

For example, instead of specifying each service, using one by service_name on "istio", or using labels from Istio, for example node.

templarfelix avatar Apr 20 '22 13:04 templarfelix

Hey everyone, I would also be interested in a solution for this. We have lots of Kafka clusters, each in its own namespace. It would be great to be able to filter by namespace. I'm using the PrometheusServiceLevel CRD to define queries. I added "by (namespace)" to each of my queries, but the Sloth metrics like slo:objective:ratio do not contain the namespace, so I can't use it in Grafana. Is there a solution for this?

Edit: adding "by (namespace)" works now, don't know what went wrong the first time ...
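A minimal Python sketch of what adding "by (namespace)" to both SLI queries could look like. The helper grouped_sli_queries, the metric name kafka_requests_total and the status="error" selector are hypothetical; {{.window}} is, as far as I know, the placeholder Sloth substitutes with the rate window in SLI queries.

```python
def grouped_sli_queries(metric, error_selector, group_label):
    """Build (error_query, total_query) strings grouped by one label."""
    window = "[{{.window}}]"  # left as-is for Sloth to fill in
    group = "sum by (" + group_label + ") "
    total_query = group + "(rate(" + metric + window + "))"
    error_query = group + "(rate(" + metric + "{" + error_selector + "}" + window + "))"
    return error_query, total_query


# Hypothetical metric and selector, for illustration only.
error_query, total_query = grouped_sli_queries(
    "kafka_requests_total", 'status="error"', "namespace"
)
print(error_query)
print(total_query)
```

Both queries need the same grouping label; otherwise the division of errors by total will not match series one to one.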

bthdimension avatar Oct 10 '22 11:10 bthdimension

Hi @bthdimension!

There is no solution yet; however, I'm already thinking about how to tackle this problem without making it complex for the user.

slok avatar Oct 28 '22 07:10 slok

To add to the discussion, having a way to keep a Prometheus label as part of the aggregation would also help with keeping some extra labels in alerts.

For example, I have some "environment" or "tenant_id" labels that I would like to keep in the alerts for routing purposes, without having to explicitly set them in each alert definition.

jleloup avatar Sep 05 '23 23:09 jleloup