prometheus-engine

No UI or any visualization for self-deployed rule-evaluator rules

Open DavidMij opened this issue 2 years ago • 15 comments

Hi all, I'm using the self-deployed rule-evaluator on my GKE cluster. How can I see all the rules (records/alerts) that I configured in rules.yaml?

I was able to configure the alertmanager so I can see my fired alerts, but that is not even close to what I'm expecting to see, like the 'Alerts' or 'Rules' tab that I have in the Prometheus server.

Is there any way to integrate rule-evaluator with prometheus-server so I can see all of my rules, or is there any way to watch the unfired alerts?

DavidMij avatar Jul 27 '22 10:07 DavidMij

We don't have this... but it's a very good idea. I'll add this towards the top of our backlog - specifically a way to query the rule-evaluator and see what rules are installed and what the current state is, like the /alerts endpoint.

lyanco avatar Jul 27 '22 21:07 lyanco

That will be amazing thank you very much.

DavidMij avatar Jul 28 '22 07:07 DavidMij

We need it too. This will be great!

ImryLevySadan avatar Jul 28 '22 07:07 ImryLevySadan

hi @lyanco, any updates?

DavidMij avatar Aug 15 '22 12:08 DavidMij

This isn't expected soon. Will post here when we start working on it.

Note that an analogue - the targets status page - is currently in review at https://github.com/GoogleCloudPlatform/prometheus-engine/pull/301

lyanco avatar Aug 15 '22 13:08 lyanco

Agreed that having a streamlined solution here would be nice. Out of curiosity, what particular info from the /rules and /alerts endpoints is of interest?

Note that for pending or firing alerts, Prometheus creates synthetic time series ALERTS that are queryable alongside your regularly-ingested metrics.
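As a rough illustration of querying that series, here is a minimal Python sketch against the standard Prometheus HTTP API (`/api/v1/query`). The base URL is an assumption (for example, a port-forwarded query frontend on `localhost:9090`), and the function names are hypothetical helpers, not part of any GMP component.

```python
import json
import urllib.parse
import urllib.request
from typing import List, Optional, Tuple


def build_alerts_query_url(base_url: str, alertstate: Optional[str] = None) -> str:
    """Build an instant-query URL for the synthetic ALERTS series.

    base_url is an assumption: any endpoint that speaks the Prometheus
    HTTP API and can see the ALERTS series.
    """
    expr = "ALERTS" if alertstate is None else f'ALERTS{{alertstate="{alertstate}"}}'
    query = urllib.parse.urlencode({"query": expr})
    return f"{base_url.rstrip('/')}/api/v1/query?{query}"


def active_alerts(payload: dict) -> List[Tuple[str, str]]:
    """Extract sorted (alertname, alertstate) pairs from an instant-query response."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    return sorted(
        (s["metric"].get("alertname", ""), s["metric"].get("alertstate", ""))
        for s in payload["data"]["result"]
    )


def fetch_active_alerts(base_url: str, alertstate: Optional[str] = None) -> List[Tuple[str, str]]:
    """Run the query over HTTP and return the pending/firing alerts."""
    url = build_alerts_query_url(base_url, alertstate)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return active_alerts(json.load(resp))
```

This only covers alerts that are already pending or firing. Inactive rules produce no ALERTS series, so it does not show whether a rule was registered at all.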

pintohutch avatar Aug 30 '22 01:08 pintohutch

@pintohutch for me, the most helpful info from those page routes is the status of the deployed rules: we'd want to make sure that rules are correctly registered at the evaluator layer after frequent deploys. In general, we want to be certain that when we're not alerted, it's for good reasons and not because our alerting rule didn't deploy properly.

parkedwards avatar Nov 01 '22 01:11 parkedwards

Thanks @parkedwards - that's helpful context. We'll keep this issue open as we iterate.

pintohutch avatar Nov 04 '22 14:11 pintohutch

no doubt @pintohutch !

one possible short-term way to support this ask is to expose the /api/v1/rules API from one of the managed GMP components (rule-evaluator seems to make the most sense)

that way, users who are running grafana in front of their GMP cluster can configure a Grafana data source that points to this endpoint, so that the rules can show up in the UI there (as well as their firing states)
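To make the deploy check concrete, here is a minimal Python sketch that compares an expected set of rules against a response in the upstream Prometheus `/api/v1/rules` format. It assumes such an endpoint is reachable, which the rule-evaluator does not offer as of this thread. The helper names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

RuleKey = Tuple[str, str]  # (group name, rule name)


def registered_rules(payload: dict) -> Dict[RuleKey, dict]:
    """Index the rules in a /api/v1/rules response by (group, rule name).

    Rule names can repeat across groups, so the group is part of the key.
    """
    if payload.get("status") != "success":
        raise ValueError("rules request did not succeed")
    rules: Dict[RuleKey, dict] = {}
    for group in payload["data"]["groups"]:
        for rule in group["rules"]:
            rules[(group["name"], rule["name"])] = {
                "type": rule.get("type"),  # "alerting" or "recording"
                "health": rule.get("health", "unknown"),
                "state": rule.get("state"),  # only set for alerting rules
            }
    return rules


def missing_rules(payload: dict, expected: Iterable[RuleKey]) -> List[RuleKey]:
    """Return the expected rules that the evaluator did not report."""
    return sorted(set(expected) - set(registered_rules(payload)))


def unhealthy_rules(payload: dict) -> List[RuleKey]:
    """Return rules whose last evaluation was not healthy."""
    return sorted(k for k, v in registered_rules(payload).items() if v["health"] != "ok")
```

A check like this could run after each deploy, with the expected set derived from rules.yaml, so a rule that silently failed to register shows up as missing rather than as a quiet pager.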

parkedwards avatar Nov 09 '22 23:11 parkedwards

@pintohutch Following up on this, any luck/progress with this issue? Thanks

pandrian avatar Jun 20 '23 12:06 pandrian

No movement so far. We've been focusing on delivering a fully-managed, Cloud-based alerting flow (PromQL through Cloud Alerting). This will accept rule_files format and execute them without any infrastructure running in your cluster. It also will have a rich UI to view, edit, and see the status of alerts, which should more fully solve the problems listed here.

Note that the current alerting flow isn't going to be deprecated or anything... we just expect most people to move to the fully cloud-based solution over time as it's more in line with expectations.

If you're interested in joining the preview, you can do so by filling out this form: https://docs.google.com/forms/d/e/1FAIpQLSf7XlkxiDhnlqKDtdJAOwbimGUskEP_Q2V2zqleMCP5m4C_Bg/viewform

lyanco avatar Jun 20 '23 14:06 lyanco

Thanks @lyanco. In our case we are not using the fully-managed Prometheus because we scrape GCE instances. Would this be supported for self-deployed versions as well?

pandrian avatar Jun 21 '23 09:06 pandrian

PromQL in Cloud Alerting should work for all environments and all clouds, yes.

FWIW If you're using self-deployed collection, the /rules and /alerts APIs should be functional on it, but it will only show those rules and alerts that are installed in the local server and executing against local data. It won't show any rules/alerts executing against Monarch, but this might suffice for your use case.

lyanco avatar Jun 21 '23 14:06 lyanco

Thanks @lyanco. Yeah, we are executing the rules against Monarch with rule-evaluator, and it's a bit of a hassle to understand which alerts are registered and to troubleshoot them. I'm more interested in this functionality, if possible:

> no doubt @pintohutch !
>
> one possible short-term way to support this ask is possibly to expose the /api/v1/rules API from one of the managed GMP components (rule-evaluator seems to make the most sense)
>
> that way, users who are running grafana in front of their GMP cluster can configure a Grafana data source that points to this endpoint, so that the rules can show up in the UI there (as well as their firing states)

pandrian avatar Jun 21 '23 15:06 pandrian

Hey @pandrian,

Aside from leaning into the Cloud Alerting PromQL preview Lee mentioned, you can manually pull Prometheus metrics from the rule-evaluator's metrics endpoint (example) to get better signal of what it's doing with your configured Rules.

This is obviously not as nice as the Prometheus /rules API, but evaluating these along with any of its logs can be a helpful sanity check.
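As a sketch of that kind of sanity check, the Python below filters a metrics scrape (Prometheus text exposition format) down to rule-related series. The metric names are the upstream Prometheus rule-manager metrics. That the rule-evaluator exposes exactly these names is an assumption, so check them against an actual scrape of your deployment.

```python
from typing import Iterable, List, Tuple

# Upstream Prometheus rule-manager metric names (assumed to be exposed
# by the rule-evaluator as well).
RULE_METRICS = (
    "prometheus_rule_group_rules",
    "prometheus_rule_evaluation_failures_total",
    "prometheus_rule_group_last_evaluation_timestamp_seconds",
)


def parse_samples(text: str, wanted: Iterable[str] = RULE_METRICS) -> List[Tuple[str, float]]:
    """Return (series, value) pairs for the wanted metric names.

    Simplified parser: it assumes each sample line ends with the value,
    i.e. there is no trailing timestamp.
    """
    wanted_set = set(wanted)
    samples: List[Tuple[str, float]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        series, _, value = line.rpartition(" ")
        name = series.split("{", 1)[0]
        if name in wanted_set:
            samples.append((series, float(value)))
    return samples


def groups_with_failures(text: str) -> List[str]:
    """Return the series of rule groups that report evaluation failures."""
    return [
        series
        for series, value in parse_samples(text, ["prometheus_rule_evaluation_failures_total"])
        if value > 0
    ]
```

The text passed in would be the body of the metrics endpoint response, fetched however is convenient, for example through a port-forward to the rule-evaluator pod.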

pintohutch avatar Jun 21 '23 21:06 pintohutch