Grafana Agent Operator - Support for the same features as Prometheus Operator

Open aengusrooneygrafana opened this issue 2 years ago • 4 comments

Following discussion with a number of users, we have found that the Prometheus Operator and the Grafana Agent Operator are slightly out of feature alignment. The Prometheus Operator supports alerting rules and Alertmanager configuration, which the Grafana Agent Operator does not. As a result, users need additional tools outside of the operator to maintain Alertmanager and alerts.

This proposal is to request the same level of support for Alerting in the Grafana Agent Operator, as is supported in the Prometheus Operator. Ref: https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/user-guides/alerting.md

aengusrooneygrafana avatar Aug 03 '22 18:08 aengusrooneygrafana

This is similar to #523 where we discussed adding the ability to Grafana Agent to sync rules with the Cortex/Mimir ruler API.

It sounds reasonable to add that functionality into both the agent and the operator, though it's not going to be simple; we need to figure out how to identify rules that came from an agent so we know which should be added/removed/changed when reconciling the list.
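A minimal sketch of that reconciliation step, assuming ownership is tracked by a namespace prefix in the ruler. The `agent/` prefix, the `RuleGroup` type, and the `reconcile` function are hypothetical names used for illustration only; they are not part of Grafana Agent or the Cortex/Mimir ruler API.

```python
from dataclasses import dataclass

# Hypothetical ownership marker: rule namespaces with this prefix are
# assumed to have been created by the agent and are safe to modify.
MANAGED_PREFIX = "agent/"


@dataclass(frozen=True)
class RuleGroup:
    namespace: str  # ruler namespace the group lives in
    name: str       # rule group name
    body: str       # serialized rule group content


def reconcile(desired, actual):
    """Diff desired rule groups against what the ruler currently holds.

    Only groups in agent-owned namespaces are considered for update or
    removal, so groups created by other tools are left untouched.
    Returns (to_add, to_update, to_remove).
    """
    owned = {
        (g.namespace, g.name): g
        for g in actual
        if g.namespace.startswith(MANAGED_PREFIX)
    }
    wanted = {(g.namespace, g.name): g for g in desired}

    to_add = [g for key, g in wanted.items() if key not in owned]
    to_update = [
        g for key, g in wanted.items()
        if key in owned and owned[key].body != g.body
    ]
    to_remove = [g for key, g in owned.items() if key not in wanted]
    return to_add, to_update, to_remove
```

The prefix is only one possible way to mark ownership; the point is that without some marker, the diff cannot tell agent-created groups from groups that were added by hand.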

rfratto avatar Aug 03 '22 19:08 rfratto

Prometheus Operator seems to store this info in the Prometheus CRs, at spec.{rule{,Namespace}Selector}.

While using different CRDs, the VictoriaMetrics operator seems to have an alerts-specific VMAlert.spec.{rule{,Namespace}Selector} that describes where to select *Rule CRs from.

I assume to keep this as frictionless as possible, no new CRDs should be introduced, but Grafana Agent Operator could listen to Prometheus CRs?

Or, looking at https://github.com/grafana/agent/pull/1839 and similar proposals, Grafana Agent would watch PrometheusRule CRs, and filters for selectors/namespaces would be a config of Grafana Agent itself?
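To illustrate what such selector/namespace filtering amounts to, here is a rough sketch under simplifying assumptions: only `matchLabels`-style equality matching is handled (no `matchExpressions`), and the dictionary shapes and function names are hypothetical, not taken from Grafana Agent or Prometheus Operator code.

```python
def matches(match_labels, labels):
    """True if every key/value pair in match_labels is present in labels.

    An empty selector matches everything, mirroring how an empty
    Kubernetes label selector behaves.
    """
    return all(labels.get(key) == value for key, value in match_labels.items())


def select_rules(rules, namespace_labels, rule_selector, namespace_selector):
    """Filter PrometheusRule-like objects by namespace and rule labels.

    rules:              list of dicts with "name", "namespace", "labels"
    namespace_labels:   dict mapping namespace name -> its labels
    rule_selector:      labels a rule object must carry
    namespace_selector: labels the rule's namespace must carry
    """
    return [
        rule for rule in rules
        if matches(namespace_selector, namespace_labels.get(rule["namespace"], {}))
        and matches(rule_selector, rule["labels"])
    ]
```

Whether these two selectors live on a CR or in the agent's own configuration, the filtering logic would look much the same; the open question in the thread is only where they are declared.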

flokli avatar Aug 04 '22 04:08 flokli

This issue has been automatically marked as stale because it has not had any activity in the past 30 days. The next time this stale check runs, the stale label will be removed if there is new activity. The issue will be closed in 7 days if there is no new activity. Thank you for your contributions!

github-actions[bot] avatar Sep 15 '22 00:09 github-actions[bot]

remove stale

azhurbilo avatar Sep 15 '22 18:09 azhurbilo

It would be a boon if this support existed in the agent.

jhohertz avatar Sep 30 '22 18:09 jhohertz

Hey all, support for recording and alerting rules is something still being considered. We don't have any updates right now.

With the recent Grafana Agent Flow announcement, it might make sense to support recording/alerting rules as Flow components, too.

rfratto avatar Sep 30 '22 18:09 rfratto

Awesome to hear. The Flow features are cool, but the use case is to allow the rules to ship alongside individual applications as manifests vs. requiring a more central cluster-level configuration. :crossed_fingers: for this feature parity ticket. It would also make it very easy for people to transition to these products from their current state if using the Prometheus Operator. Your sales team would love it :wink:

jhohertz avatar Sep 30 '22 18:09 jhohertz

Sorry if I wasn't clear: I meant we could have Flow components which could discover and consume alert/monitoring rule CRDs :) That would support the use case you just mentioned: provide PrometheusRule resources alongside applications and have Flow discover and act on them.

rfratto avatar Sep 30 '22 20:09 rfratto

With https://github.com/grafana/agent/issues/1544 complete via https://github.com/grafana/agent/pull/2604/, the missing piece is now support in Grafana Agent Operator to make use of the new feature.

james-callahan avatar Feb 28 '23 03:02 james-callahan