
way to disable Managed Alertmanager?

parkedwards opened this issue 3 years ago · 14 comments

hello there - if we opt to self-deploy Alertmanager with GMP, is there a way to disable the automatically created alertmanager deployment / service?

https://cloud.google.com/stackdriver/docs/managed-prometheus/rules-managed#self-deployed_alertmanager

parkedwards avatar Oct 28 '22 20:10 parkedwards

Hi @parkedwards, by default the managed Alertmanager doesn't do anything and won't interfere with a self-deployed Alertmanager, so simply by not configuring it, it is essentially "disabled". If you want to go a step further, you could try scaling the managed Alertmanager StatefulSet to 0 replicas (a sketch below), but this may only work if you are running GMP unmanaged.
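
For reference, a minimal sketch of that scale-down written as a kustomize strategic-merge patch so it stays declarative; it assumes the StatefulSet is named alertmanager in the gmp-system namespace, as in the upstream manifests, and the file name is hypothetical. On managed GMP, the operator may simply reconcile it back to 1 replica.

# ./scale-down-managed-alertmanager.yaml (hypothetical patch file,
# referenced from patchesStrategicMerge in a kustomization.yaml)

apiVersion: apps/v1
kind: StatefulSet
metadata:
  namespace: gmp-system
  name: alertmanager
spec:
  # Keep the managed Alertmanager defined, but run no pods.
  replicas: 0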

For our reference, could you share why you need to disable the managed Alertmanager? Thanks

damemi avatar Oct 31 '22 15:10 damemi

@damemi makes sense, we'll leave the managed Alertmanager unconfigured

For our reference, could you share why you need to disable the managed Alertmanager? Thanks

Sure thing, so we're opting to self-deploy our Alertmanager instances, but otherwise still use the managed GMP components (collectors, rule-evaluator, etc.). Ideally, we wouldn't be running any other Alertmanager Deployment or StatefulSet (e.g. the managed ones), just to conserve resource usage and reduce confusion for anyone else on the team.

parkedwards avatar Oct 31 '22 17:10 parkedwards

It would be nice to be able to disable the managed Alertmanager to avoid confusion.

robmonct avatar Jan 26 '23 14:01 robmonct

We just upgraded to v0.5.0, and spent half an hour figuring out a way to disable the managed Alertmanager.

Why?

  • We don't need it.
  • It adds confusion; there is already an Alertmanager instance running in the cluster.
  • It failed to deploy due to kustomization patches intended for the "real" alertmanager instance.

How?

We have to deploy GMP via the manifests; the add-on installation isn't flexible enough for our needs (we need Istio sidecars, and we don't allow manual configuration through kubectl: everything should be code). Since there isn't a Helm chart for GMP, we use kustomize. It's fairly straightforward to configure GMP to suit your needs via kustomizations:

# ./kustomization.yaml

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
  - https://raw.githubusercontent.com/GoogleCloudPlatform/prometheus-engine/v0.5.0/manifests/setup.yaml
  - https://raw.githubusercontent.com/GoogleCloudPlatform/prometheus-engine/v0.5.0/manifests/operator.yaml


patchesStrategicMerge:
  - delete-google-managed-alertmanager.yaml

patches:
  - target:
      name: config
      kind: OperatorConfig
    patch: |-
      # Connect GMP with our self-managed alertmanager 
      - op: add
        path: /rules
        value:
          alerting:
            alertmanagers:
            - name: alertmanager
              namespace: monitoring
              port: 9093

  - target:
      name: gmp-system
      kind: Namespace
    patch: |-
      # Add Istio sidecars to the GMP so Kiali graphs make sense
      - op: add
        path: /metadata/labels
        value:
          istio-injection: enabled

---

# ./delete-google-managed-alertmanager.yaml

$patch: delete
apiVersion: v1
kind: Service
metadata:
  namespace: gmp-system
  name: alertmanager
---
$patch: delete
apiVersion: v1
kind: Secret
metadata:
  namespace: gmp-system
  name: alertmanager
---
$patch: delete
apiVersion: apps/v1
kind: StatefulSet
metadata:
  namespace: gmp-system
  name: alertmanager

djfinnoy avatar Apr 11 '23 09:04 djfinnoy

Nice, and your workaround makes sense! Also, you are welcome to use a newer version of the operator (e.g. the gke.gcr.io/prometheus-engine/operator:v0.6.3-gke.0 image and the v0.6.3-rc.0 tag).

We will discuss with the team whether there is an easier way to disable the Alertmanager for your use cases.

bwplotka avatar Apr 13 '23 09:04 bwplotka

Hi @bwplotka, Do we have any updates on this topic?

arthurburle avatar Sep 19 '23 15:09 arthurburle

No discussion yet, sorry for the lag.

From my understanding this feature only applies to managed GMP.

One way of solving it is a new field, e.g. "OperatorConfig.ManagedAlertmanagerSpec.Disabled = true", that would change the Alertmanager replicas from 1 to 0 (a rough sketch below).
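
A rough sketch of what that could look like on the OperatorConfig; note the disabled field is hypothetical and does not exist today:

apiVersion: monitoring.googleapis.com/v1
kind: OperatorConfig
metadata:
  namespace: gmp-public
  name: config
managedAlertmanager:
  # Hypothetical field: the operator would scale the managed
  # Alertmanager StatefulSet from 1 replica to 0.
  disabled: true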

The additional work on our side is to fix our alerting for this case (for managed GMP we have solid SLOs).

Before we prioritise this, we would love to understand the "confusion" argument mentioned here, since it sounds like it's the main reason for this feature. Does the confusion come from a pod named "alertmanager" running in a system namespace (though it's filtered out by default) when listing pods? Or is there some other source of confusion?

bwplotka avatar Sep 25 '23 12:09 bwplotka

Note: @bernot-dev works on this feature (automatic disabling of AM and rule-eval if no configuration is used for those) 🤗

bwplotka avatar Nov 29 '23 08:11 bwplotka

Solution implemented in #691. Rule-evaluator and alertmanager will scale to zero when there are no Rules set up in the cluster.

bernot-dev avatar Jan 19 '24 21:01 bernot-dev

Hi team, thanks for your work, but I have to say the implemented solution doesn't make much sense to me. The purpose of the issue, if I'm not wrong, is to use a self-deployed Alertmanager, so there will be rules. With this solution, we will have the same problem. In my opinion, the solution should simply be a way to enable or disable it, independently of the number of rules.

robmonct avatar Jan 23 '24 10:01 robmonct

To be more specific, it scales the GMP Rule-evaluator Deployment and Alertmanager StatefulSet to zero if none of these custom resources exist (a minimal example follows the list):

  • monitoring.googleapis.com/ClusterRules
  • monitoring.googleapis.com/GlobalRules
  • monitoring.googleapis.com/Rules
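
For illustration, a minimal Rules resource that would keep both components scaled up; the namespace, resource name, and alert are placeholders:

apiVersion: monitoring.googleapis.com/v1
kind: Rules
metadata:
  namespace: monitoring
  name: example-rules
spec:
  groups:
  - name: example
    rules:
    # Trivial always-firing alert, just to show the shape of the resource.
    - alert: AlwaysFiring
      expr: vector(1)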

The primary goal of #691 was saving resources when the user does not need those pods running. If a user wants their own self-deployed Alertmanager, the GMP Alertmanager should not interfere unless it is also using our specific custom resources.

bernot-dev avatar Jan 23 '24 16:01 bernot-dev

Well, to @robmonct's point, which mirrors our use case: we're using all of the Managed Prometheus components (rule-evaluator, collector, etc.), which includes usage of the Rules CRDs. We just want to use our own Alertmanager instance.

parkedwards avatar Feb 21 '24 14:02 parkedwards

+1 for a proper solution. The managed Alertmanager no longer fits our needs, as we require an equivalent of AlertmanagerConfig for our different application teams (but also due to #685). Nevertheless, we want to keep using the remaining GMP parts like the collectors, rule-evaluator, etc.

The managed Alertmanager just eats cluster resources. While it's not a lot, I'd prefer not to have useless pods in our clusters.

m3adow avatar Jul 03 '24 05:07 m3adow

Hey @m3adow - I'll re-open this issue so we can discuss as a team how we want to address and prioritize this.

pintohutch avatar Jul 03 '24 23:07 pintohutch