[kube-prometheus-stack] Alertmanager does not update secret with custom configuration options
Describe the bug
I want to customize the alertmanager configuration in the chart. If the alertmanager.config block is passed in the values file when first installing the chart, the alertmanager pod is not created. But if this block is omitted, the pod is created and I can kubectl port-forward into it.
However, the configuration file is mounted from what seems to be an automatically generated secret which contains the config block from the chart's default values:
...
volumeMounts:
  - mountPath: /etc/alertmanager/config
    name: config-volume
...
volumes:
  - name: config-volume
    secret:
      defaultMode: 420
      secretName: alertmanager-prometheus-community-kube-alertmanager-generated
Next, if I try to upgrade the chart to include the configuration, a new secret named alertmanager-prometheus-community-kube-alertmanager is created, but the pod still has its configuration mounted from the auto-generated secret.
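For reference, the contents of both secrets can be compared directly with kubectl; a minimal sketch, assuming the release/namespace names from this report (adjust them for your setup, and note that depending on the operator version the generated secret may store the config under a gzipped key):
# Secret that the chart renders from alertmanager.config:
kubectl get secret -n monitoring alertmanager-prometheus-community-kube-alertmanager \
  -o jsonpath='{.data.alertmanager\.yaml}' | base64 -d
# Secret that is actually mounted into the pod (generated by the operator); list it first,
# since the config key may be alertmanager.yaml or a gzipped alertmanager.yaml.gz:
kubectl get secret -n monitoring alertmanager-prometheus-community-kube-alertmanager-generated -o yaml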
What's your helm version?
version.BuildInfo{Version:"v3.7.1", GitCommit:"1d11fcb5d3f3bf00dbe6fe31b8412839a96b3dc4", GitTreeState:"clean", GoVersion:"go1.16.9"}
What's your kubectl version?
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.4", GitCommit:"e6c093d87ea4cbb530a7b2ae91e54c0842d8308a", GitTreeState:"clean", BuildDate:"2022-02-16T12:38:05Z", GoVersion:"go1.17.7", Compiler:"gc", Platform:"linux/amd64"} Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.9", GitCommit:"56709e92afa973c26fad3d4a44723fefa51481b7", GitTreeState:"clean", BuildDate:"2022-03-10T07:59:33Z", GoVersion:"go1.16.12", Compiler:"gc", Platform:"linux/amd64"}
Which chart?
kube-prometheus-stack
What's the chart version?
kube-prometheus-stack-34.9.1
What happened?
The alertmanager pod mounts its configuration file from a secret that contains the chart's default alertmanager configuration. When the values are updated, a new secret is created, but it is not mounted into the pod.
What you expected to happen?
The secret from which the configuration file at /etc/alertmanager/config/alertmanager.yaml is mounted should be updated to contain the configuration passed in the values file when upgrading the chart.
How to reproduce it?
- Install the chart with helm like you normally would (with or without the values file)
- Check the auto generated secret:
kubectl get secrets -n monitoring
- Upgrade the helm chart with the command below to include the configuration block from the values file
- Check the secrets again to see that a new secret is created but nothing else changes
- Open a shell session in the alertmanager pod and inspect the config file at /etc/alertmanager/config/alertmanager.yaml to see that it still contains the default values (a rough command sketch follows this list).
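A rough sketch of these steps as commands; the release, namespace, and pod names below are the ones from my setup, so check yours with kubectl get pods first:
kubectl get secrets -n monitoring | grep alertmanager
helm upgrade -i prometheus-community prometheus-community/kube-prometheus-stack -n monitoring -f path/to/values.yaml
kubectl get secrets -n monitoring | grep alertmanager
kubectl -n monitoring exec -it alertmanager-prometheus-community-kube-alertmanager-0 -- \
  cat /etc/alertmanager/config/alertmanager.yaml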
Enter the changed values of values.yaml?
alertmanager:
  config:
    global:
      resolve_timeout: 5m
    route:
      group_by: ['job', 'alertname', 'priority']
      group_wait: 10s
      group_interval: 1m
      routes:
        - match:
            alertname: Watchdog
          receiver: 'null'
        - receiver: 'slack-notifications'
          continue: true
    receivers:
      - name: 'slack-notifications'
        slack-configs:
          - slack_api_url: <url here>
            title: '{{ .Status }} ({{ .Alerts.Firing | len }}): {{ .GroupLabels.SortedPairs.Values | join " " }}'
            text: '<!channel> {{ .CommonAnnotations.summary }}'
            channel: '#mychannel'
Enter the command that you execute that is failing/misfunctioning.
helm upgrade -i prometheus-community prometheus-community/kube-prometheus-stack -n monitoring -f path/to/values.yaml
Anything else we need to know?
No response
+1
+1
+1
+1
Please tell me which version of the chart works.
+1
This is blocking our production deployment as we can't deploy the alerts we need
I have found the problem to be a misconfiguration in the alerting rules, so the config.yaml can't load properly and alerts are not created. My recommendation is to carefully review the config file.
You nailed it - helm was silently mangling the files
#!/bin/bash
helm template -n monitoring monitoring -f values.yaml -f values-ops.yaml . \
  --show-only charts/kube-prometheus-stack/templates/alertmanager/secret.yaml \
  | yq -r '.data."alertmanager.yaml" | @base64d' > am-ops.config
helm template -n monitoring monitoring -f values.yaml -f values-prod.yaml -f secret://values-prod-secrets.enc.yaml . \
  --show-only charts/kube-prometheus-stack/templates/alertmanager/secret.yaml \
  | yq -r '.data."alertmanager.yaml" | @base64d' > am-prod.config
diff am-prod.config am-ops.config
~/go/bin/amtool config routes test --config.file=am-prod.config --tree
~/go/bin/amtool config routes test --config.file=am-prod.config --tree alertname="TSDB Sync failed: missing WAL file"
~/go/bin/amtool config routes test --config.file=am-ops.config --tree
~/go/bin/amtool config routes test --config.file=am-ops.config --tree alertname="TSDB Sync failed: missing WAL file"
After spending many hours on this, I got these learnings:
- Updating an existing helm deployment will not update the configuration of the alertmanager. You have to uninstall and reinstall the helm chart.
- Using a faulty alertmanager.config field will not create the alertmanager pod. Start with a working config and iterate on that.
- You can obtain a working value of alertmanager.config by installing the helm chart without alertmanager.config set, port-forwarding the alertmanager pod that gets created with the default config (see the sketch below), and using the config displayed at http://localhost:9093/#/status as your alertmanager.config. (Opening a shell in the pod and reading the file there also works.)
- From this working setup you can iteratively update the config, uninstall the chart, and reinstall the chart. 😅
For this I was using kube-prometheus-stack-36.1.0
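For the port-forward step mentioned above, something like this should work (the service name assumes the default naming the chart produced for the release in this thread; adjust as needed):
kubectl -n monitoring port-forward svc/prometheus-community-kube-alertmanager 9093:9093
# then open http://localhost:9093/#/status and copy the running config into alertmanager.config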
Still getting this issue on 36.2.0
A workaround I've found is to move the config into an AlertmanagerConfig custom resource, and reference that using alertmanager.alertmanagerSpec.alertmanagerConfiguration.
Example:
# receiver-config.yaml
apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: alertmanager-config
  namespace: monitoring
spec:
  receivers:
    - name: '<receiver name>'
      webhookConfigs:
        - url: '<webhook url>'
  route:
    receiver: '<receiver name>'
# values.yaml
alertmanager:
  alertmanagerSpec:
    alertmanagerConfiguration:
      name: alertmanager-config
...
This seems to work as expected, but now I wonder how alertmanager.config is even meant to be used.
I just had the same problem here and possibly found a solution for it. Notice that you use the 'null' receiver in the Watchdog route match, but there is no definition for the null receiver. By adding an empty receiver named 'null', just like the default values file does, my config was updated successfully. Here's an example:
alertmanager:
  enabled: true
  config:
    global:
      slack_api_url: <URL>
      resolve_timeout: 5m
    route:
      group_by: ['alertname']
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 12h
      receiver: 'slack'
      routes:
        - match:
            alertname: Watchdog
          receiver: 'null'
    receivers:
      - name: 'null'
      - name: 'slack'
        slack_configs:
          - channel: '#alerts'
            text: 'https://internal.myorg.net/wiki/alerts/{{ .GroupLabels.app }}/{{ .GroupLabels.alertname }}'
Maybe the real problem here is that there is no error message or feedback telling you that your configuration is invalid or missing some information. It just fails silently.
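One way to catch this kind of mistake before upgrading is to render the chart's alertmanager secret locally and run it through amtool, similar to the script posted earlier in the thread (this assumes amtool and the jq-style yq wrapper are installed; the output file name is just a placeholder):
helm template prometheus-community prometheus-community/kube-prometheus-stack -n monitoring -f path/to/values.yaml \
  --show-only templates/alertmanager/secret.yaml \
  | yq -r '.data."alertmanager.yaml" | @base64d' > rendered-alertmanager.yaml
amtool check-config rendered-alertmanager.yaml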
+1, I wasn't able to find what is in charge of generating this secret… Having something at the UI level, or a log telling us that the provided configuration is not correct, would be very cool.
Just encountered the issue and apparently it goes something like this:
When you create or update an alertmanager config secret, the prometheus operator notices it and attempts to validate it using the CRDs, specifically alertmanagerconfigs.monitoring.coreos.com.
If validation succeeds, the generated alertmanager secret is updated/created. This is the secret that is actually mounted in the alertmanager pod, not the one you modify with the chart.
If validation fails, the prometheus operator writes a console log explaining what it didn't like in your config and does nothing else. Somewhat sneaky for my liking, but oh well.
tldr: If you want to know why your alertmanager secret is not updated, check the prometheus operator logs for errors.
ps: be aware that the alertmanager config CRD is apparently way behind the official documentation and is lacking some fields. For example, telegram_configs was only recently added (in May or June) and the CRD still misses the time_intervals and active_time_intervals objects.
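A quick way to see those validation errors (assuming the operator was installed by this chart into the monitoring namespace; the label selector may differ in your setup):
kubectl -n monitoring logs -l app=kube-prometheus-stack-operator --tail=200 | grep -i alertmanagerconfig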
@YuKitsune would you mind sharing your full config? I am getting the following in the logs:
level=warn ts=2022-08-06T16:06:17.533402697Z caller=operator.go:1091 component=alertmanageroperator msg="skipping alertmanagerconfig" error="unable to get secret \"\": resource name may not be empty" alertmanagerconfig=monitoring/alertmanager-config-override namespace=monitoring alertmanager=promstack-alertmanager
This is my AlertmanagerConfig:
apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: alertmanager-config-override
  namespace: monitoring
spec:
  route:
    groupWait: 30s
    groupInterval: 5m
    repeatInterval: 12h
    receiver: 'null'
    routes:
      - receiver: 'null'
        matchers:
          - name: alertname
            matchType: '=~'
            value: "InfoInhibitor|Watchdog"
  receivers:
    - name: 'null'
and my values.yml:
alertmanager:
  enabled: true
  config:
    global:
      resolve_timeout: 5m
    templates:
      - '/etc/alertmanager/config/*.tmpl'
  alertmanagerSpec:
    alertmanagerConfiguration:
      name: alertmanager-config-override
@jsalatiel That should work. What I've found is that you need to install the chart first so that the CRDs get added, apply the custom AlertmanagerConfig, then re-install/upgrade the helm chart so it picks up the custom AlertmanagerConfig.
I'm still relatively new to Helm and K8s, so I might be doing it in a roundabout way, but that's what I've found works...
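In case it helps, that ordering boils down to something like this (file and release names as in the examples above):
# 1. Install the chart first so the AlertmanagerConfig CRD exists
helm upgrade -i prometheus-community prometheus-community/kube-prometheus-stack -n monitoring -f values.yaml
# 2. Apply the custom AlertmanagerConfig
kubectl apply -f receiver-config.yaml
# 3. Upgrade again so alertmanagerSpec.alertmanagerConfiguration picks it up
helm upgrade -i prometheus-community prometheus-community/kube-prometheus-stack -n monitoring -f values.yaml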
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.
This issue is being automatically closed due to inactivity.
Had the same issue; it turned out to be fixed by deleting the alertmanager pod, and all the values were added automatically to alertmanager.config.
Can you share your configuration? I deleted the pod and it did not update the alertmanager pod's configuration.
Unfortunately I don't have it anymore.
Could you please reopen this ticket? I have the same issue today.
We're encountering this issue repeatedly; it would be nice if there were a better solution than "uninstall and reinstall the chart", which in production is a tad heavy-handed...
The core of the issue is: when installing the chart, the secret is created as a pre-hook resource and is therefore not part of the helm release proper. So if you run an upgrade on your chart, it won't try to recreate the secret.
However, if you delete the secret and run an upgrade on your chart, the secret will get recreated with the proper values. To be clear, this is quite annoying 😢
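In practice that delete-and-upgrade workaround looks something like this (the secret and release names are the ones used earlier in this thread):
kubectl -n monitoring delete secret alertmanager-prometheus-community-kube-alertmanager
helm upgrade -i prometheus-community prometheus-community/kube-prometheus-stack -n monitoring -f path/to/values.yaml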
While there are other workarounds (such as using an external secret), I wouldn't consider this issue closed, as it is very much present 😅