alertmanager
alertmanager copied to clipboard
Regression in 0.26.0 when using `amtool config rules` with `webhook_configs`
What did you do?
Update from version 0.25.0
to 0.26.0
.
What did you expect to see?
We use amtool config routes test alertname=Foobar severity=panic
in an automated test to ensure there is no misconfiguration that would not route alerts into on-call. We expected the command to work as before in the sense, that it would tell us which receivers would receive that alert.
What did you see instead? Under which circumstances?
With amtool 0.26.0
:
error: scheme required for webhook url
With amtool 0.25.0
(error message is different, but the problem is the same):
Warning: amtool version (0.25.0) and alertmanager version (0.26.0) are different.
amtool: error: unsupported scheme "" for URL
amtool exits with status code 1
regardless and does not print the matched routes.
This is very likely a regression from https://github.com/prometheus/alertmanager/pull/3228 as the API does not output the url anymore, but amtool seems to be unaware of it and detects an error.
It also affects the amtool config routes show
command.
Alertmanager configuration file
This probably only happens when using a webhook_configs
with a url
:
global: { } # ...
route: { } # ...
receivers:
- name: oncall-alerts
webhook_configs:
- url: http://example.com/webhook
templates: [] # ...
As a workaround the new url_file
could be used as this does not seem to validate any scheme.
Looks like this is an issue when using amtool
with --alertmanager.url
. It does not affect amtool when using --config.file.
I think the issue is as @czenker explained.
For example, the command amtool config routes show
fails on Line 65 of cli/routing.go
. This command seems to use the /api/v2/status
API which returns the Alertmanager configuration but with the URL field for each webhook_configs
as <secret>
. However, if I revert the URL field for WebhookConfig
from *SecretURL
back to *URL
then it works as the URL is now returned in the /api/v2/status
API:
./amtool config routes show --alertmanager.url=http://127.0.0.1:9093
Routing tree:
.
└── default-route receiver: example
That said, I don't think we should revert the change from *URL
to *SecretURL
. Instead, I think the actual bug is in UnmarshalYAML
for WebhookConfig
.
The reason for this is that the UnmarshalYAML
function for *SecretURL
has special code to detect <secret>
and return nil
, but UnmarshalYAML
for WebhookConfig
ignores this when instead it should be a special case.
However, looking further into how UnmarshalYAML
works for SecretURL
, I don't think this check is even needed as there is an existing check in parseURL
for the scheme. Perhaps we can delete the duplicate check from UnmarshalYaml
then?
diff --git a/config/notifiers.go b/config/notifiers.go
index db86b1a2..2650db5f 100644
--- a/config/notifiers.go
+++ b/config/notifiers.go
@@ -503,11 +503,6 @@ func (c *WebhookConfig) UnmarshalYAML(unmarshal func(interface{}) error) error {
if c.URL != nil && c.URLFile != "" {
return fmt.Errorf("at most one of url & url_file must be configured")
}
- if c.URL != nil {
- if c.URL.Scheme != "https" && c.URL.Scheme != "http" {
- return fmt.Errorf("scheme required for webhook url")
- }
- }
return nil
}
Ah, I get it now -- I think it's because SecretURL
is actually an URL
https://github.com/prometheus/alertmanager/blob/353c0a130485371bfd75a0dd5c4f910701b12f02/config/config.go#L127
and the UnmarshalYAML
of SecretURL
uses the original unmarshalling function
https://github.com/prometheus/alertmanager/blob/353c0a130485371bfd75a0dd5c4f910701b12f02/config/config.go#L138-L151
which means, SecretURL
actually does:
https://github.com/prometheus/alertmanager/blob/353c0a130485371bfd75a0dd5c4f910701b12f02/config/config.go#L91-L102
and ends up validating the URL on unmarshalling anyway.
I think the patch above makes sense and we should be safe to remove that check from UnmarshalYAML
of WebhookConfig
. The hard part is probably testing the CLI to cover both scenarios of with a config and with a fake remote Alertmanager.
@czenker, we've just merged #3509 and I'll be releasing 0.26.1 tomorrow. Apologies for the disruption.
@gotjosh Thanks for the quick fix. This is a quick reminder that v0.26.1 is not released yet.
Is 0.26.1 planned?
@czenker, we've just merged #3509 and I'll be releasing 0.26.1 tomorrow. Apologies for the disruption.
Any ETA on this?
What's missing to get this released? Any way we can support?
@czenker, we've just merged #3509 and I'll be releasing 0.26.1 tomorrow. Apologies for the disruption.
Can you be more specific about what is meant by tomorrow?
I am also having this error in grafana when checking alert manager 0.26 contact points. When can we expect this to be fixed?
error: scheme required for webhook url
@SuperQ please take a look, seems something prevents for a bugfix release .1
I have also just hit this issue when upgrading alertmanager, and it has broken our automated test suite
Should be fixed now?