alertmanager icon indicating copy to clipboard operation
alertmanager copied to clipboard

Regression in 0.26.0 when using `amtool config rules` with `webhook_configs`

Open czenker opened this issue 1 year ago • 13 comments

What did you do?

Update from version 0.25.0 to 0.26.0.

What did you expect to see?

We use amtool config routes test alertname=Foobar severity=panic in an automated test to ensure there is no misconfiguration that would not route alerts into on-call. We expected the command to work as before in the sense, that it would tell us which receivers would receive that alert.

What did you see instead? Under which circumstances?

With amtool 0.26.0:

error: scheme required for webhook url

With amtool 0.25.0 (error message is different, but the problem is the same):

Warning: amtool version (0.25.0) and alertmanager version (0.26.0) are different.
amtool: error: unsupported scheme "" for URL

amtool exits with status code 1 regardless and does not print the matched routes.

This is very likely a regression from https://github.com/prometheus/alertmanager/pull/3228 as the API does not output the url anymore, but amtool seems to be unaware of it and detects an error.

It also affects the amtool config routes show command.

Alertmanager configuration file

This probably only happens when using a webhook_configs with a url:

global: { } # ...
route: { } # ...
receivers:
- name: oncall-alerts
  webhook_configs:
  - url: http://example.com/webhook
templates: [] # ...

As a workaround the new url_file could be used as this does not seem to validate any scheme.

czenker avatar Sep 04 '23 13:09 czenker

Looks like this is an issue when using amtool with --alertmanager.url. It does not affect amtool when using --config.file.

grobinson-grafana avatar Sep 04 '23 18:09 grobinson-grafana

I think the issue is as @czenker explained.

For example, the command amtool config routes show fails on Line 65 of cli/routing.go. This command seems to use the /api/v2/status API which returns the Alertmanager configuration but with the URL field for each webhook_configs as <secret>. However, if I revert the URL field for WebhookConfig from *SecretURL back to *URL then it works as the URL is now returned in the /api/v2/status API:

./amtool config routes show --alertmanager.url=http://127.0.0.1:9093
Routing tree:
.
└── default-route  receiver: example

That said, I don't think we should revert the change from *URL to *SecretURL. Instead, I think the actual bug is in UnmarshalYAML for WebhookConfig.

The reason for this is that the UnmarshalYAML function for *SecretURL has special code to detect <secret> and return nil, but UnmarshalYAML for WebhookConfig ignores this when instead it should be a special case.

However, looking further into how UnmarshalYAML works for SecretURL, I don't think this check is even needed as there is an existing check in parseURL for the scheme. Perhaps we can delete the duplicate check from UnmarshalYaml then?

diff --git a/config/notifiers.go b/config/notifiers.go
index db86b1a2..2650db5f 100644
--- a/config/notifiers.go
+++ b/config/notifiers.go
@@ -503,11 +503,6 @@ func (c *WebhookConfig) UnmarshalYAML(unmarshal func(interface{}) error) error {
        if c.URL != nil && c.URLFile != "" {
                return fmt.Errorf("at most one of url & url_file must be configured")
        }
-       if c.URL != nil {
-               if c.URL.Scheme != "https" && c.URL.Scheme != "http" {
-                       return fmt.Errorf("scheme required for webhook url")
-               }
-       }
        return nil
 }

grobinson-grafana avatar Sep 04 '23 19:09 grobinson-grafana

Ah, I get it now -- I think it's because SecretURL is actually an URL

https://github.com/prometheus/alertmanager/blob/353c0a130485371bfd75a0dd5c4f910701b12f02/config/config.go#L127

and the UnmarshalYAML of SecretURL uses the original unmarshalling function

https://github.com/prometheus/alertmanager/blob/353c0a130485371bfd75a0dd5c4f910701b12f02/config/config.go#L138-L151

which means, SecretURL actually does:

https://github.com/prometheus/alertmanager/blob/353c0a130485371bfd75a0dd5c4f910701b12f02/config/config.go#L91-L102

and ends up validating the URL on unmarshalling anyway.

I think the patch above makes sense and we should be safe to remove that check from UnmarshalYAML of WebhookConfig. The hard part is probably testing the CLI to cover both scenarios of with a config and with a fake remote Alertmanager.

gotjosh avatar Sep 05 '23 12:09 gotjosh

@czenker, we've just merged #3509 and I'll be releasing 0.26.1 tomorrow. Apologies for the disruption.

gotjosh avatar Sep 05 '23 16:09 gotjosh

@gotjosh Thanks for the quick fix. This is a quick reminder that v0.26.1 is not released yet.

czenker avatar Sep 13 '23 08:09 czenker

Is 0.26.1 planned?

rajatvig avatar Oct 26 '23 22:10 rajatvig

@czenker, we've just merged #3509 and I'll be releasing 0.26.1 tomorrow. Apologies for the disruption.

Any ETA on this?

xbglowx avatar Nov 03 '23 19:11 xbglowx

What's missing to get this released? Any way we can support?

KevinGimbel avatar Nov 09 '23 10:11 KevinGimbel

@czenker, we've just merged #3509 and I'll be releasing 0.26.1 tomorrow. Apologies for the disruption.

Can you be more specific about what is meant by tomorrow?

k0ste avatar Nov 18 '23 12:11 k0ste

I am also having this error in grafana when checking alert manager 0.26 contact points. When can we expect this to be fixed?

error: scheme required for webhook url

bmgante avatar Nov 26 '23 00:11 bmgante

@SuperQ please take a look, seems something prevents for a bugfix release .1

k0ste avatar Dec 13 '23 13:12 k0ste

I have also just hit this issue when upgrading alertmanager, and it has broken our automated test suite

timmow avatar Jan 23 '24 14:01 timmow

Should be fixed now?

TheMeier avatar Mar 10 '24 12:03 TheMeier