
opsgenie_config using api_key_file not working

Open zoezhangmattr opened this issue 11 months ago • 13 comments

What did you do? Using the Vault injector to inject the API key. The injected file /vault/secrets/opsgenie_api_key contains the API key; its owner is nobody (the same as the Alertmanager user/group) and its mode is 644 or 777 (tried both).

The same alert can be routed to Slack, but not to Opsgenie.

Using a plain-text API key value works.

What did you expect to see? I thought it should work, but so far no luck; I need some guidance, please.

What did you see instead? Under which circumstances?

ts=2024-03-13T02:42:59.388Z caller=notify.go:848 level=warn component=dispatcher receiver=opsgenie integration=opsgenie[0] aggrGroup="{}/{severity=~\"^(?:critical|error)$\"}:{}" msg="Notify attempt failed, will retry later" attempts=1 err="Post \"https://api.opsgenie.com/v2/alerts\": net/http: invalid header field value for \"Authorization\""

Environment

  • System information: (not provided)

  • Alertmanager version: 0.26.0 and 0.27.0

  • Prometheus version: 2.47.0

  • Alertmanager configuration file:

global: {}
receivers:
- name: opsgenie
  opsgenie_configs:
  - api_key_file: /vault/secrets/opsgenie_api_key
    message: "{{ range .Alerts }} \n{{ .Annotations.summary }}\n{{ end }}"
    priority: '{{ if .CommonAnnotations.priority }}{{ .CommonAnnotations.priority
      }}{{ else }}P3{{ end }}'
    responders:
    - name: devops
      type: team
route:
  group_interval: 5m
  group_wait: 10s
  receiver: alerts-slack
  repeat_interval: 3h
  routes:
  - continue: true
    match_re:
      severity: critical|error
    receiver: opsgenie
  • Logs:
ts=2024-03-13T02:42:59.388Z caller=notify.go:848 level=warn component=dispatcher receiver=opsgenie integration=opsgenie[0] aggrGroup="{}/{severity=~\"^(?:critical|error)$\"}:{}" msg="Notify attempt failed, will retry later" attempts=1 err="Post \"https://api.opsgenie.com/v2/alerts\": net/http: invalid header field value for \"Authorization\""

zoezhangmattr avatar Mar 13 '24 02:03 zoezhangmattr

This looks like something is wrong with the api key and not alertmanager. Have you verified, eg in a test pod, that /vault/secrets/opsgenie_api_key really contains the correct key?
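
For example, something along these lines (just a rough sketch; the pod name is a placeholder, and it assumes od is available in the image). od -c makes a trailing newline or other hidden whitespace visible:

kubectl exec <alertmanager-pod> -c alertmanager -- od -c /vault/secrets/opsgenie_api_key
# the output should end with the last character of the key, not with \n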

TheMeier avatar Mar 20 '24 06:03 TheMeier

This looks like something is wrong with the api key and not alertmanager. Have you verified, eg in a test pod, that /vault/secrets/opsgenie_api_key really contains the correct key?

Thanks for the reply. Yes, the file has the correct key. The funny thing is that the same approach, with the same key, works for the Opsgenie heartbeat (deadman switch):

- name: prometheus-deadman-switch
  webhook_configs:
  - url: https://api.opsgenie.com/v2/heartbeats/xxxxxx/ping
    send_resolved: false
    http_config:
      basic_auth:
        username: ':'
        password_file: /vault/secrets/opsgenie_api_key

zoezhangmattr avatar Mar 22 '24 01:03 zoezhangmattr

One is an opsgenie_configs, the other is an http_config. You are using /vault/secrets/opsgenie_api_key as a password in the latter, which suggests to me that it contains a password and not an API key.

TheMeier avatar Mar 22 '24 10:03 TheMeier

@zoezhangmattr any feedback?

TheMeier avatar Mar 26 '24 11:03 TheMeier

One is an opsgenie_configs, the other is an http_config. You are using /vault/secrets/opsgenie_api_key as a password in the latter, which suggests to me that it contains a password and not an API key.

No, as I mentioned before, the API key works when using a k8s secret. It is the same API key, the password is correct, and it is the Opsgenie API key in this case.

zoezhangmattr avatar Apr 04 '24 01:04 zoezhangmattr

Hi @zoezhangmattr! Does the file exist and contain the secret at the time the Alertmanager is started? It sounds like there might be a race condition between the Alertmanager starting and vault-injector writing the file.

grobinson-grafana avatar Apr 17 '24 09:04 grobinson-grafana

I'm also running into this. If I go ahead and put the api key directly as a string into opsgenie_config/api_key of the receiver, it works.

When using opsgenie_config/api_key_file and a secret that's correctly mounted, it breaks with the exact same API key and Alertmanager logs invalid header field value for \"Authorization\".

jdegendt avatar Jun 24 '24 11:06 jdegendt

I'm also running into this. If I go ahead and put the api key directly as a string into opsgenie_config/api_key of the receiver, it works.

When using opsgenie_config/api_key_file and a secret that's correctly mounted, it breaks with the exact same API key and Alertmanager logs invalid header field value for \"Authorization\".

Can you check this?

Does the file exist and contain the secret at the time the Alertmanager is started? It sounds like there might be a race condition between the Alertmanager starting and vault-injector writing the file.

grobinson-grafana avatar Jun 24 '24 12:06 grobinson-grafana

@grobinson-grafana, perhaps to add: I'm not using Vault to inject the file at hand. I'm deploying using Helm and there are no init containers involved (aside from config-reloader).

So, given that the secret is deployed beforehand and I'm not injecting using Vault, I'm assuming the file is present before Alertmanager starts, given standard Kubernetes pod lifecycle management, right?

Let me see if I can figure out how to add a short magic sleep before the Alertmanager process starts. In the meantime, here are the Alertmanager values for reference:

...
alertmanager:
  enabled: true

  alertmanagerSpec:
    image:
      registry: quay.io
      repository: prometheus/alertmanager
      tag: v0.27.0
      sha: ""

    secrets:
    - opsgenie-api-key

  config:
    global:
      resolve_timeout: 5m

    route:
      group_by: ['namespace']
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 12h
      receiver: 'null'

      routes:
      - receiver: 'null'
        matchers:
          - job !~ "fdbmeter.*"
      - receiver: 'opsgenie'
        matchers:
          - job =~ "fdbmeter.*"

    receivers:
    - name: 'null'
    - name: 'opsgenie'
      opsgenie_configs:
        - tags: 'integrities,foundationdb'
          api_key_file: /etc/alertmanager/secrets/opsgenie-api-key/opsgenie
...

jdegendt avatar Jun 24 '24 12:06 jdegendt

Went ahead and modified the statefulset as such:

      containers:
      - command: [
        "/bin/sh", "-c"
        ]
        args:
        - cat "/etc/alertmanager/secrets/opsgenie-api-key/opsgenie";
          /bin/alertmanager --config.file=/etc/alertmanager/config_out/alertmanager.env.yaml ...;

And it outputs my API key just fine, which makes me doubt there's a race condition at play here. Anything else I can test?
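
One thing worth double-checking: cat alone won't make a trailing newline visible. A stricter check, as a rough sketch assuming the same mount path and that od exists in the image, would be:

od -c /etc/alertmanager/secrets/opsgenie-api-key/opsgenie
# any trailing \n or other whitespace shows up explicitly here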

jdegendt avatar Jun 24 '24 13:06 jdegendt

I ended up adding some additional logging to the Opsgenie notifier to print the headers before alerting and lo and behold, there's a newline attached to my API key:

ts=2024-06-24T14:32:51.702Z caller=opsgenie.go:296 level=info integration=opsgenie SETAUTHHEADERTO:="GenieKey redacted-api-key-foo-bar\n"

So I'll have a look at how I'm templating my secret file.
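
For illustration, a rough sketch of how a trailing newline typically sneaks in when the secret is created (secret name and key taken from my values above; the key value is a placeholder). echo appends a newline, while --from-literal stores the value as-is:

# bad: echo adds \n, which ends up in the mounted file
echo "my-api-key" | kubectl create secret generic opsgenie-api-key --from-file=opsgenie=/dev/stdin

# good: no trailing newline
kubectl create secret generic opsgenie-api-key --from-literal=opsgenie="my-api-key"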

Edit: Also, correct me if I'm wrong, but from looking at the code I doubt this will ever be a race condition, since the API key is read from the file each time an HTTP request to OpsGenie is built and is seemingly not persisted in the notifier config struct. See this routine here.

jdegendt avatar Jun 25 '24 06:06 jdegendt

I'm encountering the same issue with this configuration. I'm not using secrets in any way; I'm setting the API key as plain text.

alertmanager:
  config:
    global:
      resolve_timeout: 5m
    route:
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 3h
      receiver: opsgenie
      routes:
        - match: {}
          receiver: opsgenie
    receivers:
      - name: opsgenie
        opsgenie_configs:
          - api_key: <plain-api-key>
            responders:
              - name: <team-name>
                type: team

This is the error message I see in the logs:

ts=2024-07-10T09:23:26.444Z caller=notify.go:745 level=warn component=dispatcher receiver=kube-prometheus-stack/alertmanager-config-management/opsgenie integration=opsgenie[0] aggrGroup="{}/{}:{alertname=\"KubeVersionMismatch\", prometheus=\"kube-prometheus-stack/kube-prometheus-stack-prometheus\", severity=\"warning\"}" msg="Notify attempt failed, will retry later" attempts=1 err="Post \"https://api.opsgenie.com/v2/alerts\": net/http: invalid header field value for \"Authorization\"" 

Any help??

@zoezhangmattr did you manage to resolve this?

fralvarop avatar Jul 10 '24 09:07 fralvarop

We had this same issue today, and the cause was that our API key secret ended in a newline character before it was base64-encoded.
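
For anyone checking for the same thing, a quick sketch (substitute your own secret name, key, and namespace):

kubectl get secret opsgenie-api-key -o jsonpath='{.data.opsgenie}' | base64 -d | od -c
# if the decoded value ends in \n, re-create the secret without it,
# e.g. via kubectl create secret generic ... --from-literal=..., which does not append a newline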

armsnyder avatar Sep 04 '24 17:09 armsnyder