
[OSS] Escalation mobile push fail with "HTTP client error 403" while test notif works

Open · bmalynovytch opened this issue on Oct 27 '23 · 13 comments

What went wrong?

What happened:

  • Created an alert group
  • Escalation chain triggered a mobile push notification
  • Timeline shows "failed to notify ... by mobile push important" (screenshot: oncall-mobile-push-failed)

What did you expect to happen:

  • Created an alert group
  • Escalation chain triggered a mobile push notification
  • Get notified

How do we reproduce it?

  1. Configure Grafana and OnCall OSS
  2. Connect to Grafana Cloud in the EU region
  3. Create an alert group and an escalation chain that triggers a mobile push notification
  4. 💥

Grafana OnCall Version

v1.3.47 (Docker)

Product Area

Alert Flow & Configuration, Mobile App

Grafana OnCall Platform?

Kubernetes

User's Browser?

No response

Anything else to add?

Test notifications sent from the user's profile work properly. Their logs appear in "engine", whereas the failing escalation notifications fail in "celery". There might be something wrong with GRAFANA_CLOUD_ONCALL_API_URL not being properly set or used in celery, which would make it authenticate against the default Grafana Cloud platform instead of https://oncall-prod-eu-west-0.grafana.net/oncall as it should.

bmalynovytch · Oct 27 '23

I also had trouble configuring GRAFANA_CLOUD_ONCALL_API_URL through env variables and had to enable FEATURE_LIVE_SETTINGS_ENABLED. (See #1479)

bmalynovytch · Oct 27 '23

Hi @bmalynovytch, thank you for opening an issue! I'm trying to reproduce this now. Could you tell me more about how you passed the GRAFANA_CLOUD_ONCALL_API_URL env variable into your deployment?

> There might be something wrong with GRAFANA_CLOUD_ONCALL_API_URL not being properly set or used in celery, which would make it authenticate against the default Grafana Cloud platform instead of https://oncall-prod-eu-west-0.grafana.net/oncall as it should.

Have you passed this env variable both to release-oncall-engine and release-oncall-celery k8s deployments?
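For anyone running OnCall with Docker Compose rather than the Helm chart, one way to make sure the engine and the celery worker receive exactly the same variables is to define them once with a YAML anchor. This is a minimal sketch, not the official Compose file: the service names `engine` and `celery` and the anchor name `oncall-environment` are assumptions, everything except the `environment` keys is omitted, and the values are the ones quoted elsewhere in this thread.

```yaml
# Sketch only: service names and the anchor name are assumptions;
# command, ports, volumes, etc. are omitted.
x-oncall-environment: &oncall-environment
  GRAFANA_CLOUD_ONCALL_API_URL: 'https://oncall-prod-eu-west-0.grafana.net/oncall'
  GRAFANA_CLOUD_ONCALL_TOKEN: '<our-cloud-token>'
  GRAFANA_CLOUD_NOTIFICATIONS_ENABLED: 'true'

services:
  engine:
    image: grafana/oncall
    environment: *oncall-environment   # same mapping as the celery service
  celery:
    image: grafana/oncall
    # In the original report the failing escalation pushes were logged by celery,
    # so the worker needs the same Cloud settings as the engine.
    environment: *oncall-environment
```

Compose ignores top-level keys prefixed with `x-`, so the `x-oncall-environment` block only serves as a shared definition that both services reference.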

vadimkerr · Nov 06 '23

Hi @vadimkerr ~I deployed using the Helm chart, which only requires providing the env variable once. It's then passed to both deployments, and I can confirm that the variable is available in the Celery container's shell.~

bmalynovytch · Nov 06 '23

Sorry @vadimkerr I was wrong, the setting is provided using LiveSetting, not env.

bmalynovytch · Nov 06 '23

So ... I tried providing the token twice, using both LiveSettings AND an env variable. To do so, I had to reset the token (I hadn't kept a copy of the previous one). Now notifications work properly 🤷

When I remove the env variable, I get the 403 error again. The env variable is what makes it work.

There seems to be a big mess between LiveSettings and env variables 😞

bmalynovytch · Nov 06 '23

Glad it's now working for you @bmalynovytch! I'll try to reproduce this and see if there's something we can do about it.

vadimkerr · Nov 07 '23

@bmalynovytch can you please share exactly how you configured GRAFANA_CLOUD_ONCALL_API_URL? I have OnCall in a k8s cluster, and whatever I've tried, I still get "token is invalid".

TomasHradecky · Dec 07 '23

> @bmalynovytch can you please share exactly how you configured GRAFANA_CLOUD_ONCALL_API_URL? I have OnCall in a k8s cluster, and whatever I've tried, I still get "token is invalid".

The trick is that you need to provide the token twice: once through env variables and once as an override in LiveSettings. There seems to be some code that uses the override properly but ignores the env variable, and another part that uses only the env variable.

Here's the relevant section of the Helm values:

env:
  GRAFANA_CLOUD_ONCALL_API_URL: https://oncall-prod-eu-west-0.grafana.net/oncall
  GRAFANA_CLOUD_ONCALL_TOKEN: ....

bmalynovytch · Dec 08 '23

Yesterday I tried adding the env variable to Grafana instead of OnCall, and voilà, on the second click on "Connect OnCall to Cloud" it went through. So if anyone has the same issue as me, try adding this to Grafana:

env:
  ONCALL_CLOUD_API_URL: "https://oncall-prod-eu-west-0.grafana.net/oncall/api/v1/integrations"
 

@bmalynovytch thanks for the info, I'll try it to figure out what the difference is.

TomasHradecky · Dec 08 '23

@TomasHradecky is this how you solved it? This isn't working for us: we still get a 404 (a 403 if *_CLOUD_ONCALL_API_URL is unset) when we try to set the token via the Grafana Admin > Plugins UI.

services:
  grafana:
    environment:
      ONCALL_CLOUD_API_URL: 'https://oncall-prod-eu-west-0.grafana.net/oncall/api/v1/integrations' # we also tried without /api/v1/integrations here
  
  on-call-engine:
    environment:
      FEATURE_LIVE_SETTINGS_ENABLED: 'true' # should be default anyway
      GRAFANA_API_URL: 'http://grafana:3000' # local network url
      GRAFANA_CLOUD_ONCALL_API_URL: 'https://oncall-prod-eu-west-0.grafana.net/oncall'
      GRAFANA_CLOUD_ONCALL_TOKEN: '<our-cloud-token>'
      GRAFANA_CLOUD_NOTIFICATIONS_ENABLED: 'true'
     

Patrick-Remy · Jan 08 '24

@Patrick-Remy is your Grafana deployed as part of the OnCall Helm chart? If not, try setting GRAFANA_API_URL to the URL you use to access Grafana from outside the cluster. Also, just to be sure, check in the OnCall settings on the Grafana Cloud portal that oncall-prod-eu-west-0 is the right domain for your OnCall instance.
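Applied to the Compose file posted earlier in the thread, this suggestion amounts to pointing `GRAFANA_API_URL` at the externally reachable Grafana URL instead of the internal service name. A sketch, assuming `https://grafana.example.com` is a hypothetical placeholder for your own public Grafana URL and that `oncall-prod-eu-west-0` is the correct region for your stack (confirm it in the Grafana Cloud portal):

```yaml
services:
  on-call-engine:
    environment:
      # Hypothetical public Grafana URL, replacing the internal 'http://grafana:3000'
      GRAFANA_API_URL: 'https://grafana.example.com'
      # Region-specific OnCall API URL; check it in the Grafana Cloud OnCall settings
      GRAFANA_CLOUD_ONCALL_API_URL: 'https://oncall-prod-eu-west-0.grafana.net/oncall'
      GRAFANA_CLOUD_ONCALL_TOKEN: '<our-cloud-token>'
      GRAFANA_CLOUD_NOTIFICATIONS_ENABLED: 'true'
```

If the setup also runs a separate celery service, the same values presumably need to be set there as well.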

TomasHradecky · Jan 09 '24

> Is your Grafana deployed as part of the OnCall Helm chart? If not, try setting GRAFANA_API_URL to the URL you use to access Grafana from outside the cluster.

That was the issue: in our Compose setup, only the publicly accessible endpoint worked! After that, we had to save the token twice in the settings. The first attempt resulted in a 403 (?!), and purely out of frustration we pressed the button again and, voilà, everything was connected.

This is so extremely weird and buggy, thanks a lot for your help!

Patrick-Remy · Jan 09 '24

Happy to help. I saw exactly the same behavior; only frustration made it work. I really don't understand why OnCall can reach Grafana only through the public domain. Even though the kube-dns records were fine on my cluster and OnCall knew the path to Grafana, only the public endpoint worked.

TomasHradecky · Jan 09 '24