Auth issues installing OnCall plugin

What went wrong?

What happened:

I've configured the plugin to connect to the engine. Everything appears to be OK, and the UI at least indicates that the plugin was installed. However, there are a couple of entries like:

source=engine:app google_trace_id=none logger=apps.grafana_plugin.helpers.client Error connecting to api instance Expecting value: line 1 column 1 (char 0)

in the engine logs.

Now, after clicking "Open Grafana OnCall", the page shows "OnCall was not able to load the current user. Try refreshing the page." Calls to api/plugin-proxy/grafana-oncall-app/api/internal/v1/ fail with "Non-existent or anonymous user." The engine logs read:

source=engine:app google_trace_id=none logger=apps.auth_token.auth Could not get user from grafana request. Context {'UserID': 1, 'OrgID': 1, 'OrgName': 'Main Org.', 'OrgRole': 'Admin', 'ExternalAuthModule': '', 'ExternalAuthID': '', 'Login': 'admin', 'Name': '', 'Email': 'admin@localhost', 'ApiKeyID': 0, 'IsServiceAccount': False, 'OrgCount': 1, 'IsGrafanaAdmin': True, 'IsAnonymous': False, 'IsDisabled': False, 'HelpFlags1': 0, 'LastSeenAt': '2023-09-15T11:00:49Z', 'Teams': [], 'Analytics': {'Identifier': 'admin@localhost@https://grafana.develop.argyle.io/', 'IntercomIdentifier': ''}}
source=engine:app google_trace_id=none logger=root inbound latency=0.033159 status=401 method=OPTIONS path=/api/internal/v1/escalation_policies user_agent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.6 Safari/605.1.15 content-length=0 slow=0 
source=engine:app google_trace_id=none logger=django.request Unauthorized: /api/internal/v1/escalation_policies
source=engine:uwsgi status=401 method=OPTIONS path=/api/internal/v1/escalation_policies latency=0.034497 google_trace_id=- protocol=HTTP/1.1 resp_size=284 req_body_size=0

What did you expect to happen:

Plugin is installed and actually opens

I've tried this with both the default admin account and a Google-managed account that has been given admin permissions, with the same results.

How do we reproduce it?

I installed the helm chart using the following values config:

grafana:
  enabled: false

externalGrafana:
  url: http://k8s-grafana

ingress-nginx:
  enabled: false

ingress:
  enabled: false

cert-manager:
  enabled: false

mariadb:
  enabled: false

postgresql:
  enabled: true

database:
  type: postgresql

The same cluster is running Grafana version 10.0.1 (5a30620b85)

Grafana OnCall Version

v1.3.37

Product Area

Auth

Grafana OnCall Platform?

Kubernetes

User's Browser?

No response

Anything else to add?

No response

Sep 15 '23 11:09 Homulvas

After further investigation, it seems that on install the OnCall plugin fails to read org and user data, and all further calls then fail because it has no user data to use. This is just a theory, I guess, and it still doesn't explain why the Grafana API returns an empty response to the api/org, api/org/users/, and api/teams/search calls.
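For what it's worth, the `Expecting value: line 1 column 1 (char 0)` text in the engine log is the message Python's `json` module raises when it is asked to decode an empty or non-JSON body. That is consistent with the engine receiving something other than JSON from Grafana (an empty reply, or an HTML page from a login screen or proxy), though it does not say which. A minimal sketch reproducing the message; `describe_body` is a hypothetical helper for illustration, not part of OnCall:

```python
import json


def describe_body(body: str) -> str:
    """Return 'ok' if body parses as JSON, else the decoder's error message."""
    try:
        json.loads(body)
        return "ok"
    except json.JSONDecodeError as exc:
        # str(exc) has the form "<msg>: line N column M (char K)"
        return str(exc)


# An empty body and an HTML body both fail at the very first character.
print(describe_body(""))
print(describe_body("<html>login</html>"))
print(describe_body('{"id": 1}'))
```

So the log line alone cannot distinguish an empty response from an HTML one; looking at the raw response body (or the celery logs) is needed to narrow it down.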

Sep 18 '23 14:09 Homulvas

Do we have any fix for this?

Oct 08 '23 12:10 abbaspw21

Same issue, using Grafana 9.5.3 with oncall plugin+engine version 1.3.43. Seeing a mix of 401 and 403 in the engine logs.

2023-10-12 12:23:57 source=engine:app google_trace_id=none logger=django.request Forbidden: /api/internal/v1/alert_receive_channels/integration_options

2023-10-12 12:23:57 source=engine:uwsgi status=403 method=GET path=/api/internal/v1/alert_receive_channels/integration_options latency=0.012331 google_trace_id=- protocol=HTTP/1.1 resp_size=249 req_body_size=0

2023-10-12 12:23:57 source=engine:app google_trace_id=none logger=apps.auth_token.auth Could not get user from grafana request. Context {'UserID': 1, 'OrgID': 1, 'OrgName': 'Main Org.', 'OrgRole': 'Admin', 'ExternalAuthModule': '', 'ExternalAuthID': '', 'Login': 'admin', 'Name': 'Admin', 'Email': 'admin@localhost', 'ApiKeyID': 0, 'IsServiceAccount': False, 'OrgCount': 1, 'IsGrafanaAdmin': True, 'IsAnonymous': False, 'IsDisabled': False, 'HelpFlags1': 0, 'LastSeenAt': '2023-10-12T12:23:11Z', 'Teams': [], 'Analytics': {'Identifier': 'admin@localhost@http://localhost/grafana/', 'IntercomIdentifier': ''}}

Oct 12 '23 12:10 Polinth

I had exactly the same issue, but it happened only in my develop env. I'd assume that if you check the celery logs in real time you might see this:

Max retries exceeded with url: /api/org (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1002)')))

and the same for other Grafana API calls. This happened because I use self-signed certificates in my develop env, so celery failed. I did not find any way to skip SSL verification through the helm chart, so I just pointed it at Grafana's HTTP port (80) locally inside the k8s cluster, and everything started to work.
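To tell these root causes apart from inside the engine or celery container, something like the following can help. It is only a sketch using the Python standard library: `classify_failure` and `probe` are hypothetical names, not OnCall functions, and the categories are rough. An unauthenticated request to `/api/org` is expected to be rejected by Grafana, so an `http 401` result still shows that DNS, TCP, and TLS are working.

```python
import json
import socket
import ssl
import urllib.error
import urllib.request


def classify_failure(exc: BaseException) -> str:
    """Map an exception from a Grafana API probe to a rough root cause."""
    if isinstance(exc, urllib.error.HTTPError):
        return f"http {exc.code}"  # Grafana answered, but with an error status
    if isinstance(exc, urllib.error.URLError):
        exc = exc.reason  # unwrap the underlying socket/TLS error
    if isinstance(exc, ssl.SSLError):
        return "tls"  # e.g. self-signed certificate not trusted
    if isinstance(exc, socket.gaierror):
        return "dns"  # e.g. "Name does not resolve"
    if isinstance(exc, json.JSONDecodeError):
        return "not-json"  # reachable, but the body was empty or HTML
    if isinstance(exc, OSError):
        return "connection"  # refused, timed out, unreachable
    return "unknown"


def probe(base_url: str, path: str = "/api/org", timeout: float = 5.0) -> str:
    """Request base_url + path and report 'ok' or a failure category."""
    try:
        url = base_url.rstrip("/") + path
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            json.loads(resp.read().decode("utf-8"))
        return "ok"
    except (urllib.error.URLError, OSError, ValueError) as exc:
        return classify_failure(exc)
```

Calling `probe()` with whatever value `GRAFANA_API_URL` has in that container would then return `tls` for the self-signed certificate case described above.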

Oct 17 '23 10:10 ArieLevs

Thanks, you pointed me in the right direction! It wasn't an SSL error in my case, but when I dug into the celery logs I found DNS errors, which led me to an FQDN misconfiguration (my fault) in GRAFANA_API_URL. After correcting that and reconnecting the plugin, the auth issue is gone.

So I guess there can be a number of different root causes for this, I was just not looking in the right place for clues (or understanding how all pieces fit together) :)

Oct 17 '23 11:10 Polinth

What GRAFANA_API_URL did you use?

Dec 14 '23 22:12 muratkuru74

@Polinth I think I have the same issue, but how did you configure this? Did you just change GRAFANA_API_URL in the .env, or something else? If it was just that, it sadly still doesn't work for me, even though the name does resolve when I run nslookup in the containers. My filtered logs:

celery_1 | 2024-02-01 19:12:31,372 source=engine:celery worker=ForkPoolWorker-2 task_id=cb61aaf3-6039-427e-a563-a86e7fbf4fb9 task_name=apps.grafana_plugin.tasks.sync.plugin_sync_organization_async name=apps.grafana_plugin.helpers.client level=WARNING Error connecting to api instance HTTPConnectionPool(host='grafana', port=3000): Max retries exceeded with url: /api/access-control/users/permissions/search?actionPrefix=grafana-oncall-app (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fc0b09a25d0>: Failed to establish a new connection: [Errno -2] Name does not resolve'))
celery_1 | 2024-02-01 19:12:31,374 source=engine:celery worker=ForkPoolWorker-2 task_id=cb61aaf3-6039-427e-a563-a86e7fbf4fb9 task_name=apps.grafana_plugin.tasks.sync.plugin_sync_organization_async name=apps.grafana_plugin.helpers.client level=WARNING Error connecting to api instance HTTPConnectionPool(host='grafana', port=3000): Max retries exceeded with url: /api/org (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fc0b09ba210>: Failed to establish a new connection: [Errno -2] Name does not resolve'))
engine_1 | 2024-02-01 19:12:31 source=engine:app google_trace_id=none logger=apps.grafana_plugin.helpers.client Error connecting to api instance HTTPConnectionPool(host='grafana', port=3000): Max retries exceeded with url: /api/access-control/users/permissions/search?actionPrefix=grafana-oncall-app (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fdecf0835d0>: Failed to establish a new connection: [Errno -2] Name does not resolve'))
engine_1 | 2024-02-01 19:12:31 source=engine:app google_trace_id=none logger=apps.grafana_plugin.helpers.client Error connecting to api instance HTTPConnectionPool(host='grafana', port=3000): Max retries exceeded with url: /api/org (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fdecf083ed0>: Failed to establish a new connection: [Errno -2] Name does not resolve'))

Feb 01 '24 19:02 MarkOWiesemann

I was using the helm charts as a base for my deployment, and there were multiple instances in those definitions that looked like this:

- name: GRAFANA_API_URL
  value: "http://grafana.grafana-test.svc"

with format being http://<service>.<namespace>.svc

That does indeed then get populated as env within the deployments, most notably the celery one where I found the errors. After I properly edited all of those entries and re-applied the yaml it started working for me.
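As a rough illustration of that format, here is a small sketch that builds the in-cluster URL and checks whether its hostname resolves from wherever the script is run (it would need to run inside the celery or engine pod to be meaningful). `service_url` and `resolves` are hypothetical helper names, and the service and namespace values are just the ones from my example above:

```python
import socket
from urllib.parse import urlsplit


def service_url(service: str, namespace: str, scheme: str = "http") -> str:
    """Build a URL of the form <scheme>://<service>.<namespace>.svc."""
    return f"{scheme}://{service}.{namespace}.svc"


def resolves(url: str) -> bool:
    """Return True if the URL's hostname can be resolved via DNS."""
    host = urlsplit(url).hostname
    if host is None:
        return False  # no hostname could be parsed from the URL
    try:
        socket.getaddrinfo(host, None)
        return True
    except socket.gaierror:
        # Same class of failure as "[Errno -2] Name does not resolve"
        return False


url = service_url("grafana", "grafana-test")
print(url)
```

If `resolves(url)` is False inside the pod, the value in GRAFANA_API_URL is the first thing to check.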

I didn't move forward with any of this setup after my testing, so it's been a while since I touched it, but I hope I remember correctly.

Feb 01 '24 21:02 Polinth

closed in favour of https://github.com/grafana/oncall/issues/3772

Mar 20 '24 12:03 iskhakov
