
Grafana API Token is recreated each time OnCall page is being opened.

Open lstama opened this issue 3 years ago • 10 comments

Hello, I have a problem that prevents users without admin privileges from accessing Grafana OnCall.

When I (as an organization admin) open the OnCall page for the first time (or reload it), I am always greeted by this error page (screenshot: retry).

And when I click the retry button, this happens (screenshot: as-admin).

Then I'm redirected to the normal OnCall page (screenshot: success).

For now it looks fine: I can still access OnCall in the end, and can create alerts and integrations. Then a friend who isn't an admin wanted to view the page as an editor. I instructed him to do the same (click retry if the error page shows up). He did, but instead of seeing the same view as in the second picture, he got this (screenshot: as-non-admin).

It turns out the Grafana API token is recreated every time someone reloads the page (I already checked the DB value). The plugin page also shows this error (screenshot: plugin-error).

What I already did:

  1. Restarted Grafana
  2. Recreated the Grafana OnCall one-time invite token
  3. Used both Server admin and Organization admin to set up the plugin

Also, these are some relevant logs from Grafana:

2022-09-07T11:03:57.602351353Z stdout F logger=context traceID=00000000000000000000000000000000 t=2022-09-07T11:03:57.602136283Z level=error msg="invalid API key" error="invalid API key" traceID=00000000000000000000000000000000

Oncall Engine:

		2022-09-07 18:03:57	
2022-09-07T11:03:57.56885098Z stdout F 2022-09-07 11:03:57 source=engine:app google_trace_id=none logger=root inbound latency=0.012544 status=202 method=POST path=/api/internal/v1/plugin/sync content-length=0 slow=0 integration_type=N/A integration_token=N/A
2022-09-07 18:03:57	
2022-09-07T11:03:57.569311472Z stdout F 2022-09-07 11:03:57 source=engine:uwsgi status=202 method=POST path=/api/internal/v1/plugin/sync latency=0.013501 google_trace_id=- protocol=HTTP/1.1 resp_size=278 req_body_size=0
2022-09-07 18:03:58	
2022-09-07T11:03:58.267741236Z stdout F 2022-09-07 11:03:58 source=engine:app google_trace_id=none logger=root outbound latency=0.09500776696950197 status=200 method=GET url=https://grafana.my.org/api/org/users slow=0 
2022-09-07 18:03:58	
2022-09-07T11:03:58.450652135Z stdout F 2022-09-07 11:03:58 source=engine:app google_trace_id=none logger=root outbound latency=0.16960133600514382 status=200 method=GET url=https://grafana.my.org/api/teams/search?perpage=1000000 slow=0 
2022-09-07 18:03:58	
2022-09-07T11:03:58.458742653Z stdout F 2022-09-07 11:03:58 source=engine:app google_trace_id=none logger=root inbound latency=0.29701 status=204 method=POST path=/api/internal/v1/plugin/install content-length=0 slow=0 integration_type=N/A integration_token=N/A
2022-09-07 18:03:58	
2022-09-07T11:03:58.459229884Z stdout F 2022-09-07 11:03:58 source=engine:uwsgi status=204 method=POST path=/api/internal/v1/plugin/install latency=0.298131 google_trace_id=- protocol=HTTP/1.1 resp_size=168 req_body_size=0
2022-09-07 18:04:00	
2022-09-07T11:04:00.561275718Z stdout F 2022-09-07 11:04:00 source=engine:app google_trace_id=none logger=root inbound latency=0.007313 status=200 method=GET path=/api/internal/v1/plugin/sync content-length=0 slow=0 integration_type=N/A integration_token=N/A
2022-09-07 18:04:00	
2022-09-07T11:04:00.561751676Z stdout F 2022-09-07 11:04:00 source=engine:uwsgi status=200 method=GET path=/api/internal/v1/plugin/sync latency=0.008264 google_trace_id=- protocol=HTTP/1.1 resp_size=264 req_body_size=0

Celery

2022-09-07 18:03:21	
2022-09-07T11:03:21.330679494Z stderr F 2022-09-07 11:03:21,330 source=engine:celery task_id=aa674da7-355e-4763-a159-f63922251ada task_name=apps.slack.representatives.alert_group_representative.on_alert_group_update_log_report_async name=celery.app.trace level=INFO Task apps.slack.representatives.alert_group_representative.on_alert_group_update_log_report_async[aa674da7-355e-4763-a159-f63922251ada] succeeded in 0.01293463003821671s: None
2022-09-07 18:03:57	
2022-09-07T11:03:57.569954095Z stderr F 2022-09-07 11:03:57,569 source=engine:celery task_id=??? task_name=??? name=celery.worker.strategy level=INFO Task apps.grafana_plugin.tasks.sync.plugin_sync_organization_async[a29cac90-8e2b-4ef4-9443-6f22d2046646] received
2022-09-07 18:03:57	
2022-09-07T11:03:57.571366083Z stderr F 2022-09-07 11:03:57,571 source=engine:celery task_id=a29cac90-8e2b-4ef4-9443-6f22d2046646 task_name=apps.grafana_plugin.tasks.sync.plugin_sync_organization_async name=apps.grafana_plugin.tasks.sync level=INFO Start sync Organization 1
2022-09-07 18:03:57	
2022-09-07T11:03:57.604990332Z stderr F 2022-09-07 11:03:57,604 source=engine:celery task_id=a29cac90-8e2b-4ef4-9443-6f22d2046646 task_name=apps.grafana_plugin.tasks.sync.plugin_sync_organization_async name=apps.grafana_plugin.helpers.client level=WARNING Error connecting to api instance 401 Client Error: Unauthorized for url: https://grafana.my.org/api/org/users
2022-09-07 18:03:57	
2022-09-07T11:03:57.605336694Z stderr F 2022-09-07 11:03:57,604 source=engine:celery task_id=a29cac90-8e2b-4ef4-9443-6f22d2046646 task_name=apps.grafana_plugin.tasks.sync.plugin_sync_organization_async name=root level=INFO outbound latency=0.02740099304355681 status=401 method=GET url=https://grafana.my.org/api/org/users slow=0 
2022-09-07 18:03:57	
2022-09-07T11:03:57.607496313Z stderr F 2022-09-07 11:03:57,607 source=engine:celery task_id=a29cac90-8e2b-4ef4-9443-6f22d2046646 task_name=apps.grafana_plugin.tasks.sync.plugin_sync_organization_async name=apps.grafana_plugin.tasks.sync level=INFO Finish sync Organization 1
2022-09-07 18:03:57	
2022-09-07T11:03:57.607639099Z stderr F 2022-09-07 11:03:57,607 source=engine:celery task_id=a29cac90-8e2b-4ef4-9443-6f22d2046646 task_name=apps.grafana_plugin.tasks.sync.plugin_sync_organization_async name=celery.app.trace level=INFO Task apps.grafana_plugin.tasks.sync.plugin_sync_organization_async[a29cac90-8e2b-4ef4-9443-6f22d2046646] succeeded in 0.03653913899324834s: None
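For anyone triaging similar logs, the outbound 401 in the Celery output above can be isolated with a small script. This is a minimal sketch, assuming the `key=value` layout seen in these lines; `parse_pairs` and `failed_outbound_calls` are hypothetical helper names, not part of OnCall.

```python
import re

# Matches key=value pairs; a value is either double-quoted or a run of non-space characters.
PAIR_RE = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S*)')


def parse_pairs(line):
    """Return the key=value pairs found in one log line as a dict."""
    fields = {}
    for key, value in PAIR_RE.findall(line):
        # Strip surrounding double quotes from quoted values.
        if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
            value = value[1:-1]
        fields[key] = value
    return fields


def failed_outbound_calls(lines):
    """Yield (status, method, url) for outbound requests whose status is >= 400."""
    for line in lines:
        # Only the request-log lines carry the bare word "outbound".
        if " outbound " not in line:
            continue
        fields = parse_pairs(line)
        status = fields.get("status", "")
        if status.isdigit() and int(status) >= 400:
            yield int(status), fields.get("method"), fields.get("url")
```

Feeding it the engine and Celery logs together shows whether the same Grafana endpoint succeeds with one token and fails with another around the same second, which is what the logs in this report suggest.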

We're using an existing Grafana instance as the OnCall frontend and deployed Grafana OnCall using the Helm chart in this repository. Alerting and the OnCall system itself work normally.

lstama avatar Sep 08 '22 03:09 lstama

I've faced the same problem. It looks just like https://github.com/grafana/oncall/issues/316. I reinstalled the OnCall deployment and now it works fine.

th30nlyw4y avatar Sep 08 '22 11:09 th30nlyw4y

> I've faced the same problem. It looks just like #316. I reinstalled the OnCall deployment and now it works fine.

What do you mean by reinstalling the OnCall deployment? Is it just the Engine and Celery parts, or everything including MariaDB, Redis, and RabbitMQ (using a fresh DB)?

lstama avatar Sep 09 '22 08:09 lstama

I meant re-deploying the OnCall Helm chart (I have Redis, MariaDB, RabbitMQ, Celery and the engine enabled for deployment). I also think it's better to delete the PVCs (you have to do it manually, as stated in the docs), because plugin init sometimes fails.

th30nlyw4y avatar Sep 09 '22 13:09 th30nlyw4y

> I meant re-deploying the OnCall Helm chart (I have Redis, MariaDB, RabbitMQ, Celery and the engine enabled for deployment). I also think it's better to delete the PVCs (you have to do it manually, as stated in the docs), because plugin init sometimes fails.

Thanks, reinstalling works.

But now all my integrations and settings are wiped out, as I don't know which DB tables are safe to back up and restore.

lstama avatar Sep 12 '22 05:09 lstama

> Thanks, reinstalling works.
>
> But now all my integrations and settings are wiped out, as I don't know which DB tables are safe to back up and restore.

Yep, that's quite inconvenient. I hope this behavior will be fixed soon.

th30nlyw4y avatar Sep 12 '22 06:09 th30nlyw4y

Got the same issue with Grafana 9.2.6, OnCall 1.1.5 and Helm chart 1.0.12. After a couple of days the OnCall setup becomes unusable. Removing the OnCall API key from https://grafana/org/apikeys helps until the next page reload.
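The manual key removal described above could also be scripted against Grafana's HTTP API (`GET /api/auth/keys` and `DELETE /api/auth/keys/:id`, as available in the Grafana 9.x releases mentioned in this thread). This is only a rough sketch: the key name `"OnCall"`, the helper names, and the use of an admin-level bearer token are assumptions, so check the key list in your own instance first.

```python
import json
import urllib.request


def keys_named(keys, name):
    """Return the ids of API keys whose name matches exactly."""
    return [k["id"] for k in keys if k.get("name") == name]


def _call(base_url, token, method, path):
    """Send one authenticated request to Grafana and decode the JSON reply."""
    req = urllib.request.Request(
        base_url.rstrip("/") + path,
        method=method,
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read() or b"null")


def delete_keys_named(base_url, admin_token, name="OnCall"):
    """Delete every API key with the given name; return the deleted ids.

    The default name "OnCall" is an assumption about what the plugin creates.
    """
    keys = _call(base_url, admin_token, "GET", "/api/auth/keys")
    ids = keys_named(keys, name)
    for key_id in ids:
        _call(base_url, admin_token, "DELETE", f"/api/auth/keys/{key_id}")
    return ids
```

As reported above, this only helps until the next page reload, so it is a stopgap rather than a fix.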

juris avatar Dec 05 '22 15:12 juris

I have the same issue as @juris

duclm2609 avatar Dec 16 '22 10:12 duclm2609

I have the same issue. Helm deployment, only PostgreSQL is external. Grafana 9.3.2; OnCall 1.1.14; Ingress disabled (HAProxy). After some time the plugin just loses its API key. In the Grafana logs I found:

logger=context t=2023-01-12T13:21:53.551966373Z level=error msg="invalid API key" error="invalid API key" traceID=
logger=data-proxy-log userId=2 orgId=1 uname=user path=/api/plugin-proxy/grafana-oncall-app/api/internal/v1/alertgroups/stats/ remote_addr=ip referer="https://fqdn/a/grafana-oncall-app/?page=incidents&status=0&status=1" t=2023-01-12T11:47:13.60359748Z level=error msg="Proxy request failed" err="dial tcp ip:8080: connect: connection refused"

The plugin configuration page says it cannot communicate with oncall-engine but doesn't provide a button to reset the configuration. Sometimes just opening the main OnCall page starts the API key exchange, as people wrote above; sometimes only a redeploy helps.
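The comment above reports two different failures: an invalid API key and a refused TCP connection to the engine on port 8080. A quick reachability check can help tell them apart before blaming the token. This is a minimal sketch; `can_connect` is a hypothetical helper, and the host and port you pass are whatever your Grafana instance uses to reach oncall-engine.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False
```

If this returns False from the Grafana host, the "Proxy request failed" error is a networking or deployment problem, and recreating the API key is unlikely to help.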

PCbIX avatar Jan 12 '23 14:01 PCbIX

I have the same issue as @juris

ifeneg avatar Feb 08 '23 12:02 ifeneg

Have the same issue as @PCbIX:

  • Helm deployment, only postgresql is external
  • Grafana v8.5.3 ; OnCall 1.2.15 ; Ingress enabled

Each time I leave the Grafana OnCall page, the Grafana API token is supposed to be recreated, but it isn't, and I lose the connection to the Grafana OnCall plugin with the message: 'There was an issue while synchronizing data required for the plugin. Verify your OnCall backend setup (ie. that Celery workers are launched and properly configured)'

The first time, the workaround of reopening the main Grafana OnCall page to start the API key exchange worked: the notification of the API token creation popped up and I could access Grafana OnCall. The second time it didn't work, and I'm stuck on the 'Initializing plugin' step with the message: 'There was an issue while synchronizing data required for the plugin. Verify your OnCall backend setup (ie. that Celery workers are launched and properly configured)'.

Milamary avatar Apr 26 '23 04:04 Milamary