helm-charts
Grafana pod crashes after upgrade or restart
Grafana chart version: 6.29.6, k8s version: 1.21.4
The Grafana pod crashes after an upgrade or restart and never comes back online. Everything is fine with a fresh install, but if I trigger a restart or try to upgrade the Helm release, I get this error:
│ grafana panic: New alert rules created while using unified alerting will be deleted, set force_migration=true in your grafana.ini and try again if this is okay. │
│ grafana │
│ grafana goroutine 1 [running]: │
│ grafana github.com/grafana/grafana/pkg/services/sqlstore/migrations/ualert.AddDashAlertMigration(0xc000c974f0) │
│ grafana /drone/src/pkg/services/sqlstore/migrations/ualert/ualert.go:78 +0x797 │
│ grafana github.com/grafana/grafana/pkg/services/sqlstore/migrations.(*OSSMigrations).AddMigration(0xc00028a620, 0xc000c974f0) │
│ grafana /drone/src/pkg/services/sqlstore/migrations/migrations.go:58 +0x205 │
│ grafana github.com/grafana/grafana/pkg/services/sqlstore.(*SQLStore).Migrate(0xc000232300, 0x0) │
│ grafana /drone/src/pkg/services/sqlstore/sqlstore.go:135 +0x6f │
│ grafana github.com/grafana/grafana/pkg/services/sqlstore.ProvideService(0xc000d00000, 0x18, {0x36e67e0, 0x5752738}, {0x370c300, 0xc000c970e0}) │
│ grafana /drone/src/pkg/services/sqlstore/sqlstore.go:67 +0xdc │
│ grafana github.com/grafana/grafana/pkg/server.Initialize({{0x7ffc83eb2c4d, 0x18}, {0x7ffc83eb2c31, 0x12}, {0xc0001a8040, 0x5, 0x5}}, {{0x0, 0x0}, {0x0, ...}, ...}, ...) │
│ grafana /drone/src/pkg/server/wire_gen.go:147 +0x1b6 │
│ grafana github.com/grafana/grafana/pkg/cmd/grafana-server/commands.executeServer({0x7ffc83eb2c4d, 0x18}, {0x7ffc83eb2c31, 0x12}, {0x0, 0x0}, {0x7ffc83eb2c72, 0x6}, 0x0, {{0x3667e20, ...}, ...}) │
│ grafana /drone/src/pkg/cmd/grafana-server/commands/cli.go:170 +0x625 │
│ grafana github.com/grafana/grafana/pkg/cmd/grafana-server/commands.RunServer({{0x3667e20, 0x5}, {0x3669770, 0xa}, {0x3667e18, 0x4}, {0x3669760, 0xa}}) │
│ grafana /drone/src/pkg/cmd/grafana-server/commands/cli.go:107 +0x785 │
│ grafana main.main() │
│ grafana /drone/src/pkg/cmd/grafana-server/main.go:16 +0xc5
I have noticed the behavior described above in all versions since 6.29.3.
What is very strange: if you upgrade sequentially from version 6.29.2, everything is fine.
Version 6.29.2 works fine and restarting the pod is fine, except for what is described below.
One more strange thing with version 6.29.2, installed from scratch. There are Prometheus alerts:

But after I restart the pod, the Prometheus alerts disappear:

Chart values:
enabled: true
plugins: []
grafana.ini:
  server:
    domain: ""
    root_url: "%(protocol)s://%(domain)s/grafana"
    serve_from_sub_path: true
dashboardProviders:
  dashboardproviders.yaml:
    apiVersion: 1
    providers:
      - name: qqq
        orgId: 1
        type: file
        options:
          path: /var/lib/grafana/dashboards/qqq
dashboardsConfigMaps:
  qqq: "qqq-dashboards"
datasources:
  datasources.yaml:
    apiVersion: 1
    datasources:
      - name: prometheus
        type: prometheus
        url: "http://{{ .Release.Name }}-prometheus-server"
      - name: loki
        type: loki
        url: "http://{{ .Release.Name }}-loki:3100"
        jsonData:
          manageAlerts: false
Same issue here!
Got the same problem on 8.5.4
Same here
Downgrading to 8.2.6 seems to have fixed it by resetting alerting back in legacy mode.
I'm not using this chart, so apologies if this is a bit off-topic, but perhaps this helps someone in similar situation. I hit the same issue with grafana panicking:
grafana panic: New alert rules created while using unified alerting will be deleted, set force_migration=true in your grafana.ini and try again if this is okay.
That specific environment was just running grafana/grafana:latest with some automatic image pull and update in place. Looking at current image, it contains Grafana v8.5.5. However looking at a list of previously used docker images, I've noticed that at some stage about 7 days ago grafana/grafana:latest (image id 21d6214505a0) contained Grafana v9.0.0-beta2:
> docker run -ti --rm --entrypoint /usr/share/grafana/bin/grafana-server 21d6214505a0 -v
Version 9.0.0-beta2 (commit: 3ed722bb5c, branch: HEAD)
So to me it looks like at some stage by blindly upgrading the latest image, Grafana was updated to latest beta (which BTW on a plus side worked without issues) and then today downgraded to v8.5.5 leading to the panic. To resolve this I restored grafana DB from before this unintended upgrade and I can run the latest image again without problems.
As a side note, I'm aware running latest is generally not the best practice, in this environment I'm not super concerned about the availability or the data, just sharing this hoping it will help someone in similar situation.
Very interesting observation. However, I think this is not related to the issue I have seen here. I doubt we were using the latest tag. Versions in this Helm chart are always pinned (e.g. prom-stack 0.56.3 uses Grafana 8.5.3).
Yeah, I would assume so and I apologize for hijacking this issue, however this issue is pretty much the only remotely relevant thing I found when searching that specific error message.
Got the same error in Grafana 8.5.3 as well. The error says to "set force_migration=true in your grafana.ini".
- Could you please let us know how to avoid this issue, i.e. how to identify the alert rule that is causing the problem and how to fix it?
- As mentioned in the Grafana documentation, setting force_migration to true will run migrations that might cause data loss. Does setting force_migration to true remove only the alert rules that are causing the problem, or does it remove all alert rules?
Once the Grafana pod gets into this state, it does not recover, so this is a blocker. Could you please let us know of any workaround? Will setting force_migration=true in grafana.ini solve the issue? If not, please suggest another workaround. If yes, what data will be lost?
We are still facing this issue. Are there any solutions to this?
The issue seems related to switching from Unified Alerting back to Legacy Alerting. The Grafana docs say to set force_migration = true to revert to Legacy Alerting, which restores your alerts to what they were at the time the upgrade took place.
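For chart users, the rollback described in the docs might look roughly like the values fragment below. This is only a sketch, assuming Grafana 8.x setting names (`alerting.enabled`, `unified_alerting.enabled`, and the top-level `force_migration`) and that the chart renders `grafana.ini` keys as-is. If Grafana is a subchart (e.g. in a Prometheus stack), the whole block would presumably need to be nested under the subchart's key. Rolling back deletes alert rules created under Unified Alerting, so back up the Grafana database first.

```yaml
# Sketch only: roll Grafana 8.x back to Legacy Alerting via Helm values.
# Setting names are assumed from the Grafana 8.x configuration docs.
grafana.ini:
  # Top-level key, rendered into the default section of grafana.ini.
  # Allows the destructive rollback migration to run.
  force_migration: true
  alerting:
    enabled: true        # turn Legacy Alerting back on
  unified_alerting:
    enabled: false       # turn Unified Alerting off
```

Once the pod is healthy again, it is probably worth removing `force_migration` so that a later restart cannot silently run another destructive migration.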
But there's an issue with helm charts you could run into next
Had the same issue. I enabled force_migration using extraEnvVars: GF_DEFAULT_FORCE_MIGRATION and lost all alerts except two that came with a new dashboard I had imported a couple of days earlier. I removed the two legacy alerts and re-enabled unified alerting. All is working now.
My Grafana version is 8.5.15 and I got the same error. As the logs say:
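A rough sketch of the environment variable approach follows. The key name is an assumption: the comment above mentions `extraEnvVars`, while the grafana chart, as far as I know, exposes a plain `env` map, so check your chart's values.yaml before copying this. The variable name follows Grafana's `GF_<SECTION>_<KEY>` convention, where `DEFAULT` stands for the section-less top of grafana.ini.

```yaml
# Sketch only: set force_migration through an environment variable.
# `env` is an assumed values key; some charts use a different name or a list form.
env:
  # Equivalent to force_migration = true at the top of grafana.ini.
  GF_DEFAULT_FORCE_MIGRATION: "true"   # quoted, since env values must be strings
```

As the comment above reports, alerts were lost after doing this, so take a database backup first.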
lvl=eror msg="Critical error" reason="Grafana has already been migrated to Unified Alerting.\nAny alert rules created while using Unified Alerting will be deleted by rolling back.\n\nSet force_migration=true in your grafana.ini and restart Grafana to roll back and delete Unified Alerting configuration data.
I followed that advice by updating the chart values with the following, and it worked ✅
grafana.ini:
  force_migration: true