Helm chart unable to create fresh server: `relation "accounts" does not exist` unless db migration job is run manually
Steps to reproduce the problem
Creating a fresh, completely new mastodon server using the helm chart fails to initialize. It appears that the required database tables are not created, as both the web and sidekiq containers return `relation "accounts" does not exist`.
From my casual perusal, it appears that the tables should be created by job-db-migrate.yaml, but that job never runs: the web container never finishes starting because it needs the tables to be initialized, and is thus stuck in a crash loop.
Rendering the template to a file, copying the job from there and manually applying the db migration job with kubectl results in a working deployment.
Expected behaviour
Helm chart should spin up database tables
Actual behaviour
Web and sidekiq containers are stuck in a crash loop
Detailed description
No response
Specifications
Chart v 3.0.0, Tag: latest (v4.0.2) K8s
Have you tried restarting (deleting) the sidekiq and web pods? I've seen enough weird behavior similar to what you report on chart based deployments that I restart those two pods as a rule now after release. I have not seen failed migrations.
This was not a migration. This was a fresh install (I'm trying out mastodon for the first time), which is, I believe, why it was deadlocking.
Web container requires the DB tables to start -> DB tables require the DB job to have run at least once -> DB job currently waits for the web container/sidekiq to be up before it runs -> ... (deadlock)
Side note, I think the jobs containers did not get upgraded to v4?
I am also talking about a fresh install. A db migration happens both on initial install and on upgrades.
This is what determines the version of the db migrate job container: https://github.com/mastodon/mastodon/blob/main/chart/templates/job-db-migrate.yaml#L46
Oh, I know what happened with the job's container version. I was testing with v3 to see if the helm chart using v3 would work. I exported to a file to dig around. I must have forgotten to re-export before I pulled the jobs to run manually (which was my eventual solution to get mastodon running at all).
Regarding restarting, I guess deleting the pods may have caused the jobs to be created, but I don't know. It feels really hacky to have that kind of problem, and I feel the original deadlock problem should still be addressed.
> I feel the original deadlock problem should still be addressed.
I agree. I think the problem is that the web and sidekiq pods manage to come up partway through the migration, but don't quite end up in the right state. Restarting them after `rake db:migrate` has completed fixes it.
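The restart described above could be scripted roughly as follows. This is a minimal sketch, not something taken from the chart: the function name `restart_after_migrate` and the deployment names in the usage note are hypothetical, and the 300-second timeout is an arbitrary choice. The `KUBECTL` variable only exists so the binary can be swapped out (for example with `echo`) to preview the commands.

```shell
#!/usr/bin/env bash
# Sketch: restart the given deployments once the migration has finished,
# and wait for each rollout to settle.
# Usage: restart_after_migrate <namespace> <deployment> [<deployment> ...]
restart_after_migrate() {
  local namespace="$1"; shift
  # KUBECTL can be overridden (e.g. KUBECTL=echo) to print instead of run.
  local kubectl="${KUBECTL:-kubectl}"
  local name
  for name in "$@"; do
    # Trigger a rolling restart of the deployment's pods.
    "$kubectl" --namespace "$namespace" rollout restart "deployment/$name" || return 1
    # Block until the restarted pods are ready, or give up after 5 minutes.
    "$kubectl" --namespace "$namespace" rollout status "deployment/$name" --timeout=300s || return 1
  done
}
```

For example, `restart_after_migrate default mastodon-web mastodon-sidekiq`, after checking the real deployment names with `kubectl get deployments`.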
These lines here make an effort to order the job relative to other things:
https://github.com/mastodon/mastodon/blob/main/chart/templates/job-db-migrate.yaml#L8-L10
I've not thought about that vs the startup of the other containers.
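The linked lines are presumably Helm hook annotations (`helm.sh/hook`, `helm.sh/hook-weight`, `helm.sh/hook-delete-policy`). Per the Helm documentation, a `post-install` hook runs only after the release's resources have been created, and with `--wait` only after they are ready, which would be consistent with the deadlock described in this thread. The helper below is a small sketch for checking which hooks the job is actually rendered with; the function name is hypothetical, and the release name and chart path in the usage note are placeholders.

```shell
#!/usr/bin/env bash
# Sketch: print the Helm hook annotations found in a rendered manifest
# read from stdin, with leading indentation stripped.
show_hook_annotations() {
  grep -E 'helm\.sh/hook' | sed -E 's/^[[:space:]]+//'
}
```

It could be used as `helm template my-release ./chart --show-only templates/job-db-migrate.yaml | show_hook_annotations`.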
I also experience this issue regularly. Makes it rather difficult to run helm from Terraform because an `apply` always fails waiting for a `db:migrate` job that will never complete on its own.
I face the same issue. I'm also wondering how I can trigger the initialization of a fresh postgres instance.
The error log of the web container shows:
[1] Puma starting in cluster mode...
[1] * Puma version: 5.6.5 (ruby 3.0.4-p208) ("Birdie's Version")
[1] * Min threads: 5
[1] * Max threads: 5
[1] * Environment: production
[1] * Master PID: 1
[1] * Workers: 2
[1] * Restarts: (✔) hot (✖) phased
[1] * Preloading application
bundler: failed to load command: puma (/opt/mastodon/vendor/bundle/ruby/3.0.0/bin/puma)
[1] ! Unable to load application: ActiveRecord::StatementInvalid: PG::UndefinedTable: ERROR: relation "accounts" does not exist
LINE 8: WHERE a.attrelid = '"accounts"'::regclass
^
/opt/mastodon/vendor/bundle/ruby/3.0.0/gems/activerecord-6.1.7/lib/active_record/connection_adapters/postgresql/database_statements.rb:19:in `exec': PG::UndefinedTable: ERROR: relation "accounts" does not exist (ActiveRecord::StatementInvalid)
LINE 8: WHERE a.attrelid = '"accounts"'::regclass
^
from /opt/mastodon/vendor/bundle/ruby/3.0.0/gems/activerecord-6.1.7/lib/active_record/connection_adapters/postgresql/database_statements.rb:19:in `block (2 levels) in query'
...
The sidekiq one:
2023-01-03T13:28:50.933Z pid=1 tid=53x WARN: `config.options[:key] = value` is deprecated, use `config[:key] = value`: ["/opt/mastodon/lib/mastodon/redis_config.rb:38:in `<top (required)>'", "/opt/mastodon/config/application.rb:53:in `require_relative'"]
2023-01-03T13:28:51.135Z pid=1 tid=53x INFO: Booting Sidekiq 6.5.7 with Sidekiq::RedisConnection::RedisAdapter options {:driver=>:hiredis, :url=>"redis://:[email protected]:6379/0", :namespace=>nil}
2023-01-03T13:28:51.663Z pid=1 tid=53x WARN: ActiveRecord::StatementInvalid: PG::UndefinedTable: ERROR: relation "accounts" does not exist
LINE 8: WHERE a.attrelid = '"accounts"'::regclass
^
...
As a workaround you can:
- Run `helm template ... > workaround.yaml`
- Edit the file to only contain the job
- `kubectl apply -f workaround.yaml`
- Wait for the initial job to finish
- `kubectl delete -f workaround.yaml`
This is not great but it should take care of the database initialization.
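The steps above could be wrapped in a small script along these lines. Treat it as a sketch under assumptions: the function names, release name, chart path, namespace and the 600-second timeout are placeholders that do not come from this thread. It uses `helm template --show-only` to render just the job template instead of editing the file by hand, and `kubectl wait` to block until the job completes. Setting `DRY_RUN=1` prints the commands without running them.

```shell
#!/usr/bin/env bash
# Sketch of the manual workaround: render only the migration job, apply it,
# wait for it to finish, then remove it.

# Run a command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Usage: bootstrap_db <release> <chart-path> [<namespace>]
bootstrap_db() {
  local release="$1" chart="$2" namespace="${3:-default}"
  local manifest="workaround.yaml"

  # --show-only renders a single template, replacing the manual
  # "edit the file to only contain the job" step.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ helm template $release $chart --namespace $namespace --show-only templates/job-db-migrate.yaml > $manifest"
  else
    helm template "$release" "$chart" --namespace "$namespace" \
      --show-only templates/job-db-migrate.yaml > "$manifest" || return 1
  fi

  # kubectl ignores the Helm hook annotations and creates the Job directly.
  run kubectl apply --namespace "$namespace" -f "$manifest" || return 1
  # Block until the Job in the manifest reports the Complete condition.
  run kubectl wait --namespace "$namespace" --for=condition=complete \
    --timeout=600s -f "$manifest" || return 1
  # Clean up so the manually created Job does not collide with Helm's hook.
  run kubectl delete --namespace "$namespace" -f "$manifest"
}
```

A dry run such as `DRY_RUN=1 bootstrap_db my-release ./chart` shows the four commands first. Pass the same values flags (`-f values.yaml`, `--set ...`) to `helm template` as for the real install, otherwise the rendered job may not match the release.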
PR #37 tries to address both this and #26.