rails
rails copied to clipboard
Deploy new applications with db:schema:load
Hello,
I see that someone requested this feature for a different reason and it was denied. I don't understand the reason for dismissal, as the source of truth for the database schema is the /db/schema.rb or /db/structure.sql file. Migrations are only designed to manage changes to the database and it seems a waste to run the entire migration history when deploying a new application. Also it should be possible to delete old migrations if an application has a great many.
There is no need (and it is error prone) to deploy a new instance of an app by replaying the entire migration history. It is much simpler and faster to just load into the database a description of the current schema. - Rails guide
Any chance you would reconsider this position?
Just ran into this myself. Didn't notice before because I hadn't deployed any old applications on new servers lately. I think it's debatable whether it should attempt rake db:structure:load automatically. While it should just complain that everything already exists on an existing database, I'd be satisfied and feel safer with a manual option. Maybe cap deploy:new_database or cap deploy NEWDB=1 or something. Prepping it manually beforehand is a bit fiddly because you have to do cap deploy:updating and then SSH to the server to run rake db:structure:load yourself. Depending on your setup, this may require several additional steps in between to get the right environment.
P.S. @Odaeus, hello old friend. :wink:
Hello James!
I ended up adding the following to my deploy.rb:
namespace :deploy do
namespace :db do
desc "Load the database schema if needed"
task load: [:set_rails_env] do
on primary :db do
if not test(%Q[[ -e "#{shared_path.join(".schema_loaded")}" ]])
within release_path do
with rails_env: fetch(:rails_env) do
execute :rake, "db:schema:load"
execute :touch, shared_path.join(".schema_loaded")
end
end
end
end
end
end
before "deploy:migrate", "deploy:db:load"
end
I agree it should be a manual task with the option of making it automatic, given that it could cause issues with a production database. My example above uses a hidden file as a guard to make sure it only runs once.
Just saw this: https://stackoverflow.com/questions/35821777/rails-how-to-set-up-dbschemaload-for-initial-deploy-with-capistrano
Which made me look for this issue.
My thoughts: Rails points out in their docs that the schema file is the canonical way of setting up a new database. I see two common Capistrano cases:
- New app, new deployment. Capistrano just uses migrations and it works fine.
- Old app, new deployment. A clean chain of migrations may not exist all the way back to initial creation. Further, it may be error-prone to set up a database this way.
For any case where the database has no schema version information, it is probably more correct to run schema (or structure) load instead of running migrations.
I'm not sure how we might go about finding out whether the database has had the schema loaded yet, but it would ideal to query this and decide based on that which method to run.
If someone wants to flesh out conceptually how this might be done, I can possibly take a crack at a PR.
Typically (in my experience) the user that is running the Rails app is a non-privileged user that does not have permission to create databases. The initial step of creating the database is thus done outside of the Capistrano deploy, which probably explains why this not entirely straightforward.
Personally, I try to make sure migrations always work back to the beginning because you never know when you might need to restore an old database backup and migrate it. But, I completely understand that I'm in the minority and rake db:schema:load or db:structure:load is the better way to go. I'm just not sure what is the safe way to do it.
I'm not sure either. My usual case is that the database already exists, but is completely empty. schema:load is then run to set up the initial structure.
It would probably be ideal to be able to run cap env rails:schema:load or something like that. But that presumes a preexisting deploy, which wouldn't exist yet…
Maybe you could do a trick like:
bundle exec rails runner -e production "exit 1 unless ActiveRecord::SchemaMigration.table_exists?"
And put that inside a Capistrano test to see if we need to load the schema.
:arrow_up: That's an expensive test because it needs to load the Rails environment, so never mind.
We could modify the implementation of the deploy:migrate task to add some conditionals. It looks to see if a special :rails_load_schema variable is set to true (it will default to false). If so, it runs rake db:schema:load instead of rake db:migrate.
Then, we have a special rake task, load_schema, which all it does is call set :rails_load_schema, true. First-time deployment works like this:
cap production load_schema deploy
And then do something similar for db:structure:load.
This is solution is a bit out of the ordinary, though; I can't think of another Capistrano plugin that uses additional task names on the command line preceding deploy to affect conditional behavior.
Another option would be to make a special load_schema_and_deploy task that sets the variable and then does invoke :deploy.
Or use an environment variable:
CAP_RAILS_LOAD_SCHEMA=1 cap production deploy
I dunno, none of these approaches feels exactly right.
Brainstorming here:
What if in Capistrano proper, a task called deploy:first_time (or something similar) was created which only invokes deploy. Then other plugins could hook into that to have tasks which should only be run the first time a deploy is done.
This would then be documented as such.
Does this sound any better?
You still need the variable trick, though, which doesn't feel particularly elegant:
before "deploy:first_time", "rails:first_time" do
set :rails_load_schema, true
end
before "db:migrate", "db:load_schema" do
if fetch(:rails_load_schema)
# rake db:schema:load
end
end
Could you add a task in a task?
before "deploy:first_time", "rails:first_time" do
before "db:migrate", "db:setup"
end
task "db:setup" do
# rake db:schema:load
end
I haven't tried this.
Perhaps slightly better: deploy:first_time sets a special Capistrano variable that is exposed in the DSL as e.g. first_deploy?. Then tasks can do conditional logic based on that method without having to manage a special variable of their own.
This is sort of like the special dry_run? method proposed in https://github.com/capistrano/capistrano/issues/1564.
Could you add a task in a task?
Yes, I think that would work, too.
Just checking in, any formal solution/tasks within Capistrano for this? Having to ssh into server to load schama for cold starts on old apps does not seem right.
Just checking in, any formal solution/tasks within Capistrano for this?
No, not to my knowledge.
In the way-back machine, Capistrano had a "deploy:cold".
In Capistrano 3.10.1, bundle exec cap staging deploy:updating gives me enough to shell-in and run the db:structure:load or db:schema:load task manually. Shelling into a (successful or failed) deploy that has tried deploy:migrate isn't quite the same.
I definitely wouldn't like every deploy to consider wiping the database and then skip it based on a semaphor file. That feels way too dangerous to me.
You could run
cap production deploy COLD=1
And then in your deploy.rb you could customize the deployment process as:
task "deploy:db:load" do
on primary :db do
within release_path do
with rails_env: fetch(:rails_env) do
execute :rake, "db:schema:load"
end
end
end
end
before "deploy:migrate", "deploy:db:load" if ENV["COLD"]
Maybe?
Edit: I definitely wouldn't like every deploy to consider wiping the database and then skip it ~based on a semaphor file~. That feels way too dangerous to me.
I think all what we need here is to call rails db:create at the first deploy before db:migrate. Use db:schema:load is a bit dangerous.
Anyway, this issue should be resolved. It is an official rails tasks for Capistrano. Having the first deploy always to be failed (because database is not created) and going to the server to manually type rails db:create doesn't seems for me like automation.
Or at least add to the README info about the assuming fail at the first deploy, so the Capistrano newbies will not be so scared when it's happen.
I find that deploy:assets:precompile also fails on the first deploy as it runs before the database has been created.
Right now I don't want to automate db:create or db:schema:load. Neither is what I personally would want for real-world deployments. Plus from developers I've spoken with there is a pretty even split between those that prefer creating an empty db and migrating it vs those that prefer loading the schema file directly. There doesn't seem to be a standard way to do it.
It's OK that capistrano-rails leaves some things up to the individual developer. For example, we can't provide a deploy:restart task out of the box, because that depends on passenger, initd, systemd, or whatever each project is using to manage the server process. Likewise for the database, maybe I can rely on db:create, or maybe I need to go through AWS to provision an RDS instance. To draw an analogy to Heroku: you can't just git push heroku master the first time without manually configuring the PostgreSQL database add-on first. I don't think that's an unreasonable step.
Or at least add to the README info about the assuming fail at the first deploy, so the Capistrano newbies will not be so scared when it's happen.
Agreed. @vifreefly would you be interested in opening a PR for that?
@mattbrictson Thanks, I see your point.
In this case, we can probably add another task deploy:db_create (or something like this) which will execute rake db:create.
This task will not be executed by default, but anyone (if it's ok for them) will be able to simply add the hook before 'deploy:migrate', 'deploy:db_create', and have database created automatically before deploy:migrate if it's not exists.
In case if db exists, this task will be executed successfully anyway, because Rails return message like Database 'app_production' already exists with successful (0) status code.
About db:migrate vs db:schema:load to call for a first time on empty db.
Maybe db:schema:load is more true but I personally had some issues with it and custom Postgres extensions (for example https://github.com/Casecommons/pg_search). With db:schema:load some required extensions described in migrations wasn't enabled in the new database.
On the other hand, with db:migrate all was fine. I think db:migrate fine as well, unless if you're have a several hundreds/thousands of migrations to migrate and wait.
With
db:schema:loadsome required extensions described in migrations wasn't enabled in the new database.
Don't describe required extensions in migrations?
Don't describe required extensions in migrations?
Ok, what is a right way then?
Ouch. Pardon me. It really was a question. It probably sounded a little glib. Excuse me. And also this is a little orthoganal to the thread. Maybe we should take it offline?
Now I've read the doc for the extension you referenced and see that they suggest exactly that, using a migration. Indeed, on a fresh install, those execute "CREATE EXTENSION IF NOT EXISTS pg_trgm;" will not have run from a db:structure:load. That doc doesn't give consideration to booting up a fresh DB without running the migration they suggest. I'm not sure it's great advice, only convenient advice.
The right way? Wow. Would it be great if the structure dump captured those? Assuming they don't (and that might not be possible as it might not be standardized). Would it serve to work it into the database install and provisioning of the DB box? The doc does go on to say that the extension installation methods vary across OS's.
Please excuse me for a. pouncing on that and b. diverting this thread. The rails guide advice quoted in the original issue description: I think it's solid advice.
Hi, So, no solution for this case then? (Read through the thread but maybe I missed something...)
@mulaiko you have solutions in the thread which you can adapt. Here is my current version, using an explicit RUN_DB_SCHEMA_LOAD flag at first deploy:
# adapted from https://gist.github.com/wacaw/657744af41dcf4963646
namespace :deploy do
desc "Load the initial schema - it will WIPE your database, use with care"
task :db_schema_load do
on roles(:db) do
puts <<~WARN
************************** WARNING ***************************
If you type [YES], rake db:schema:load will WIPE your database
any other input will cancel the operation.
**************************************************************
WARN
ask :answer, "Are you sure you want to WIPE your database?: "
if fetch(:answer) == 'YES'
within release_path do
with rails_env: fetch(:rails_env) do
execute :rake, 'db:schema:load'
end
end
else
puts "Cancelled."
exit
end
end
end
end
if ENV["RUN_DB_SCHEMA_LOAD"] == '1'
before "deploy:migrate", "deploy:db_schema_load"
end
Hope this helps!
with rails_env: fetch(:rails_env) do execute :rake, 'db:schema:load' end
I had to change the line to this in production:
execute :rake, 'db:schema:load DISABLE_DATABASE_ENVIRONMENT_CHECK=1'