activerecord-multi-tenant icon indicating copy to clipboard operation
activerecord-multi-tenant copied to clipboard

Automate the Citus setup by enhancing the Rails rake tasks

Open AlexVPopov opened this issue 8 years ago • 5 comments

So currently, in order for Citus to work, one needs to do the following steps:

  • install the Citus package;
  • create n Postgres clusters (1 coordinator + (n-1) workers);
  • tell each cluster about Citus by adding "shared_preload_libraries = 'citus'" to its respective postgresql.conf
  • create the extension in each cluster;
  • tell the coordinator about the workers.

So basically, I've described the steps from the guide for OS X. Now, the problem with this is that one cannot just reset (drop/create) the db anymore via rake db:reset, as these steps will be lost. Even worse, when running rake db:migrate RAILS_ENV=test, is dumps a version of the schema, which doesn't contain the Citus extension and thus one could accidentally commit it.

So my idea is - can't we augment the rake tasks for interacting with the DB in such a way, that some of the steps, described above, happen automatically. For example, in the project I am currently working on, I've added the following rake task:

namespace :db do
  desc "Enables Citus extension in the database"
  task enable_citus: :environment do
    ActiveRecord::Base.connection.execute('CREATE EXTENSION citus;')
  end
end

Rake::Task['db:create'].enhance do
  Rake::Task['db:enable_citus'].invoke
end

Now, I don't observe the described side-effect of running migrations in the test database anymore.

What do you think of making similar things part of the gem? For example, I was thinking of adding the ports for the coordinator and workers as some kind of configuration to Rails and then also enhancing rake db:create with something, which executes "SELECT * from master_add_node('localhost', port_number);" for each port.

AlexVPopov avatar Aug 03 '17 12:08 AlexVPopov

@AlexVPopov I like the idea of enhancing the gem a bit to support this better.

My only ask for this would be that we make sure the Docker-based approach (i.e. by using a docker-compose.yml as in https://github.com/citusdata/citus-example-ad-analytics/blob/master/docker-compose.yml) still works fine - i.e. in this case we should make sure to use CREATE EXTENSION IF NOT EXISTS citus to not error out when it was already created.

Also we'll need to have logic to create the database on the worker nodes - again I would aim to support both Docker-based and non-Docker-based approaches here.

lfittl avatar Aug 03 '17 16:08 lfittl

Hey, @lfittl, today I spoke with @ozgune in Slack and he explained to me why you've moved to supporting primarily Docker and stopped supporting OS X. My main concern for wanting to be able to install locally on OS X, was to be able to persist modifications to the database, in case I migrate it and insert data. But, on a second thought, in a well prepared Rails project, one should always be able to run rake db:reset and recreate the DB from scratch and fill it up with seed data. So I don't mind using Docker and I will review again, if this whole approach makes sense with Docker.

AlexVPopov avatar Aug 03 '17 20:08 AlexVPopov

@AlexVPopov That makes sense, and thanks for working through that!

One issue with the Docker-based approach right now is that we don't clearly separate test and development databases (i.e. the Docker setup only initializes one databases, but for Rails you typically want two), so what you were talking about here could be interesting nonetheless.

Let me know how it goes :)

lfittl avatar Aug 03 '17 20:08 lfittl

Hey, @lfittl, so I summarized the steps I need in order to properly setup Citus in development and for tests:

  • Have postgres clusters running on two machines - citus_master and citus_worker_1 with Docker;
  • Created the databases app_development and app_test on both clusters:
    • On master via rake db:create;
    • On worker - manually via createdb for each of the two databases;
      • docker exec -it citus_worker_1 createdb -U postgres oscar_development
      • docker exec -it citus_worker_1 createdb -U postgres oscar_test
  • Create the Citus extension in each of the four databases via CREATE EXTENSION IF NOT EXISTS citus;
    • docker exec -it citus_worker_1 psql -U postgres -d oscar_development -c "CREATE EXTENSION IF NOT EXISTS citus;"
    • docker exec -it citus_worker_1 psql -U postgres -d oscar_test -c "CREATE EXTENSION IF NOT EXISTS citus;"
  • On master, for both app_development and app_test, add the worker node;
    • psql -h localhost -p 5432 -U postgres -d oscar_development -c "SELECT * from master_add_node('citus_worker_1', 5432);"
    • psql -h localhost -p 5432 -U postgres -d oscar_test -c "SELECT * from master_add_node('citus_worker_1', 5432);"
  • Run rake db:migrate

So, regarding this, I think the following steps need to be made.

  1. Create rake tasks for each of these steps.
  2. Make these tasks more flexible, by parameterizing various parts of the commands (worker host/port etc., etc.)

Also, there is this issue, which I also mentioned in the chat - the "normal" way of setting up a DB in Rails is via rake db:structure:load and not via rake db:migrate. However, since structure.sql does not contain the special Citus methods, like create_distributed_table, that are included in the migrations, one ends up with empty meta-tables on the coordinator. So I was thinking - may also the rake db:structure:dump needs to be enhanced, so that it also dumps Citus' meta tables and their contents?

AlexVPopov avatar Aug 08 '17 12:08 AlexVPopov

We have found it is easy to solve these issues with a somewhat modified docker configuration and by overriding the rake db:setup / db:test:prepare tasks.

The Postgres docker image supports adding scripts to a docker-entrypoint-initdb.d directory, which can be done with volumes. These scripts will be run when the database is initialized so we use them to create test databases on all workers. We also inject /etc/citus/pg_worker_list.conf into the image instead of using citusdata/workerlist-gen, which we found to cause issues with our CI environment.

# in docker-compose.yml
# Citus sharded postgres - launch master and 2 workers
  citus_master:
    image: 'citusdata/citus:6.2'
    ports:
      - '5444:5432' # Used to allow connecting from outside Docker (db will be on port 5444)
    volumes:
      - /var/run/postgresql
      - ./docker-database-volumes/etc/citus/pg_worker_list.conf:/etc/citus/pg_worker_list.conf
      - ./docker-database-volumes/docker-entrypoint-initdb.d/100_create_additional_databases.sql:/docker-entrypoint-initdb.d/100_create_additional_databases.sql
    environment:
      - POSTGRES_DB=app_development
      - POSTGRES_USER=app
    links:
      - citus_worker_1
      - citus_worker_2

  citus_worker_1:
    image: 'citusdata/citus:6.2'
    volumes:
      - ./docker-database-volumes/docker-entrypoint-initdb.d/100_create_additional_databases.sql:/docker-entrypoint-initdb.d/100_create_additional_databases.sql
    environment:
      - POSTGRES_DB=app_development
      - POSTGRES_USER=app

  citus_worker_2:
    image: 'citusdata/citus:6.2'
    volumes:
      - ./docker-database-volumes/docker-entrypoint-initdb.d/100_create_additional_databases.sql:/docker-entrypoint-initdb.d/100_create_additional_databases.sql
    environment:
      - POSTGRES_DB=app_development
      - POSTGRES_USER=app

This will set you up with citus and the app_development precreated on the coordinator and workers

Our script 100_create_additional_databases.sql handles creating the app_test database:

CREATE DATABASE app_test OWNER app;

Our pg_worker_list.conf looks like this:

citus_worker_1 5432
citus_worker_2 5432

Additionally, we have overridden the rake db tasks to always run rake db:migrate instead of db:schema:load and to skip db:create (as the database is always created by the docker scripts.

# lib/tasks/db.rake
Rake::Task['db:setup'].clear
Rake::Task['db:test:prepare'].clear

namespace :db do
  namespace :test do
    desc "Overridden version of rails' standard db:test:prepare task since the schema dump used does not handle Citus"
    task prepare: [:environment] do
      system('rake db:migrate RAILS_ENV=test')
    end
  end

  desc "Overridden version of rails' standard db:setup task since the schema dump used does not handle Citus"
  task setup: [:environment, 'db:migrate', 'db:test:prepare']
end

Hope this helps!

webandtech avatar Aug 08 '17 17:08 webandtech

@webandtech we refreshed the Docker compose file. You can easily start testing just by executing below command in the project

docker-compose up -d

gurkanindibay avatar May 03 '23 15:05 gurkanindibay