
Worker node not being added

Open • mhaagens opened this issue 5 years ago • 10 comments

docker-compose.yml

```yaml
version: "2.1"
services:
  master:
    build: ./docker/postgres
    image: 'citus:local'
    container_name: "${COMPOSE_PROJECT_NAME:-citus}_master"
    labels: ['com.citusdata.role=Master']
    restart: unless-stopped
    ports: ["${MASTER_EXTERNAL_PORT:-5432}:5432"]
    volumes:
      - ./docker/postgres/data/master:/var/lib/postgresql/data
    env_file:
      - ./.env
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U ${POSTGRES_USER} -d ${POSTGRES_DB}"]
      interval: 10s
      timeout: 5s
      retries: 5
  worker:
    build: ./docker/postgres
    image: 'citus:local'
    labels: ['com.citusdata.role=Worker']
    restart: unless-stopped
    depends_on: { manager: { condition: service_healthy } }
    volumes:
      - ./docker/postgres/data/worker:/var/lib/postgresql/data
    env_file:
      - ./.env
  manager:
    image: 'citusdata/membership-manager:0.2.0'
    container_name: "${COMPOSE_PROJECT_NAME:-citus}_manager"
    volumes: ['/var/run/docker.sock:/var/run/docker.sock']
    depends_on: { master: { condition: service_healthy } }
    restart: unless-stopped
    env_file:
      - ./.env
```

Everything seems to work, except that no active nodes are returned when running `SELECT master_get_active_worker_nodes();`.

mhaagens, May 10 '19

This happened to me: the master service was slow to start up, and the manager service failed to connect to it (the docker-compose logs show the master server restarting several times due to configuration changes). Setting the manager service to `restart: on-failure` fixed this; the manager is able to connect after two or three tries.

And now `SELECT master_get_active_worker_nodes();` returns a worker node too.

mubaidr, Apr 07 '20
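With a restart policy such as `restart: on-failure`, Docker restarts the manager after each failed connection attempt until the coordinator is up. A script that needs the coordinator can do the same polling itself with `pg_isready`, which exits with status 0 once the server accepts connections. This is a minimal sketch, assuming `pg_isready` is on the `PATH`; the function name, the default host `master`, and the attempt and interval defaults are hypothetical, not part of the membership manager.

```bash
#!/usr/bin/env bash
# Sketch: poll the coordinator until it accepts connections, giving up after
# a fixed number of attempts. Host name and limits are hypothetical defaults.

wait_for_coordinator() {
  local host="${1:-master}"      # hypothetical coordinator host name
  local max_attempts="${2:-30}"  # how many checks before giving up
  local interval="${3:-2}"       # seconds to sleep between checks
  local attempt=1

  # pg_isready exits 0 once the server is accepting connections
  until pg_isready --host="$host" --quiet; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "coordinator $host not ready after $max_attempts attempts" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$interval"
  done
  echo "coordinator $host is ready after $attempt attempt(s)"
}
```

Unlike a restart policy, the bounded attempt count makes the script fail loudly instead of retrying forever when the coordinator never comes up.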

I created two PRs that aim to resolve this issue:

  1. https://github.com/citusdata/membership-manager/pull/9 implements a polling mechanism in Membership Manager, so that it (a) detects the readiness of the coordinator node and (b) properly reports that it is ready to accept new Citus worker services
  2. https://github.com/citusdata/docker/pull/187 (a) updates the docker images to properly detect dependencies and (b) introduces Compose V3 definitions

Once they are merged, this issue should be resolved.

hanefi, Apr 13 '20
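Item (b) of the first PR is about the manager signalling that it is ready. One common way to wire such a signal into Compose is a marker file on a shared volume that the worker waits for before starting Postgres. The sketch below only illustrates that general pattern and is not taken from either PR; the function name and the default path `/healthcheck/manager-ready` are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: block until a readiness marker file appears, or time out.
# The default path is hypothetical; a real setup would share it via a volume.

wait_for_marker() {
  local marker="${1:-/healthcheck/manager-ready}"  # hypothetical marker path
  local timeout="${2:-60}"                          # seconds before giving up
  local waited=0

  while [ ! -f "$marker" ]; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out after ${timeout}s waiting for $marker" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "found $marker"
}
```

In this pattern the manager side would create the file once it has connected to the coordinator, and a worker entrypoint would call `wait_for_marker` before starting Postgres.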

I am currently using a similar method to wait for the database, i.e. using pg_healthcheck. In addition, I was also waiting for the worker node to be added to the manager before running any schema scripts, but now I think this is redundant. Can you confirm that, when a new worker joins the master, it will automatically be updated (schema and data) by the master?

mubaidr, Apr 13 '20

^ Got the answer: worker nodes need to be present before reference or distributed table definitions can be run. Does this mean it would be better to also wait for at least one worker node?

Currently I am using this script to check:

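```bash
# wait for worker nodes to be added to citus membership manager
# --no-align drops psql's padding; if psql fails, the empty result makes the
# test error out (status 2), so the loop keeps waiting instead of exiting early
until [ "$(psql --username postgres --dbname "${POSTGRES_DB}" --tuples-only --no-align \
    --command "SELECT count(*) FROM master_get_active_worker_nodes();")" -gt 0 ] 2>/dev/null; do
  sleep 3
done
```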

Is there a better way?

mubaidr, Apr 14 '20

I think it is better to wait until all of your worker nodes are registered and ready to accept connections. If you distribute a table while only one worker node is active, your queries may be slower than expected because your data will be unevenly distributed.

hanefi, Apr 14 '20
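Following that advice, the loop can wait for a known number of workers rather than for the first one, and give up after a timeout instead of looping forever when a worker never registers. This is a minimal sketch: the connection settings mirror the script earlier in this thread, while the function names, the `postgres` fallback for `POSTGRES_DB`, and the timeout defaults are hypothetical. The expected count is assumed to be supplied by the caller, for example the number of worker replicas started with `docker-compose up --scale worker=N`.

```bash
#!/usr/bin/env bash
# Sketch: wait until at least EXPECTED workers are active on the coordinator.
# Function names and defaults are hypothetical.

count_active_workers() {
  # --tuples-only --no-align prints just the bare number, e.g. "2"
  psql --username postgres --dbname "${POSTGRES_DB:-postgres}" \
    --tuples-only --no-align \
    --command "SELECT count(*) FROM master_get_active_worker_nodes();" 2>/dev/null
}

wait_for_workers() {
  local expected="${1:-1}"    # number of workers to wait for
  local timeout="${2:-120}"   # seconds before giving up
  local interval="${3:-3}"    # seconds between checks
  local waited=0 count

  while :; do
    count="$(count_active_workers)"
    # an empty or non-numeric result (e.g. psql could not connect) counts as 0
    case "$count" in
      ''|*[!0-9]*) count=0 ;;
    esac
    if [ "$count" -ge "$expected" ]; then
      echo "$count of $expected worker(s) active"
      return 0
    fi
    if [ "$waited" -ge "$timeout" ]; then
      echo "only $count of $expected worker(s) active after ${timeout}s" >&2
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
}
```

For example, `wait_for_workers 2 && psql --username postgres --dbname "$POSTGRES_DB" -f schema.sql` would run a schema script only once two workers are active; `schema.sql` is a placeholder.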

OK, cool. Thanks!

Any future nodes will be added by the membership manager, but we need to run the following command to redistribute data for optimal performance:

```sql
SELECT rebalance_table_shards();
```

mubaidr, Apr 14 '20

I want to remind you that shard rebalancing is an enterprise feature and is not available in the docker setup.

hanefi, Apr 16 '20

Well, I'm surprised; I did not know that.

mubaidr, Apr 16 '20

You can see https://www.citusdata.com/product/comparison for a comparison of features between Citus Community, Citus Enterprise and Citus on Azure.

hanefi, Apr 17 '20

> I want to remind you that shard rebalancing is an enterprise feature and is not available in the docker setup.

Hi! Could I rebalance shards manually, just for experimenting with the Docker images, without a Citus Enterprise license?

rdcm, Oct 16 '20