
Cluster blocks queries when one node is down.

Open iamruhua opened this issue 3 years ago • 1 comments

I followed the instructions to create a Citus cluster with 5 workers and set the replication factor to 3 (using docker-compose -p citus up and scaling the workers to 5). Then I created a distributed table, test_table, as described in the official online documentation, and inserted 100k rows into it. I verified that the shards are distributed using SELECT * FROM pg_dist_shard_placement, and successfully ran select * from test_table on the master server.
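For reference, the setup steps above can be sketched as a short script that prints the SQL in the order I ran it. This is only a sketch under assumptions: the helper name `setup_statements`, the distribution column `id`, the `payload` column, and the `generate_series` insert are illustrative and not taken from the official instructions. `citus.shard_replication_factor`, `create_distributed_table`, and `pg_dist_shard_placement` are the Citus names involved.

```python
def setup_statements(table="test_table", column="id",
                     replication_factor=3, rows=100_000):
    """Return the SQL statements for the setup, in execution order.

    The table layout (an id column plus a text payload) is an assumption
    made for illustration only.
    """
    return [
        # Must be set before the table is distributed.
        f"SET citus.shard_replication_factor = {replication_factor};",
        f"CREATE TABLE {table} ({column} bigint, payload text);",
        f"SELECT create_distributed_table('{table}', '{column}');",
        # Load the test rows.
        f"INSERT INTO {table} "
        f"SELECT g, 'row ' || g FROM generate_series(1, {rows}) AS g;",
        # Check where the shard placements ended up.
        "SELECT * FROM pg_dist_shard_placement;",
        f"SELECT count(*) FROM {table};",
    ]


if __name__ == "__main__":
    for statement in setup_statements():
        print(statement)
```

The printed statements can be pasted into a psql session on the master container.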

Then I wanted to test high availability, so I shut down one worker (worker1) and ran the same query, select * from test_table.

The expected result is 100000 rows returned immediately, since high availability is natively built in. The actual result is that the query blocks until it times out, unless I restart the "failed" node while the query is still blocked.

I was expecting the cluster to automatically redirect the query to the other available worker nodes that hold the same shards as worker1. Is there anything else I have to do beyond using the official Docker images and instructions?
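To make the expectation concrete, here is a small sketch that checks, from rows shaped like those in pg_dist_shard_placement, whether every shard still has a placement on a live node. It only models what I expected to happen, not how Citus actually routes queries. The function name `shards_without_live_placement`, the worker hostnames, the port, and the shard IDs are hypothetical.

```python
from collections import defaultdict


def shards_without_live_placement(placements, down_nodes):
    """Return the sorted shard IDs that have no placement on a live node.

    placements: iterable of (shardid, nodename, nodeport) tuples, e.g. the
                matching columns of pg_dist_shard_placement.
    down_nodes: set of (nodename, nodeport) pairs considered unreachable.
    """
    live_count = defaultdict(int)
    seen = set()
    for shardid, nodename, nodeport in placements:
        seen.add(shardid)
        if (nodename, nodeport) not in down_nodes:
            live_count[shardid] += 1
    return sorted(s for s in seen if live_count[s] == 0)


# Hypothetical layout: 5 workers, 4 shards, replication factor 3,
# with placements assigned round-robin.
workers = [(f"worker{i}", 5432) for i in range(1, 6)]
shards = [102008, 102009, 102010, 102011]
placements = [
    (shard, *workers[(i + r) % len(workers)])
    for i, shard in enumerate(shards)
    for r in range(3)
]

# With only worker1 down, every shard still has a live replica.
print(shards_without_live_placement(placements, {("worker1", 5432)}))
```

With replication factor 3 and a single worker down, the list is empty in this model, which is why I expected the query to succeed without waiting for the failed node.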

iamruhua avatar Aug 16 '22 03:08 iamruhua

The documentation has sections on "Coordinator Node Failures" and "Worker Node Failures". It seems I need to create standby nodes for the workers and the coordinator.

  1. Is the coordinator node the same as the master node in the Docker setup?
  2. As I understand the documentation, Citus 11 provides sharding, replication, and "query from any node". Am I wrong?
  3. Why do we still need a hot standby, other than for resource reservation?

iamruhua avatar Aug 16 '22 03:08 iamruhua