Cluster blocks queries when one worker node is down.
I followed the instructions to create a Citus cluster with 5 workers and set the replication factor to 3 (started with docker-compose -p citus up, then scaled the workers to 5). I then created a distributed table, test_table, as described in the official online documentation, and inserted 100k rows into it. I verified that the shards are distributed by running SELECT * FROM pg_dist_shard_placement, and successfully ran the select * query from the master server.
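For reference, the setup was roughly equivalent to the sketch below. The column names and the distribution column are assumptions for illustration only (the real definition came from the tutorial), and it assumes the replication factor is set in the session before the table is distributed.

```sql
-- Run on the coordinator (the "master" container in the docker-compose setup).
-- Assumption: the setting is applied before the table is distributed,
-- since it only affects shards created afterwards.
SET citus.shard_replication_factor = 3;

-- Hypothetical table definition; the actual columns come from the tutorial.
CREATE TABLE test_table (id bigint, payload text);
SELECT create_distributed_table('test_table', 'id');

-- Load 100k rows.
INSERT INTO test_table (id, payload)
SELECT g, 'row ' || g FROM generate_series(1, 100000) AS g;

-- Check where each shard is placed.
SELECT shardid, nodename, nodeport
FROM pg_dist_shard_placement
ORDER BY shardid, nodename;
```

With a replication factor of 3, I would expect every shardid to be listed on three different workers.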
Next I wanted to test high availability, so I shut down one worker (worker1) and ran the same query, select * from test_table.
Expected result: all 100000 rows are returned immediately, since high availability is supposed to be built in. Actual result: the query blocks until it times out, unless I restart the "failed" node while the query is still blocked.
I was expecting the cluster to automatically redirect the query to the other available worker nodes that hold replicas of the shards stored on worker 1. Is there anything else I have to do beyond using the official Docker images and instructions?
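To rule out a missing replica as the cause, a check along these lines can be run on the coordinator. This is only a sketch: the hostname 'citus-worker-1' is a placeholder for whatever nodename pg_dist_shard_placement reports for the stopped worker, and I am assuming citus.node_connection_timeout is the setting that governs how long the coordinator waits for an unreachable worker.

```sql
-- For every shard, count all placements and the placements that are
-- NOT on the stopped worker. 'citus-worker-1' is a placeholder hostname.
SELECT shardid,
       count(*) AS placements,
       count(*) FILTER (WHERE nodename <> 'citus-worker-1') AS placements_elsewhere
FROM pg_dist_shard_placement
GROUP BY shardid
ORDER BY shardid;

-- Show how long the coordinator waits when connecting to a worker.
SHOW citus.node_connection_timeout;
```

If placements_elsewhere is at least 1 for every shard, the data needed to answer the query is still available on the running workers, so the blocking would not be explained by missing replicas.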
The documentation has sections on "Coordinator Node Failures" and "Worker Node Failures". From those, it seems I need to create standby nodes for both the workers and the coordinator.
- Is the coordinator node the same thing as the Docker master node?
- As I understand the documentation, Citus 11 provides sharding, replication, and "query from any node". Am I wrong?
- Why do we still need a hot standby, other than for resource reservation?