citus
citus copied to clipboard
Load data to a table but different sum shards size in worker nodes
Hi Team,
We load data to 1 table, but sum shards size are different in worker nodes
ALTER TABLE ONLY distribute_table ADD CONSTRAINT distribute_table_pkey PRIMARY KEY (tenant_uuid,id);
SELECT create_distributed_table('distribute_table', 'tenant_uuid');
psql -h localhost -p 5432 -d load15_taskmanager -U tm_dflt_user -f /opt/task_manager_dump/distribute_table_data.sql
SELECT nodename,sum(shard_size) FROM citus_shards where table_name='distribute_table'::regclass group by nodename;
nodename | sum
-------------------------+-------------
citus_vm_02 | 18296504320
citus_vm_03 | 5631139840
citus_vm_04 | 5836931072
citus_vm_05 | 5470191616
(4 rows)
SELECT nodename,count(*) FROM citus_shards where table_name='tasks_export'::regclass group by nodename;
nodename | count
-------------------------+-------
taskmanager_citus_vm_02 | 16
taskmanager_citus_vm_03 | 16
taskmanager_citus_vm_04 | 16
taskmanager_citus_vm_05 | 16
(4 rows)
The shard count is same in each work node, so it is because that some tenant have more records than others,
Please help answer: 1.We know that rebalance can balance size, but just it is difficult for storage planning, for example we want to migrate normal postgresql 400GB to 4 worker nodes citusdb, plan 100GB per worker node, is there any solution can load data have same sum shard size in worker nodes?
2.If question 1 no solution: 2.1. If every time is the first work node have the biggest sum shards size? 2.2. If question 2.1 answer is yes, what biggest size percentage of total size will be in the first work node, 50% 200GB or other value? 2.3. If question 2.1 answer is yes, what biggest size percentage of total size will be in other work nodes?
Hi Team,
Could you please help check? Thanks