citus icon indicating copy to clipboard operation
citus copied to clipboard

ERROR: there is a shard placement in node group 2 but there are no nodes in that group

Open shankarmn94 opened this issue 1 year ago • 1 comments

i had 8 machine cluster where 1 co-ordinator node and 7 workers.

with db name ccnsapp we created citus extension and added nodes.

after somedays of operations. where some specific tables started not to respond. while we started to remove the node one by one and and added back, we removed 5 nodes and added back while worker3 which started issue with this tables..

so we left it rebalance for week.. no response..

we tried to drain the node.. it took a week to move 1 shard and 2 shards in a weekend.. later we didnt receive any response of moving shard or drain.. if we initiate these drain node or move shardthe query will be running as moving and in progress also.. it will be like settingup table.. after a it started to as copying data and was stuck for week..

there was issue with the network of that node we deleted all the shard id's related to that and removed the node....

after that the specific tables which are not responding started to respond..

but while we try to create new distribute tables.. we face these issue.

how to resolve this or retireve this...

ccnsapp=# select * from pg_dist_shard_placement; ccnsapp=# ccnsapp=# ccnsapp=# SELECT create_reference_table('service_ports'); ERROR: table "service_ports" is already distributed ccnsapp=# ccnsapp=# SELECT create_distributed_table('asset_stats', 'tenantid', colocate_with => 'companies'); ERROR: reference table "service_ports" does not have a shard ccnsapp=# ccnsapp=# SELECT undistribute_table('service_ports'); NOTICE: creating a new table for public.service_ports NOTICE: moving the data of public.service_ports NOTICE: dropping the old public.service_ports ERROR: there is a shard placement in node group 2 but there are no nodes in that group CONTEXT: while executing command on 10.104.0.9:5432 SQL statement "SELECT master_remove_distributed_table_metadata_from_workers(v_obj.objid, v_obj.schema_name, v_obj.object_name)" PL/pgSQL function citus_drop_trigger() line 13 at PERFORM SQL statement "DROP TABLE public.service_ports CASCADE" ccnsapp=# ccnsapp=# select * from pg_dist_node; nodeid | groupid | nodename | nodeport | noderack | hasmetadata | isactive | noderole | nodecluster | metadatasynced | shouldhaveshards --------+---------+-------------+----------+----------+-------------+----------+----------+-------------+----------------+------------------ 1 | 0 | 10.104.0.3 | 5432 | default | t | t | primary | default | t | f 2 | 1 | 10.104.0.4 | 5432 | default | t | t | primary | default | t | t 15 | 14 | 10.104.0.11 | 5432 | default | t | t | primary | default | t | t 16 | 15 | 10.104.0.12 | 5432 | default | t | t | primary | default | t | t 6 | 5 | 10.104.0.10 | 5432 | default | t | t | primary | default | t | t 5 | 4 | 10.104.0.9 | 5432 | default | t | t | primary | default | t | t 4 | 3 | 10.104.0.6 | 5432 | default | t | t | primary | default | t | t (7 rows)

ccnsapp=# ccnsapp=# SELECT undistribute_table('public.service_ports'); NOTICE: creating a new table for public.service_ports NOTICE: moving the data of public.service_ports NOTICE: dropping the old public.service_ports ERROR: there is a shard placement in node group 2 but there are no nodes in that group CONTEXT: while executing command on 10.104.0.9:5432 SQL statement "SELECT master_remove_distributed_table_metadata_from_workers(v_obj.objid, v_obj.schema_name, v_obj.object_name)" PL/pgSQL function citus_drop_trigger() line 13 at PERFORM SQL statement "DROP TABLE public.service_ports CASCADE" ccnsapp=# CREATE TABLE temp_service_ports AS SELECT * FROM service_ports LIMIT 0; SELECT 0 ccnsapp=# INSERT INTO temp_service_ports SELECT * FROM service_ports; INSERT 0 0 ccnsapp=# DROP TABLE service_ports; ERROR: there is a shard placement in node group 2 but there are no nodes in that group CONTEXT: while executing command on 10.104.0.9:5432 SQL statement "SELECT master_remove_distributed_table_metadata_from_workers(v_obj.objid, v_obj.schema_name, v_obj.object_name)" PL/pgSQL function citus_drop_trigger() line 13 at PERFORM ccnsapp=# SELECT citus_removedistributedtablemetadata('public.service_ports'); ERROR: function citus_removedistributedtablemetadata(unknown) does not exist LINE 1: SELECT citus_removedistributedtablemetadata('public.service_... ^ HINT: No function matches the given name and argument types. You might need to add explicit type casts. ccnsapp=# ccnsapp=# DROP TABLE IF EXISTS service_ports CASCADE; ERROR: there is a shard placement in node group 2 but there are no nodes in that group CONTEXT: while executing command on 10.104.0.9:5432 SQL statement "SELECT master_remove_distributed_table_metadata_from_workers(v_obj.objid, v_obj.schema_name, v_obj.object_name)" PL/pgSQL function citus_drop_trigger() line 13 at PERFORM

shankarmn94 avatar Jan 29 '24 09:01 shankarmn94

Could you enable backtrace_functions GUC for LookupNodeForGroup() so that we can understand what needs to be done to make "undistribute_table()" working again.

onurctirtir avatar Jan 29 '24 13:01 onurctirtir