
How to forcefully drop/remove a node with Citus

Open shankarmn94 opened this issue 1 year ago • 3 comments

In my 8-node Citus cluster, one of the nodes has problems loading or returning data from specific tables.

As I mentioned in https://github.com/citusdata/citus/issues/7424, we tried the following.

We removed the nodes one by one, starting from the last, and added them back, so we could check whether the shards on a given node were causing the problem. When we tried to remove worker-3, we got stuck with moving shards and creating tables on the cluster DB. When I tried to drain this node, it took 2 days to move a shard; as I kept trying, it moved 2 more shards later.

Since this was taking so long, I tried to remove the node directly.

ccnsapp=# select citus_remove_node('10.104.0.5',5432);
ERROR: cannot remove or disable the node 10.104.0.5:5432 because because it contains the only shard placement for shard 102017
DETAIL: One of the table(s) that prevents the operation complete successfully is public.companies
HINT: To proceed, either drop the tables or use undistribute_table() function to convert them to local tables

So I started to move this shard to a different node:

SELECT citus_move_shard_placement(102017, '10.104.0.5', 5432, '10.104.0.10', 5432);
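For reference, `citus_move_shard_placement` accepts an optional sixth argument, the shard transfer mode, which the call above leaves at its default (`auto`). A minimal sketch of the same move with the mode spelled out, assuming that blocking writes to the affected tables for the duration of the copy is acceptable on this cluster (that is an assumption, not something stated above):

```sql
-- Same shard and nodes as in the call above; only the last argument is new.
-- 'block_writes' copies the shard while blocking writes to it instead of
-- relying on logical replication. Whether this helps here is untested.
SELECT citus_move_shard_placement(
    102017,
    '10.104.0.5', 5432,
    '10.104.0.10', 5432,
    'block_writes');
```

The other documented values for this argument are `auto` and `force_logical`.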

I have been trying this for two days, and I cannot tell where it is stuck or hung.

This is what I see when I check the progress:

 sessionid |             table_name             | shardid | shard_size | sourcename | sourceport | targetname  | targetport | progress | source_shard_size | target_shard_size | operation_type |  source_lsn  | target_lsn |   status
-----------+------------------------------------+---------+------------+------------+------------+-------------+------------+----------+-------------------+-------------------+----------------+--------------+------------+------------
    201523 | companies                          |  102017 |     155648 | 10.104.0.5 |       5432 | 10.104.0.10 |       5432 |        1 |            155648 |              8192 | move           | 8FE/751164B8 |            | Setting Up
    201523 | agents                             |  102049 |   18087936 | 10.104.0.5 |       5432 | 10.104.0.10 |       5432 |        1 |          18087936 |              8192 | move           | 8FE/751164B8 |            | Setting Up
    201523 | assets                             |  102081 |  121077760 | 10.104.0.5 |       5432 | 10.104.0.10 |       5432 |        1 |         121077760 |              8192 | move           | 8FE/751164B8 |            | Setting Up
    201523 | deprecation_status                 |  102113 |      32768 | 10.104.0.5 |       5432 | 10.104.0.10 |       5432 |        1 |             32768 |                 0 | move           | 8FE/751164B8 |            | Setting Up
    201523 | windows_specifics                  |  102145 |      32768 | 10.104.0.5 |       5432 | 10.104.0.10 |       5432 |        1 |             32768 |                 0 | move           | 8FE/751164B8 |            | Setting Up
    201523 | mac_specifics                      |  102177 |      40960 | 10.104.0.5 |       5432 | 10.104.0.10 |       5432 |        1 |             40960 |              8192 | move           | 8FE/751164B8 |            | Setting Up
    201523 | linux_specifics                    |  102209 |      32768 | 10.104.0.5 |       5432 | 10.104.0.10 |       5432 |        1 |             32768 |                 0 | move           | 8FE/751164B8 |            | Setting Up
    201523 | network_devices_specifics          |  102241 |      32768 | 10.104.0.5 |       5432 | 10.104.0.10 |       5432 |        1 |             32768 |                 0 | move           | 8FE/751164B8 |            | Setting Up
    201523 | asset_windows_reboot_required      |  102305 |      73728 | 10.104.0.5 |       5432 | 10.104.0.10 |       5432 |        1 |             73728 |              8192 | move           | 8FE/751164B8 |            | Setting Up
    201523 | asset_windows_security_products    |  102337 |    2760704 | 10.104.0.5 |       5432 | 10.104.0.10 |       5432 |        1 |           2760704 |              8192 | move           | 8FE/751164B8 |            | Setting Up
    201523 | asset_firewall_rules               |  102369 |  576430080 | 10.104.0.5 |       5432 | 10.104.0.10 |       5432 |        1 |         576430080 |              8192 | move           | 8FE/751164B8 |            | Setting Up
    201523 | asset_unqouted_services            |  102401 |     688128 | 10.104.0.5 |       5432 | 10.104.0.10 |       5432 |        1 |            688128 |              8192 | move           | 8FE/751164B8 |            | Setting Up
    201523 | asset_msdt                         |  102433 |    1089536 | 10.104.0.5 |       5432 | 10.104.0.10 |       5432 |        1 |           1089536 |              8192 | move           | 8FE/751164B8 |            | Setting Up
    201523 | asset_registry_misconfiguration    |  102465 |   27328512 | 10.104.0.5 |       5432 | 10.104.0.10 |       5432 |        1 |          27328512 |              8192 | move           | 8FE/751164B8 |            | Setting Up
    201523 | remediated                         |  102497 |     802816 | 10.104.0.5 |       5432 | 10.104.0.10 |       5432 |        1 |            802816 |              8192 | move           | 8FE/751164B8 |            | Setting Up
    201523 | software                           |  102529 |  119169024 | 10.104.0.5 |       5432 | 10.104.0.10 |       5432 |        1 |         119169024 |              8192 | move           | 8FE/751164B8 |            | Setting Up

... ... (116 rows)
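The column layout above matches what Citus's `get_rebalance_progress()` function returns. A sketch of two follow-up queries, assuming a Citus version whose `get_rebalance_progress()` includes the `status` and `operation_type` columns shown above: the first condenses the 116 rows into one line per status, and the second lists replication slots on the node it is run on, since a move in the default transfer mode generally relies on logical replication.

```sql
-- Run on the coordinator: count the in-flight shard moves per status
-- and compare the total bytes on the source with the bytes copied so far.
SELECT status,
       operation_type,
       count(*) AS shards,
       pg_size_pretty(sum(source_shard_size)) AS source_total,
       pg_size_pretty(sum(target_shard_size)) AS target_total
FROM get_rebalance_progress()
GROUP BY status, operation_type
ORDER BY shards DESC;

-- Run on the source worker (10.104.0.5 in the output above):
-- list replication slots and whether each one is currently active.
SELECT slot_name, plugin, active, restart_lsn, confirmed_flush_lsn
FROM pg_replication_slots;
```

If `target_total` stays flat between runs, the copy is not advancing; this only narrows down where the move is waiting and does not by itself explain why.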

How long will this take? The status just says "Setting Up" for every shard, and I don't know when it will finish.

Could someone let me know what is happening here?

shankarmn94 • Jan 25 '24 13:01

I would also like to know how to remove a node. The physical node isn't running anymore, and I get this error:

ERROR: cannot remove or disable the node <name>:5432 because it contains the only shard placement for shard 102073
Detail: One of the table(s) that prevents the operation complete successfully is <table>
Hint: To proceed, either drop the tables or use undistribute_table() function to convert them to local tables

I can't undistribute that table because the node isn't running anymore, and I can't remove the node because it's not running. I need to remove that table, and only one worker is running. How can I remove the table?

cyraid • May 28 '24 00:05