
Citus doesn't delete shard clean-up records for the node when removing it

evemorgen opened this issue • 2 comments

Hi!

Some time ago we lost one of the worker nodes due to a hardware failure. We managed to remove the node and rebalance the cluster; however, we are now experiencing degraded cluster performance and share-access/exclusive-access locks piling up. We managed to isolate the ongoing issue to one specific shard and tried copying it over to another worker node. That failed with the following error message.

ERROR:  shard move failed as the orphaned shard public.match_potentialmatch_102008 leftover from the previous move could not be cleaned up

It still contains ~15GB of data

db=# SELECT shardid, table_name, shard_size
FROM citus_shards where shardid = 102008;
 shardid |      table_name      | shard_size
---------+----------------------+-------------
  102008 | match_potentialmatch | 14195769344
(1 row)
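
For reference, a shard move of this kind is typically issued with citus_move_shard_placement; the sketch below is illustrative only (the host names are placeholders, not the actual nodes from our cluster):

SELECT citus_move_shard_placement(
    102008,                    -- shardid of the stuck shard
    'old-worker-host', 5432,   -- source node (placeholder)
    'new-worker-host', 5432,   -- target node (placeholder)
    shard_transfer_mode := 'force_logical');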

We can also see that this shard is scheduled for cleanup. It still appears to be associated with the node group that was lost due to the hardware failure and that no longer exists in the pg_dist_node table.

db=# select * from pg_dist_cleanup limit 100;
 record_id | operation_id | object_type |            object_name             | node_group_id | policy_type
-----------+--------------+-------------+------------------------------------+---------------+-------------
       315 |           53 |           1 | public.match_potentialmatch_102008 |             4 |           2
(1 row)
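
To confirm that group 4 really has no node behind it anymore, the registered nodes can be listed straight from the pg_dist_node catalog (a plain sanity-check query):

SELECT nodeid, groupid, nodename, nodeport, isactive
FROM pg_dist_node
ORDER BY groupid;

In our case group 4 is missing from the output, which matches the node we lost to the hardware failure.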

We couldn't find an answer in the documentation on what to do in such a situation. Any assistance would be appreciated. Thanks!

evemorgen avatar Nov 09 '23 11:11 evemorgen

Which Citus version is this? Also how did you remove the node? With citus_remove_node('10.x.y.z', 5432)?

It would be a good thing if citus_remove_node would clean those records up automatically (i.e. the node being removed means nothing needs to be cleaned up there anymore). It's very possible that we don't do that though.

As a workaround you could remove all rows from pg_dist_cleanup with node_group_id=4 yourself. That should make the error you're hitting go away.
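
Something along these lines should do it (a sketch only; verify the node_group_id against your own pg_dist_cleanup contents before deleting anything):

DELETE FROM pg_dist_cleanup
WHERE node_group_id = 4;  -- the group of the node that was lost

After that, retrying the shard move should no longer trip over the orphaned-shard cleanup.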

JelteF avatar Nov 09 '23 15:11 JelteF

Which Citus version is this? Also how did you remove the node? With citus_remove_node('10.x.y.z', 5432)?

We're running Citus 12 in that cluster. I do believe we removed it with citus_remove_node('10.x.y.z', 5432); I'm not sure whether we also had to disable it first with citus_disable_node, because Citus was complaining about the node still having shards assigned.

db=# select citus_version();
                                           citus_version
----------------------------------------------------------------------------------------------------
 Citus 12.0.0 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0, 64-bit
(1 row)

As a workaround you could remove all rows from pg_dist_cleanup with node_group_id=4 yourself. That should make the error you're hitting go away.

Lovely, thank you. We will try that.

evemorgen avatar Nov 10 '23 10:11 evemorgen