
limit citus_drain_node to drain the specified node only

Open mtuncer opened this issue 3 years ago • 1 comments

citus_drain_node tries to move data off every worker whose shouldhaveshards value is set to false, not only the node passed to it.

This behavior is similar to rebalance_table_shards(drain_only := true).

However, when a user explicitly asks to drain a single node, the operation should be limited to that node.

The scenario is:

  • A user wants to scale down their cluster by more than one node (say, from 8 nodes to 5 nodes).
  • Draining the nodes one by one is not optimal, since each drain would also move data onto nodes that are going to be removed later.
  • The user therefore sets shouldhaveshards to false in pg_dist_node for all of the nodes to be removed.
  • The user runs citus_drain_node for a single worker, expecting it to drain only that node within their allocated maintenance window.
  • However, Citus tries to move data off all 3 nodes, so the operation takes longer and does not finish within the maintenance window.
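The steps above can be sketched in SQL roughly as follows. This is only an illustration of the scenario, not a verified reproduction: the worker names and port are hypothetical, and using citus_set_node_property to set the flag is an assumption about how the user changed shouldhaveshards.

```sql
-- Hypothetical worker names and port; substitute the values from pg_dist_node.

-- 1. Flag the three nodes that will be removed as nodes that should not hold shards.
SELECT citus_set_node_property('worker-6', 5432, 'shouldhaveshards', false);
SELECT citus_set_node_property('worker-7', 5432, 'shouldhaveshards', false);
SELECT citus_set_node_property('worker-8', 5432, 'shouldhaveshards', false);

-- 2. Check which nodes are currently flagged.
SELECT nodename, nodeport, shouldhaveshards
FROM pg_dist_node
WHERE noderole = 'primary'
ORDER BY nodename, nodeport;

-- 3. Ask to drain a single node. As reported in this issue, shards are moved
--    off all three flagged nodes, not only worker-6.
SELECT citus_drain_node('worker-6', 5432);
```

With the behavior requested here, step 3 would move shards off worker-6 only, and the remaining two nodes would be drained by separate calls in later maintenance windows.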

Our documentation at https://docs.citusdata.com/en/v11.0/develop/api_udf.html#citus-drain-node does not mention this behavior; it even hints that the function moves data off the specified (that is, a single) worker. The actual behavior is different.

mtuncer, Aug 29 '22 12:08

A second possible improvement that this issue highlights: when draining nodes in general, it would likely be optimal to process them one by one, even when the request is to drain all of them at once.

JelteF, Aug 29 '22 12:08