master_disable_node documentation is not correct

Open samay-sharma opened this issue 8 years ago • 6 comments

Currently, master_disable_node docs describe it as:

The master_disable_node function is the opposite of master_activate_node. It marks a node as inactive in the Citus metadata table pg_dist_node, removing it from the cluster temporarily. The function also deletes all reference table placements from the disabled node. To reactivate the node, just run master_activate_node again.

The node is not "removed" from the cluster temporarily. INSERTs and SELECTs can still use the node. It removes reference table copies from the node and avoids creation of new shards on that node.

We should update the docs to represent this information more accurately.

samay-sharma avatar Sep 27 '17 21:09 samay-sharma

It's not clear to me that the node is available for reads/writes. See https://github.com/citusdata/citus/blob/632d0c675a59bfa3d052f993b8386c9243ca26ab/src/backend/distributed/executor/multi_task_tracker_executor.c#L204. In another call, it appeared that concurrent index creation also may not work. Do you know if this is intentional?

sumedhpathak avatar Nov 02 '17 17:11 sumedhpathak

@sumedhpathak: While I was testing this function for a customer, I was able to run reads on a disabled node. I think I also tried inserts and they worked. @byucesoy mentioned that the main goal of this function was to stop replicating reference tables and adding new shards to the node, not to stop reads/writes on existing shards. He mentioned that this function was added to make sure adding nodes was a quick operation on Cloud.

@byucesoy Could you clarify the behavior of this function?

samay-sharma avatar Nov 07 '17 01:11 samay-sharma

There is a slight misunderstanding here. We added the master_add_inactive_node UDF to ensure that adding nodes is a quick operation on Cloud. master_disable_node is not related to that.

Apart from that, when we introduced the pg_dist_node metadata table (and removed pg_worker_list.conf), we implicitly removed a feature. It seems some customers performed tests, upgrades, maintenance, etc. by removing a line from the pg_worker_list.conf file, thus preventing any real-time and task-tracker queries from hitting that particular node. Of course, removing that line would not prevent router queries, but I think that was good enough for those customers. Once we introduced the pg_dist_node metadata table, that was no longer possible, and we were asked to add a way to replicate that behaviour. So we implemented the master_disable_node UDF.

When you disable a node, the expected behaviour is:

  • real-time/task-tracker queries will not hit the disabled node
  • no shards will be created on the disabled node
  • router queries will run as expected even if they hit the disabled node
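
The rules in the list above can be written down as a toy model. This is only a sketch of the expected behaviour as described here, not Citus code; the names `QueryKind`, `may_hit_node`, and `may_place_new_shard` are hypothetical and exist only for illustration.

```python
from enum import Enum


class QueryKind(Enum):
    """Executor types named in this thread (hypothetical enum, not a Citus type)."""
    ROUTER = "router"
    REAL_TIME = "real-time"
    TASK_TRACKER = "task-tracker"


def may_hit_node(kind: QueryKind, node_is_active: bool) -> bool:
    """Return True if a query of this kind is expected to reach the node."""
    if node_is_active:
        return True
    # Per the list above, only router queries still reach a disabled node.
    return kind is QueryKind.ROUTER


def may_place_new_shard(node_is_active: bool) -> bool:
    """New shards are expected to be created only on active nodes."""
    return node_is_active
```

In this model a router INSERT or SELECT against a shard on a disabled node still goes through, which matches what was observed in testing earlier in this thread.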

Here is the issue about implementing this UDF (#931). There is also this documentation issue (#224).

In the documentation issue, @mtuncer says that "Removing a node from cluster stops all communication to that node." As far as I know, router queries still work when you disable a node. Maybe @mtuncer can say more about it.

byucesoy avatar Nov 07 '17 14:11 byucesoy

If router queries can still hit the node, does that also include INSERTs, UPDATEs, etc.? If so, doesn't that defeat the purpose of disabling the node? This seems usable for append-only workloads, which would make sense if it is meant to replicate the previous functionality.

sumedhpathak avatar Nov 14 '17 18:11 sumedhpathak

So what use-case does master_disable_node support? Is it for customers to do maintenance on a node? Is this safe given that router queries can still hit the node?

jonels-msft avatar Oct 29 '19 18:10 jonels-msft

Personally, I'm not sure if anyone uses this UDF. It was implemented after some(?) customers asked for it and mentioned the use-case above. The behavior is equivalent to removing the line from pg_worker_list.conf, though I'm not sure how useful this is.

byucesoy avatar Oct 30 '19 10:10 byucesoy