Removing existing shards should be easier
We need a function for dropping all shards of a particular table. Otherwise the process of dismantling a sharded table in order to recreate it can be quite complex.
Two quick questions to help define this feature:
- In the ideal implementation of this feature, would the "prototype" table on the master also be dropped, or just its shards?
- Would the user experience be better if we intercepted DROP TABLE commands against distributed tables, or would that be misleading (i.e. should we just have a custom function)?
Jason,
From an ideal perspective, we would not include dropping the master table. One of the main reasons to drop the shards is so that you can make a change to the master table because you forgot something before you created the shards. It would also parallel how a master is created. Thus you'd have create table --> create master --> create shards --> drop shards --> drop master --> drop table. Obviously, we'll eventually need to implement a way to modify sharded tables as well.
And yes, you should block dropping the master table unless it is no longer a master. From experimenting, if you drop the master table while it still has shards, you end up in a state that isn't easily resolved short of dropping all of the databases involved.
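To make the ordering above concrete, here is a minimal sketch of the lifecycle as a small state machine. It only illustrates the rule that each drop step is allowed solely from the stage right above it; the stage names, action names, and the `next_stage` helper are all hypothetical and do not correspond to any existing function.

```python
from enum import Enum


class Stage(Enum):
    PLAIN_TABLE = "plain table"   # ordinary table, not yet distributed
    MASTER = "master"             # registered as a master, no shards yet
    SHARDED = "sharded"           # master with shards on the workers


class LifecycleError(Exception):
    """Raised when an action is attempted from the wrong stage."""


# Hypothetical action names, mirroring the sequence described above:
# create master -> create shards -> drop shards -> drop master -> drop table.
ALLOWED = {
    ("create_master", Stage.PLAIN_TABLE): Stage.MASTER,
    ("create_shards", Stage.MASTER): Stage.SHARDED,
    ("drop_shards", Stage.SHARDED): Stage.MASTER,
    ("drop_master", Stage.MASTER): Stage.PLAIN_TABLE,
    ("drop_table", Stage.PLAIN_TABLE): None,  # None means the table is gone
}


def next_stage(stage, action):
    """Return the stage after `action`, or raise if it is not allowed here."""
    try:
        return ALLOWED[(action, stage)]
    except KeyError:
        raise LifecycleError(
            f"cannot {action} while the table is in stage '{stage.value}'"
        ) from None
```

With this table of transitions, a DROP TABLE against a table that is still a master (or still sharded) is rejected instead of leaving orphaned shards behind.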
Got it. You don't want the master table dropped, so intercepting DROP TABLE is not ideal. Just a function to drop the shards (and their ephemera: sequences, constraints, etc.) from workers and remove shard records from the master's metadata.
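A rough sketch of what such a function would have to do, written as a planner that only builds the statements rather than running them. Everything named here is an assumption for illustration: the `Placement` record, the `<table>_<shard id>` naming of shard tables, and the `shard_metadata.shard` and `shard_metadata.shard_placement` metadata tables are placeholders, not the actual catalog layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    shard_id: int
    worker: str  # hypothetical "host:port" label for a worker node


def quote_ident(name):
    """Quote a SQL identifier by doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def plan_drop_shards(table, placements, metadata_schema="shard_metadata"):
    """Build (worker_cmds, master_cmds) for dropping every shard of `table`.

    worker_cmds: list of (worker, sql) pairs, one per shard placement.
    master_cmds: list of sql strings that clear the shard records.
    The master table itself is deliberately left alone.
    """
    worker_cmds = []
    for p in placements:
        # Assumed naming scheme: <table>_<shard id>.
        shard_table = f"{table}_{p.shard_id}"
        # In PostgreSQL, dropping a table also drops its constraints and any
        # sequences owned by its columns; CASCADE covers other dependents.
        sql = f"DROP TABLE IF EXISTS {quote_ident(shard_table)} CASCADE"
        worker_cmds.append((p.worker, sql))

    master_cmds = []
    shard_ids = sorted({p.shard_id for p in placements})
    if shard_ids:
        id_list = ", ".join(str(i) for i in shard_ids)
        # Placements first, then the shards they refer to.
        master_cmds = [
            f"DELETE FROM {metadata_schema}.shard_placement "
            f"WHERE shard_id IN ({id_list})",
            f"DELETE FROM {metadata_schema}.shard WHERE id IN ({id_list})",
        ]
    return worker_cmds, master_cmds
```

Separating planning from execution keeps the destructive part explicit: a caller can print the plan for review before sending anything to the workers.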
You mean automatically intercepting DROP TABLE and dropping the shards instead? No, I don't want to do that. It would be tricky, and could be a huge foot-gun. Having multiple steps is better, so that users realize that they are deleting all data on all shards.
Just as a note, this also came up in our conversations with Neustar.
This issue was moved to citusdata/citus#119