Auto TRUNCATE when distributing a table.
Currently we retain the local data after distributing a table. This blocks fixing issues like #2600 and #1770. However, auto-TRUNCATE when distributing a table isn't straightforward, since the table might be referenced by other local tables, and in that case (1) TRUNCATE with CASCADE will lead to data loss, and (2) TRUNCATE without CASCADE might fail (because of a foreign key constraint violation).
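To make the dilemma concrete, here is a minimal sketch in plain PostgreSQL (table names are just illustrative; `items` is the table we want to distribute, `orders` is a regular local table referencing it):

```sql
CREATE TABLE items (id bigint PRIMARY KEY);
CREATE TABLE orders (
    id bigint PRIMARY KEY,
    item_id bigint REFERENCES items (id)
);

-- After distributing items, its local copy is still referenced by orders:
TRUNCATE items CASCADE;  -- also empties orders, i.e. local data loss
TRUNCATE items;          -- fails because of the incoming foreign key
```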
We can approach this problem in multiple ways.
1. ~~When distributing a table, fail if truncating the local data without CASCADE fails. This will force users to topologically sort their table distribution statements. If there are any cyclic references, users should have already defined the constraints as deferrable anyway, so this case isn't a problem either.~~ This doesn't work, since TRUNCATE fails for tables with incoming FK constraints (regardless of whether deleting all rows would violate the constraint or not), unless they are all truncated in the same command.
2. Defer TRUNCATE to commit time. If we do this, users can distribute their tables in a transaction block without worrying about the order of distribution statements.
~~An advantage of 2 is that it makes life a bit easier for users when distributing tables, but it also has the disadvantage that, if the order of distribution statements doesn't follow the reference graph, then between statements we can have foreign keys from local tables to distributed tables, and if we query the data at that point we might get some weird error messages (like "Consistency check on SPI tuple count failed").~~
~~Approach 1 reduces the probability of these error messages by forcing users to follow the dependency graph, but doesn't eliminate them completely, e.g. when the constraints are deferrable.~~
~~My vote is to implement 1, with proper error messages to help users find the correct order, as it is a bit easier to implement and approach 2 doesn't seem to have a great advantage over it.~~
I'm creating this thread so others can propose ideas and comment on things I've missed in my analysis.
Also note that TRUNCATE acquires the relation lock in ACCESS EXCLUSIVE mode, so we might need to upgrade the lock mode in create_distributed_table() from EXCLUSIVE to ACCESS EXCLUSIVE.
Hmmmm, it seems that approach 1 is not feasible at all: TRUNCATE doesn't actually check FK constraints, it simply refuses to run on tables with incoming FK references unless they are all truncated in the same command (or we use CASCADE, which we want to avoid here).
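For example, in plain PostgreSQL (hypothetical table names), the referencing table being empty doesn't help; only listing all tables in the same TRUNCATE does:

```sql
CREATE TABLE parent (id int PRIMARY KEY);
CREATE TABLE child (parent_id int REFERENCES parent (id));

-- child has no rows, so deleting everything from parent could not
-- violate the constraint, yet TRUNCATE still refuses:
TRUNCATE parent;
-- ERROR:  cannot truncate a table referenced in a foreign key constraint

-- Truncating both tables in the same command works:
TRUNCATE parent, child;
```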
I need to think more about this issue.
#3752 introduced a new UDF, truncate_local_data_after_distributing_table(), that provides an easy workaround for this issue. This UDF runs TRUNCATE .. CASCADE on the local copies, provided that:
- the parameter is a Citus table
- there are no foreign key references to the given table
That means we may end up truncating the local data of other distributed tables as well, but the user should not lose any data.
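A rough usage sketch (the table name is just an example):

```sql
SELECT create_distributed_table('items', 'id');

-- Drop the leftover local copy of the data that now lives in the shards:
SELECT truncate_local_data_after_distributing_table('items');
```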
I had a customer run into this when creating a unique constraint: they got an error about duplicate values that existed only in the shell table, not in the actual shards. That was quite confusing to debug (both for them and for me). @hanefi any reason why we don't run this function automatically after distributing the data?
Marco was against the idea of truncating shell tables after distributing them. I do not fully remember why, but here is a list of the reasons I can come up with today:
- During a POC, one may want to remove metadata and shards manually and try another distribution strategy. Not truncating shell tables allows this.
- After evaluating Citus and deciding not to use this extension, some users delete the extension and assume they still have access to their data.
Citus has evolved since I implemented this feature. We can undistribute tables now, and there are other ways to test different sharding strategies today. In the past it was not really that easy.
The second argument is also not really important IMO. If a user makes a mistake and drops shard tables, the blame is on them, not on the extension.
Yeah, both of those seem like quite exotic use cases these days. Definitely not worth these weird and difficult-to-debug issues. In that case, I think it would be good to simply put this on the backlog.
Another customer ran into this footgun again.