Hard Delete Table Option
With the current implementation, when tables are dropped/deleted, they only get "soft" deleted; they get removed from the catalog and are no longer considered in queries, however, the data files related to those tables are not deleted.
We need to provide an option to the user to delete the actual data associated with those tables so that they don't continue to take up space.
General Requirements:
- User must have an option available to them to do a hard delete (CLI option). This can be as a flag/option of the existing delete table command or as a new command (former preferred).
- User must be warned when doing a hard delete.
- User confirmation must be received before proceeding with a hard delete ("This action cannot be reversed, are you sure you want to do this?).
- User should be informed that a hard table delete process has started, but there is no requirement to wait to return until after the delete is processed, unless we think we can make this operation take less than 60 seconds (assuming this is challenging if they have a lot of data).
- A Log entry should be written when the user starts this process.
- A process should run that begins the removal process and all associated files (or portions of files) for that table should be removed.
- A log entry should be written when the removal process is completed.
A separate ticket will be added for "Undelete" in the cases where a table is only "soft" deleted; Another ticket will be added for the automatic "hard delete" of a soft deleted table after a period of time has passed.
The hard delete should just set the option for when to hard delete like in #26262
@garylfowler can you clarify the details of what a "log entry" is?
Users will need some way to track the status and progress of a hard delete as it's something that could take some time. I think the cleanest way is to log the event to a table in the _internal database. We have that conceptually, but haven't actually started writing any data into tables there yet. I'm not sure we want to do logging into the _internal database yet, because we don't have retention periods wire up. We'll want _internal to have some period like 14 days or something. So I think hold off on the logging part of it for now, but what do you think @garylfowler?
The hard delete is actually just a process that runs automatically against databases and tables that have been soft deleted where their hard_delete_at time has come (detailed in #26262).
For the CLI to do hard delete, that's basically just doing a soft delete and explicitly setting the hard_delete_at time to now.
Yes, the requirement was around being able to track status. Writing to the _internal database makes sense and would provide a way for a user to lookup the status by seeing whether or not an event has been written. We could also (in future) build a command line option if we desired, to query and pull that info so they wouldn't have to remember what to query to get it.
Agree that we could hold off on the log writing part for now.
I would like to clarify one requirement:
If a user soft-deletes a table without specifying a hard_delete_at in the API request (or CLI command), what should the default hard deletion behaviour be?
- Never delete
- Wait for the default configured deletion grace period before deleting
I expect it to be wait for the default configured grace period (72h) before deleting.