neon
initdb.tar.zst is not deleted on timeline/tenant deletion
Steps to reproduce
- Create a new project, in e.g. eu-central-1
- Run `show neon.tenant_id`, mark the value down
- Delete the new project
- Look in the storage bucket of (this case) eu-central-1 for files from that tenant
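
The last step can be sketched as a small check, assuming you already have a listing of the bucket's object keys (for example from the S3 console or a listing tool). The key layout `pageserver/v1/tenants/<tenant_id>/` is taken from the URL in the "Logs, links" section; the function names, the second tenant id, and the exact position of initdb.tar.zst under the timeline prefix are illustrative assumptions.

```python
def tenant_prefix(tenant_id: str) -> str:
    """Key prefix for a tenant's pageserver data (layout taken from the URL in this issue)."""
    return f"pageserver/v1/tenants/{tenant_id}/"


def leftover_keys(bucket_keys, tenant_id):
    """Return, sorted, every key in the listing that still belongs to the tenant."""
    prefix = tenant_prefix(tenant_id)
    return sorted(key for key in bucket_keys if key.startswith(prefix))


# Hypothetical listing taken after the project was deleted.
tenant = "d521cfeccc29d52568c9efe512458389"
timeline = "faa6295a342850205f91b4e17e3679ef"
listing = [
    # Assumed location of the archive under the timeline prefix.
    f"pageserver/v1/tenants/{tenant}/timelines/{timeline}/initdb.tar.zst",
    # Made-up key of an unrelated tenant; it must not be reported.
    "pageserver/v1/tenants/00000000000000000000000000000000/timelines/t/some-file",
]

for key in leftover_keys(listing, tenant):
    print("not deleted:", key)
```

An empty result would match the expected behaviour described below; any returned key is data that survived the deletion.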
Expected result
All data of that tenant has been deleted
Actual result
Most data of that tenant has been deleted, except the initdb.tar.zst
Environment
Production
Logs, links
- One of my own test projects to verify this finding: https://s3.console.aws.amazon.com/s3/buckets/neon-prod-storage-eu-central-1?region=eu-central-1&prefix=pageserver%2Fv1%2Ftenants%2Fd521cfeccc29d52568c9efe512458389%2Ftimelines%2Ffaa6295a342850205f91b4e17e3679ef%2F&showversions=true
I've replied on slack already, but putting it here as well.
This is expected behaviour. initdb.tar.zst is useful as long as WAL exists on S3, because as long as WAL is on S3, we might want to use that WAL for DR. This means that we should not delete it unless the WAL is also deleted.
Also, the DR from WAL scenario involves deletion of the entire timeline, but it needs the initdb after that deletion has concluded. Therefore, we either need an option to keep the initdb.tar.zst, or we integrate download of the initdb.tar.zst and a subsequent re-upload into the DR from WAL scenario.
I see some workable items:
- (only once WAL gets deleted on timeline deletion) delete initdb.tar.zst by default during timeline deletion, but add an option to the delete endpoint to keep the initdb.tar.zst. As we have no way of persisting that option, we need to delete initdb.tar.zst very early in the process.
- add an option to the scrubber to delete initdb.tar.zst as well. Keep it optional because it might interfere with ongoing DR processes. Run the scrubber with those options against the buckets.
- Eventually, this is what Heikki suggested and I agree, we should run initdb on the computes and make the initdb.tar.zst part of the WAL. Then it's not the pageserver's job to keep those archives around.
Setting this blocked - we'll update it when the safekeeper deletion is present: we should delete WAL and initdb at the same time.
@arssher do you have a ticket for SK deletion in S3?
Thinking about the entire "what to do with initdb deletion and the recovery scenario" more, I think that the best solution is to:
- have deletion delete initdb.tar.zst unconditionally
- add a separate endpoint to the pageserver that copies the initdb.tar.zst to a different location in S3, say same prefix but initdb-recovery.tar.zst
- enhance the timeline creation code to look first for initdb.tar.zst and then for initdb-recovery.tar.zst if the former isn't found.
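
The lookup order in the last item could look roughly like the sketch below. It is not pageserver code: `find_initdb_archive` and the `exists` callback are hypothetical stand-ins for whatever remote-storage check the timeline creation code would use, and only the two file names come from the proposal.

```python
from typing import Callable, Optional

INITDB_NAME = "initdb.tar.zst"
INITDB_RECOVERY_NAME = "initdb-recovery.tar.zst"


def find_initdb_archive(
    timeline_prefix: str, exists: Callable[[str], bool]
) -> Optional[str]:
    """Return the key of the archive to use, preferring the regular one.

    `timeline_prefix` must end with "/". `exists` is a hypothetical callback
    that reports whether a key is present in remote storage.
    """
    for name in (INITDB_NAME, INITDB_RECOVERY_NAME):
        key = timeline_prefix + name
        if exists(key):
            return key
    return None  # neither archive is present


# Usage with an in-memory stand-in for the bucket: after deletion only the
# recovery copy is left, so the fallback is chosen.
prefix = "pageserver/v1/tenants/TENANT/timelines/TIMELINE/"
bucket = {prefix + INITDB_RECOVERY_NAME}
chosen = find_initdb_archive(prefix, bucket.__contains__)
print(chosen)
```

Keeping the regular name first means ordinary timeline creation behaves as before, and the recovery copy is only consulted when the original is gone.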
Reasoning:
The main alternative would be to add a parameter to deletion to "spare" the initdb.tar.zst file. However, there are zero parameters for deletion right now, and we would have to pass the parameter through a lot of functions inside the pageserver. Also, we'd have to figure out a way to persist that flag to handle recoveries after a crash. Timeline deletion is also called by tenant deletion, so one would have to add the parameter to tenant deletion as well.
> add a separate endpoint to the pageserver that copies the initdb.tar.zst to a different location in S3, say same prefix but initdb-recovery.tar.zst
The reasoning makes sense to me. It's nice to have a separate endpoint only used in DR, so that ordinary users of the delete endpoint don't have to understand a flag.
> add a separate endpoint to the pageserver that copies the initdb.tar.zst to a different location in S3, say same prefix but initdb-recovery.tar.zst
I'd use a separate prefix instead, as it's easy to add expiration policies on prefixes, but not on suffixes. This also makes listing all these initdb-recovery files much cheaper, and makes it easy to check that no data that should've been deleted is left behind (main tenant data directory is empty).
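
To make the separate-prefix idea concrete, here is a minimal sketch. The prefix `pageserver/v1/initdb-recovery/`, the helper names, the rule ID, and the 30-day retention are all assumptions for illustration; the rule dictionary follows the general shape of an S3 lifecycle rule, which filters on a key prefix.

```python
# Hypothetical location for recovery copies, outside the tenants/ tree.
RECOVERY_PREFIX = "pageserver/v1/initdb-recovery/"


def recovery_key(tenant_id: str, timeline_id: str) -> str:
    """Key under which a timeline's initdb archive would be parked for DR."""
    return f"{RECOVERY_PREFIX}{tenant_id}/{timeline_id}/initdb.tar.zst"


def expiration_rule(days: int) -> dict:
    """Lifecycle rule that expires everything under the recovery prefix."""
    return {
        "ID": "expire-initdb-recovery",         # arbitrary rule name
        "Filter": {"Prefix": RECOVERY_PREFIX},  # lifecycle filters match on prefix
        "Status": "Enabled",
        "Expiration": {"Days": days},
    }


key = recovery_key(
    "d521cfeccc29d52568c9efe512458389",
    "faa6295a342850205f91b4e17e3679ef",
)
rule = expiration_rule(30)  # 30 days is a made-up retention period
print(key)
print(rule)
```

Because recovery copies no longer share a prefix with tenant data, one prefix listing finds all of them, and anything still under the tenant's own prefix after deletion is a leak.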