
initdb.tar.zst is not deleted on timeline/tenant deletion

Open • MMeent opened this issue 1 year ago • 3 comments

Steps to reproduce

  1. Create a new project, in e.g. eu-central-1
  2. Run SHOW neon.tenant_id and note down the value
  3. Delete the new project
  4. Look in the storage bucket for that region (in this case eu-central-1) for files belonging to that tenant
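Step 4 amounts to filtering a bucket listing by the tenant's key prefix. The sketch below assumes the layout pageserver/v1/tenants/<tenant_id>/ seen in the console URL under "Logs, links". The function names and example keys are hypothetical, and fetching the listing itself (e.g. via paginated S3 ListObjectsV2 calls) is not shown.

```python
from typing import Iterable, List


# Assumed key layout, taken from the bucket URL in this issue:
#   pageserver/v1/tenants/<tenant_id>/timelines/<timeline_id>/...
def tenant_prefix(tenant_id: str) -> str:
    return f"pageserver/v1/tenants/{tenant_id}/"


def leftover_keys(all_keys: Iterable[str], tenant_id: str) -> List[str]:
    """Return the keys still stored under a (supposedly deleted) tenant's prefix."""
    prefix = tenant_prefix(tenant_id)
    return sorted(key for key in all_keys if key.startswith(prefix))


# Hypothetical listing; adjacent string literals are concatenated by Python.
listing = [
    "pageserver/v1/tenants/d521cfeccc29d52568c9efe512458389/timelines/"
    "faa6295a342850205f91b4e17e3679ef/initdb.tar.zst",
    "pageserver/v1/tenants/0123456789abcdef0123456789abcdef/timelines/"
    "fedcba9876543210fedcba9876543210/some-layer-file",
]
print(leftover_keys(listing, "d521cfeccc29d52568c9efe512458389"))
```

With the bug described here, the only key printed for the deleted tenant is its initdb.tar.zst.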

Expected result

All of the tenant's data has been deleted

Actual result

Most of the tenant's data has been deleted, except for initdb.tar.zst

Environment

Production

Logs, links

  • One of my own test projects to verify this finding: https://s3.console.aws.amazon.com/s3/buckets/neon-prod-storage-eu-central-1?region=eu-central-1&prefix=pageserver%2Fv1%2Ftenants%2Fd521cfeccc29d52568c9efe512458389%2Ftimelines%2Ffaa6295a342850205f91b4e17e3679ef%2F&showversions=true

MMeent, Dec 21 '23 18:12

I've already replied on Slack, but I'm putting it here as well.

This is expected behaviour. initdb.tar.zst is useful as long as WAL exists on S3: while the WAL is there, we might want to use it for DR (disaster recovery). This means we should not delete initdb.tar.zst unless the WAL is deleted as well.

Also, the DR-from-WAL scenario involves deleting the entire timeline, but it still needs initdb.tar.zst after that deletion has concluded. Therefore, we either need an option to keep initdb.tar.zst, or we need to build a download of initdb.tar.zst and its subsequent re-upload into the DR-from-WAL procedure.
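The download-and-re-upload variant can be sketched as follows. This is a minimal illustration, not pageserver code: an in-memory dict stands in for the S3 bucket, initdb.tar.zst is assumed to sit directly under the timeline prefix, and delete_timeline / recover_from_wal are hypothetical names.

```python
from typing import Dict

Store = Dict[str, bytes]  # in-memory stand-in for the S3 bucket


def timeline_prefix(tenant_id: str, timeline_id: str) -> str:
    return f"pageserver/v1/tenants/{tenant_id}/timelines/{timeline_id}/"


def initdb_key(tenant_id: str, timeline_id: str) -> str:
    # Assumption: initdb.tar.zst lives directly under the timeline prefix.
    return timeline_prefix(tenant_id, timeline_id) + "initdb.tar.zst"


def delete_timeline(store: Store, tenant_id: str, timeline_id: str) -> None:
    """Hypothetical deletion that removes everything, initdb.tar.zst included."""
    prefix = timeline_prefix(tenant_id, timeline_id)
    # Copy the matching keys first so the dict is not mutated while iterating.
    for key in [k for k in store if k.startswith(prefix)]:
        del store[key]


def recover_from_wal(store: Store, tenant_id: str, timeline_id: str) -> None:
    """Hypothetical DR flow: save initdb, delete the timeline, re-upload initdb."""
    key = initdb_key(tenant_id, timeline_id)
    saved = store[key]  # "download" before the deletion starts
    delete_timeline(store, tenant_id, timeline_id)
    store[key] = saved  # "re-upload" so timeline re-creation can find it
    # Re-creating the timeline and replaying WAL from S3 would follow (not shown).
```

The trade-off versus a "keep initdb" option is that the deletion path stays parameter-free, at the cost of the DR procedure briefly holding the archive outside S3.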

I see some workable items:

  • (Only once WAL gets deleted on timeline deletion) delete initdb.tar.zst by default during timeline deletion, but add an option to the delete endpoint to keep it. As we have no way of persisting that option, we need to delete initdb.tar.zst very early in the process.
  • Add an option to the scrubber to delete initdb.tar.zst as well. Keep it optional, because it might interfere with ongoing DR processes, and run the scrubber with that option against the buckets.
  • Eventually, as Heikki suggested (and I agree), we should run initdb on the computes and make the initdb.tar.zst part of the WAL. Then it's no longer the pageserver's job to keep those archives around.
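The first item implies a specific ordering inside timeline deletion. A minimal sketch, assuming a dict as the bucket and a hypothetical keep_initdb flag on the delete call: the initdb.tar.zst decision is made before any other object is removed.

```python
from typing import Dict, List

Store = Dict[str, bytes]  # in-memory stand-in for the S3 bucket


def delete_timeline(store: Store, prefix: str, keep_initdb: bool = False) -> List[str]:
    """Delete the objects under `prefix` and return the deleted keys in order.

    keep_initdb is a hypothetical delete-endpoint option. Because it cannot be
    persisted, the initdb.tar.zst decision is taken as the very first step.
    """
    initdb = prefix + "initdb.tar.zst"
    deleted: List[str] = []
    # Step 1: deal with initdb.tar.zst before anything else.
    if not keep_initdb and initdb in store:
        del store[initdb]
        deleted.append(initdb)
    # Step 2: delete the rest of the timeline. sorted() materializes the keys
    # before the loop body runs, so the dict is not mutated mid-iteration.
    for key in sorted(k for k in store if k.startswith(prefix) and k != initdb):
        del store[key]
        deleted.append(key)
    return deleted
```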

arpad-m, Dec 21 '23 21:12

Marking this as blocked; we'll update it once safekeeper deletion is in place, since we should delete WAL and initdb at the same time.

jcsp, Jan 08 '24 11:01

@arssher do you have a ticket for safekeeper (SK) deletion in S3?

jcsp, Jan 08 '24 11:01

Having thought more about the whole question of initdb deletion and the recovery scenario, I think the best solution is to:

  • have deletion delete initdb.tar.zst unconditionally
  • add a separate endpoint to the pageserver that copies initdb.tar.zst to a different location in S3, e.g. the same prefix but named initdb-recovery.tar.zst
  • enhance the timeline creation code to look first for initdb.tar.zst and then for initdb-recovery.tar.zst if the former isn't found.
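The proposed copy endpoint and the fallback lookup during timeline creation could look roughly like this. It is a sketch under assumptions: a dict stands in for S3, both archive names sit under the same timeline prefix as proposed, and copy_initdb_for_recovery / load_initdb are hypothetical names. How deletion spares the recovery copy is not modeled here.

```python
from typing import Dict

Store = Dict[str, bytes]  # in-memory stand-in for the S3 bucket


def copy_initdb_for_recovery(store: Store, prefix: str) -> None:
    """Sketch of the proposed endpoint: copy initdb.tar.zst next to itself."""
    store[prefix + "initdb-recovery.tar.zst"] = store[prefix + "initdb.tar.zst"]


def load_initdb(store: Store, prefix: str) -> bytes:
    """Sketch of timeline creation's lookup: try the original, then the copy."""
    for name in ("initdb.tar.zst", "initdb-recovery.tar.zst"):
        data = store.get(prefix + name)
        if data is not None:
            return data
    raise FileNotFoundError(f"no initdb archive under {prefix}")
```

In a DR run, the copy would be made before the timeline is deleted, so that re-creation still finds an archive through the fallback.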

Reasoning:

The main alternative would be to add a parameter to deletion to "spare" the initdb.tar.zst file. However, deletion currently takes no parameters at all, and we would have to pass the new one through a lot of functions inside the pageserver. We would also have to find a way to persist that flag so that deletions resumed after a crash still honor it. Finally, timeline deletion is also called by tenant deletion, so the parameter would have to be added to tenant deletion as well.

arpad-m, Jan 16 '24 21:01

> add a separate endpoint to the pageserver that copies the initdb.tar.zst to a different location in S3, say same prefix but initdb-recovery.tar.zst

The reasoning makes sense to me. It's nice to have a separate endpoint only used in DR, so that ordinary users of the delete endpoint don't have to understand a flag.

jcsp, Jan 17 '24 12:01

> add a separate endpoint to the pageserver that copies the initdb.tar.zst to a different location in S3, say same prefix but initdb-recovery.tar.zst

I'd use a separate prefix instead, as it's easy to add expiration policies on prefixes but not on suffixes. This also makes listing all of these initdb-recovery files much cheaper, and makes it easy to check that no data that should have been deleted is left behind (i.e. the main tenant data directory is empty).
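A separate prefix could be laid out as sketched below. The initdb-recovery/ root, the function names and the retention period are all hypothetical; the rule dict follows the shape of an S3 lifecycle configuration rule (ID, Filter.Prefix, Status, Expiration.Days).

```python
from typing import Any, Dict, Iterable

RECOVERY_ROOT = "initdb-recovery/"  # hypothetical dedicated prefix


def tenant_prefix(tenant_id: str) -> str:
    return f"pageserver/v1/tenants/{tenant_id}/"


def recovery_key(tenant_id: str, timeline_id: str) -> str:
    """The recovery copy lives outside the tenant's own prefix."""
    return f"{RECOVERY_ROOT}{tenant_id}/{timeline_id}/initdb.tar.zst"


def tenant_is_clean(all_keys: Iterable[str], tenant_id: str) -> bool:
    """After deletion, nothing should remain under the tenant's main prefix."""
    return not any(k.startswith(tenant_prefix(tenant_id)) for k in all_keys)


def recovery_lifecycle_rule(days: int) -> Dict[str, Any]:
    """S3 lifecycle rule expiring every object under the recovery prefix."""
    return {
        "ID": "expire-initdb-recovery",
        "Filter": {"Prefix": RECOVERY_ROOT},
        "Status": "Enabled",
        "Expiration": {"Days": days},
    }
```

Such a rule could be passed in the Rules list of boto3's put_bucket_lifecycle_configuration. That call replaces the bucket's entire lifecycle configuration, so any existing rules would need to be included alongside it.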

MMeent, Jan 17 '24 13:01