neon icon indicating copy to clipboard operation
neon copied to clipboard

RFC on how to perform tenant/timeline operations atomically.

Open hlinnaka opened this issue 1 year ago • 4 comments

It's currently not always clear what happens if the pageserver crashes in the middle of an operation like branch or tenant creation. What is the state on local disk, and what is the state on S3? If they don't match, which one wins?

This RFC tries to clarify that. It introduces a new concept, the tenant.json file.

hlinnaka avatar Aug 09 '22 19:08 hlinnaka

This RFC tries to address the race conditions discussed at https://github.com/neondatabase/neon/pull/2231

hlinnaka avatar Aug 09 '22 19:08 hlinnaka

Regular general concerns I have with the index files approach are backward compatibility and consistency.

We have TimelineMetadataHeader for such things in metadata already: it has a checksum and a version number. Since we're presumably changing metadata structure in this RFC, we would at least need to cover transition between versions? Ideally, it would be great to consider similar things for both *.json files on S3 here too.

Yeah, introducing a new file requires handling backwards-compatibility, unfortunately. This wouldn't require any changes to the current metadata files, though, it just adds a new tenant.json file. We would need to have an upgrade process. I think it would be a three-step process:

  1. Modify the pageserver so that it creates and updates the tenant.json file on every tenant/timeline operation, but doesn't read it. Upgrade production to that version
  2. Run a script that creates a tenant.json file for every tenant in S3 that's still missing it.
  3. Every tenant now has a tenant.json file. We can now do the rest of the pageserver changes described in the RFC, and start relying on the tenant.json file.

That's doable, but I'm having second thoughts on this proposal now, because of that :-(. Maybe we should just live with some ambiguity on which timelines exist for now.

hlinnaka avatar Aug 11 '22 08:08 hlinnaka

his wouldn't require any changes to the current metadata files, though, it just adds a new tenant.json file.

I would actually like to makes some changes to the current timeline index_part.json file too. It's a bit silly that it stores the metadata as a binary blob inside the JSON. It would be nicer if it serialized the metadata into JSON too, making the whole file human-readable. But that's an independent change, not required by this RFC.

hlinnaka avatar Aug 11 '22 08:08 hlinnaka

Also I think it would be good to use our aversion crate for such kind of metadata files. It solves problem with format changes and compatibility

LizardWizzard avatar Aug 11 '22 18:08 LizardWizzard

https://github.com/neondatabase/neon/pull/2426 and https://github.com/neondatabase/neon/pull/2489 implemented atomic creation of timeline and tenant files (including their directories). Tenants are created through the tmp directory as proposed here, timelines are currently created using a special, uninit file, that allows us to be sure on every restart after crash we detect partially created directories and clean then up. In regular work, on errors, such temporary directories are cleaned up automatically now, allowing to retry the creation later.

Non-tmp dir approach for timelines was caused by more big and complex code changes needed for that to work: https://github.com/neondatabase/neon/pull/2489#discussion_r998570376 I think it's good to be done eventually.

I think, we could close this RFC and reopen it reformulated, considering the new changes?

SomeoneToIgnore avatar Oct 20 '22 09:10 SomeoneToIgnore

As discussed moving to draft.

LizardWizzard avatar Mar 21 '23 15:03 LizardWizzard