neon
neon copied to clipboard
RFC on how to perform tenant/timeline operations atomically.
It's currently not always clear what happens if the pageserver crashes in the middle of an operation like branch or tenant creation. What is the state on local disk, and what is the state on S3? If they don't match, which one wins?
This RFC tries to clarify that. It introduces a new concept, the
tenant.json
file.
This RFC tries to address the race conditions discussed at https://github.com/neondatabase/neon/pull/2231
Regular general concerns I have with the index files approach are backward compatibility and consistency.
We have
TimelineMetadataHeader
for such things inmetadata
already: it has a checksum and a version number. Since we're presumably changingmetadata
structure in this RFC, we would at least need to cover transition between versions? Ideally, it would be great to consider similar things for both *.json files on S3 here too.
Yeah, introducing a new file requires handling backwards-compatibility, unfortunately. This wouldn't require any changes to the current metadata files, though, it just adds a new tenant.json
file. We would need to have an upgrade process. I think it would be a three-step process:
- Modify the pageserver so that it creates and updates the
tenant.json
file on every tenant/timeline operation, but doesn't read it. Upgrade production to that version - Run a script that creates a
tenant.json
file for every tenant in S3 that's still missing it. - Every tenant now has a
tenant.json
file. We can now do the rest of the pageserver changes described in the RFC, and start relying on thetenant.json
file.
That's doable, but I'm having second thoughts on this proposal now, because of that :-(. Maybe we should just live with some ambiguity on which timelines exist for now.
his wouldn't require any changes to the current metadata files, though, it just adds a new tenant.json file.
I would actually like to makes some changes to the current timeline index_part.json
file too. It's a bit silly that it stores the metadata as a binary blob inside the JSON. It would be nicer if it serialized the metadata into JSON too, making the whole file human-readable. But that's an independent change, not required by this RFC.
Also I think it would be good to use our aversion crate for such kind of metadata files. It solves problem with format changes and compatibility
https://github.com/neondatabase/neon/pull/2426 and https://github.com/neondatabase/neon/pull/2489 implemented atomic creation of timeline and tenant files (including their directories). Tenants are created through the tmp directory as proposed here, timelines are currently created using a special, uninit file, that allows us to be sure on every restart after crash we detect partially created directories and clean then up. In regular work, on errors, such temporary directories are cleaned up automatically now, allowing to retry the creation later.
Non-tmp dir approach for timelines was caused by more big and complex code changes needed for that to work: https://github.com/neondatabase/neon/pull/2489#discussion_r998570376 I think it's good to be done eventually.
I think, we could close this RFC and reopen it reformulated, considering the new changes?
As discussed moving to draft.