neon docs/rfcs: timeline ancestor detach API

Problem

When a tenant creates a new timeline that they will treat as their 'main' history, it is awkward to permanently retain an 'old main' timeline as its ancestor. Currently this is necessary because it is forbidden to delete a timeline which has descendants.

Summary of changes

A new pageserver API is proposed to 'adopt' data from a parent timeline into one of its children, such that the link between ancestor and child can be severed, leaving the parent in a state where it may then be deleted.

Feb 23 '24 10:02 jcsp

3139 tests run: 3018 passed, 0 failed, 121 skipped (full report)

Flaky tests (3)

Postgres 16

test_pg_regress[4]: debug
test_sharding_split_compaction[None]: debug

Postgres 14

test_isolation[4]: debug

Code coverage* (full report)

functions: 32.7% (6985 of 21380 functions)
lines: 50.0% (55020 of 109932 lines)

* collected from Rust tests only

The comment gets automatically updated with the latest test results
72e9a89b49cf8253be38647666fc7da60b3b9e78 at 2024-07-17T14:33:28.520Z

Feb 23 '24 10:02 github-actions[bot]

Added @ololobus and @prepor as reviewers, as representatives of the expected consumers of the new API.

Mar 04 '24 12:03 stepashka

Maybe the RFC could have a section about interaction with WAL DR in the interaction with other features section?

WAL DR currently only supports the main branch but with the data that the safekeepers have, it can be extended to support DR also for tenants with branches.

If we delete entire timelines, however, then we also delete the information about which timeline's WAL to use or copy. And if the old root timeline is being deleted, then one also deletes the initdb archive. Both would be required for building a WAL DR feature supporting merged branches in the future.

One way would be to keep a record file of historic timeline IDs so that we can ask safekeepers for WAL from those timelines. For the initdb, we could copy it over, but of course the initdb won't cleanly apply with the WAL of the merged root timeline; one would have to apply it together with the original timeline's WAL.

Still, the merge needs to happen after we have implemented retaining both types of information in order for us to be able to WAL DR, regardless of when that WAL DR for branching feature is implemented.

In other words, we need to decide whether to implement retaining of this information, or whether it's okay to have a bunch of timelines that we really won't be able to WAL DR.

Mar 04 '24 13:03 arpad-m

I think that, from the Neon Cloud perspective, the definition/motivation is a little bit different. Let me try to formulate what we need as I see it; it will probably change the final solution.

Let's consider the restore use case:

initial state

--- main branch (timeline 1) --->

we are restoring the branch to point a

-a- old branch (timeline 1) --->
 \
  - main branch (timeline 2) ->

After checking that all is fine, the user would want to delete the old branch because it isn't needed anymore. But they can't, because it's now a branch with children.
With each new restore the hierarchy will grow, and the number of layers will increase.

Another case with branch reset to parent:

initial state

-a- main branch (timeline 1) --->
 \
  - child branch (timeline 2) -b->
                                \
                                 - grandchild branch (timeline 3) -->

resetting of child branch leads to:

-a- main branch (timeline 1) --->
 \                               \
  \                               - child branch (timeline 4) -->
   \- old branch (timeline 2) -b->
                               \
                                - grandchild branch (timeline 3) -->

And again, we can't delete "old branch" because "grandchild" is pointing to it.

From a UX perspective we can just hide such branches from the user after "deleting" them: instead of actually deleting them, we can mark them as "abandoned" and exclude them from the UI.

But such branches will continue to exist on pageserver level and eat resources. That's how I see the definition of the problem. Instead of "merging" timelines, we need to implement a new type of Garbage Collecting.

The API (draft) that I'm proposing consists of two parts:

- GET /abandoned: the list of abandoned timelines. The pageserver pulls this list and merges timelines, deletes the parts that are no longer needed, or even deletes whole timelines.
- POST /optimized: after finishing, the pageserver notifies the Control Plane about the results: which timelines were merged into which, and which were completely deleted. The Control Plane modifies its hierarchy accordingly.

We can also implement "normal" timeline/tenant deletion this way.

What I like about this API:

- pull API: the pageserver decides when it has resources for merging and controls the rate
- it can execute operations in batches, grouped by tenant, for example
- we do not block any other operations for the user: deleting / merging is a background process.
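
A minimal sketch of the pull loop described above, using an in-memory stand-in instead of real HTTP. `FakeControlPlane`, `poll_once`, `max_tenants`, and the report fields are hypothetical names chosen for illustration; the draft does not specify request or response bodies.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class FakeControlPlane:
    """Stand-in for the two drafted endpoints; no real HTTP is involved."""
    abandoned: list  # (tenant_id, timeline_id) pairs
    reports: list = field(default_factory=list)

    def get_abandoned(self):
        # Models `GET /abandoned`.
        return list(self.abandoned)

    def post_optimized(self, report):
        # Models `POST /optimized`: record the result and update the hierarchy.
        self.reports.append(report)
        done = {(report["tenant_id"], t) for t in report["deleted"]}
        self.abandoned = [a for a in self.abandoned if a not in done]


def poll_once(cp, max_tenants):
    """One pageserver pass: group work by tenant and cap the work per pass."""
    by_tenant = defaultdict(list)
    for tenant_id, timeline_id in cp.get_abandoned():
        by_tenant[tenant_id].append(timeline_id)
    for tenant_id in sorted(by_tenant)[:max_tenants]:
        # Real merging/deleting would happen here; this sketch only reports.
        cp.post_optimized(
            {"tenant_id": tenant_id, "merged": {}, "deleted": by_tenant[tenant_id]}
        )


cp = FakeControlPlane(abandoned=[("t1", "a"), ("t2", "b"), ("t1", "c")])
poll_once(cp, max_tenants=1)  # handles tenant "t1" only; "t2" waits for the next pass
```

The `max_tenants` cap is where the pageserver would control its own rate; anything left over simply shows up again on the next poll.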

Mar 04 '24 17:03 prepor

API (draft) that I'm proposing is consisting of two parts: GET /abandoned

The "abandoned" timeline can have many children, so the pageserver needs to know which one to merge into. Look at the "Before" example in the RFC to understand why -- in that case, simply indicating that the "old main" is abandoned would result in ambiguity.

The point about push vs. pull APIs is something that perhaps can be handled at the control plane layer if it is important for the console to avoid having to call out to the pageserver. For the pageserver to continuously poll the control plane for work doesn't make sense to me: the control plane can internally poll its database much more efficiently.

Mar 05 '24 08:03 jcsp

The "abandoned" timeline can have many children, so the pageserver needs to know which one to merge into.

That's the whole point of my proposal. Instead of commanding the pageserver what to do, the control plane could just notify the pageserver that it doesn't need this timeline anymore, so the pageserver can make some optimizations if it wants to and is able to. The pageserver has all the information about the timeline's children and their branching points.

the control plane can internally poll its database much more efficiently.

Polling the database is cheap; merging / deleting resources is expensive, so we can leave the decision on when to do it to the pageserver. Even if the logic of this decision is initially very simple, we can later optimize it and even put it, for example, into a separate asynchronous process. And all of it without changing the API and without blocking any other operations for the user.

Mar 05 '24 11:03 prepor

Notes from chat:

The Console concept is to "abandon" a timeline: this means that a user-facing Branch no longer refers to the timeline.
In the RFC example (child a, child b etc), marking "old main" as abandoned would only cause the Timeline to be truncated to the LSN where child B branches off.
In the same example, if child B was marked abandoned, then one could proceed to do a merge of new main into old main, and re-parent child A to new main.
So there are two primitive operations: merging timeline into a parent timeline (as already described in the RFC), and truncating a timeline. Some extra logic (either in PS or CP) would analyze the graph of Timelines (and whether they are abandoned), and figure out which Timelines need to be merged or truncated.
A still-existing Branch's Timeline might have an abandoned Timeline as its ancestor, in which case requests to create new Branches from before its Timeline's branch point would need to be directed to the abandoned parent. The take from the storage side is that this ancestor-recursing logic belongs in control plane, and that we can diverge the semantics of branches and timelines, such that:
- For a Timeline, one may only create a child from an LSN within that timeline (i.e. not before its branch point)
- For a Branch, one may create a child from any LSN in its history, and the actual Timeline ancestor of that created branch may be different than the Timeline of the Branch ancestor.
The question of push vs. pull APIs is something perhaps for the CP team, as they would either way own the mapping of abandon UPDATEs into Operations, or providing an API for the pageserver to poll. The inclination in storage is to stick with an imperative API to match how timeline creation/deletion is done.
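
The Branch vs. Timeline split in these notes can be sketched as a lookup that the control plane would run before asking the pageserver to create a child timeline. This is a minimal sketch under assumptions: the `Timeline` class, its fields, and `timeline_for_lsn` are hypothetical names, and LSNs are modelled as plain integers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Timeline:
    timeline_id: str
    ancestor: Optional["Timeline"] = None
    ancestor_lsn: int = 0  # branch point on the ancestor; unused for a root


def timeline_for_lsn(timeline, lsn):
    """Walk up the ancestor chain until `lsn` is not before the branch point.

    A Branch may be branched from any LSN in its history, but a Timeline only
    from an LSN at or after its own branch point, so earlier LSNs are
    redirected to the (possibly abandoned) ancestor.
    """
    current = timeline
    while current.ancestor is not None and lsn < current.ancestor_lsn:
        current = current.ancestor
    return current


# Hypothetical example: "new main" was branched from an abandoned "old main" at LSN 100.
old_main = Timeline("old-main")
new_main = Timeline("new-main", ancestor=old_main, ancestor_lsn=100)

before_branch_point = timeline_for_lsn(new_main, 50)   # resolves to old_main
after_branch_point = timeline_for_lsn(new_main, 150)   # resolves to new_main
```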

Mar 06 '24 15:03 jcsp

I went through the RFC and discussion and I like Andrey's examples; they seem to be closer to real life at this point. I don't fully share the alternative API proposal, though, as I don't really like this bi-directional reconciliation kind of stuff.

The Console concept is to "abandon" a timeline: this means that a user-facing Branch no longer refers to the timeline

It's not an existing concept and I'm not 100% sure we need it

Right now, the reset API has an option for keeping or deleting the previous branch. Due to the limitations this RFC is intended to resolve, we just forbid reset without keeping the old branch if it has children, as in Andrey's example:

-a- main branch (timeline 1) --->
 \
  - child branch (timeline 2) -b->
                          \
                           - grandchild branch (timeline 3) -->

so we do this instead

-a- main branch (timeline 1) --->
 \                               \
  \                               - child branch (timeline 4) -->
   \
    - old branch (timeline 2) -b->
                          \
                           - grandchild branch (timeline 3) -->

What do we expect after implementing this RFC?

My understanding is that we again have two options:

1. User decided to keep previous branch -> OK, we do it as before
2. User decided to delete it right away -> then, I think we just use new API to merge timeline 2 into timeline 3 and end up in a state

-a- main branch (timeline 1) --->
 \                               \
  \                               - child branch (timeline 4) -->
   \
    - grandchild branch (timeline 3) -->

where timeline 3 is a direct child of timeline 1, right?

I think case 1, with the user deciding to delete the old branch afterwards, falls into the same category as case 2.

In case 2, what if the old branch has a full-blown hierarchy of children? IIUC, the cplane behavior will be to just figure out all direct children (top common ancestors) of the old branch aka timeline 2 and merge them with timeline 1. Will the pageserver be happy with such a bulk-merging of 10, 20, 100 branches? If not, what are we going to do instead?
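
The hierarchy bookkeeping behind that question can be sketched as follows. This is only a model of the tree update, not of the pageserver's data movement: `delete_and_reparent` is a hypothetical helper, LSNs are plain integers, and it assumes each direct child has already adopted the deleted timeline's history up to the child's own branch point.

```python
def delete_and_reparent(tree, victim):
    """Remove `victim` and attach its direct children to its parent.

    `tree` maps timeline id -> (parent id or None, branch LSN on the parent).
    Returns a new mapping; the input is left untouched.
    """
    parent, branch_lsn = tree[victim]
    updated = {}
    for tid, (p, lsn) in tree.items():
        if tid == victim:
            continue
        if p == victim:
            # Assumption: the child now carries the victim's history itself, so it
            # branches from the victim's parent at the victim's branch point.
            updated[tid] = (parent, branch_lsn)
        else:
            updated[tid] = (p, lsn)
    return updated


A, B = 10, 20  # hypothetical integer values for the branch points a and b
tree = {1: (None, 0), 2: (1, A), 3: (2, B), 4: (1, 30)}
after_delete = delete_and_reparent(tree, victim=2)
```

In this model the cost scales with the number of direct children of the deleted timeline, since each one needs its own adopt step before the tree can be rewritten.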

At first glance I don't really like this

A still-existing Branch's Timeline might have an abandoned Timeline as its ancestor, in which case requests to create new Branches from before its Timeline's branch point would need to be directed to the abandoned parent.

because it creates two trees: a user-visible one and a real shadow one, which will make it harder to explain to the user what occupies the space. And with the solution above we won't need to keep abandoned timelines at all.

Mar 06 '24 19:03 ololobus

A still-existing Branch's Timeline might have an abandoned Timeline as its ancestor, in which case requests to create new Branches from before its Timeline's branch point would need to be directed to the abandoned parent.

If we want to keep such hidden timelines, then we don't really need the proposed merge API, I suppose. Instead, we just need to trim old branch aka timeline 2 to the LSN == max(branching LSN of all existing children, timeline 3 in this case), so that it doesn't occupy any extra space, only needed for copy-on-write to work
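
A minimal sketch of that trim rule, with LSNs modelled as integers. `trim_lsn` and the mapping it takes are hypothetical names used only for illustration.

```python
def trim_lsn(children_branch_lsns):
    """Return the LSN up to which a hidden timeline must be retained.

    `children_branch_lsns` maps child timeline id -> branch LSN on the hidden
    timeline. Everything after the highest branch point is only needed by the
    hidden timeline itself. Returns None when there are no children, in which
    case nothing has to be retained for copy-on-write.
    """
    if not children_branch_lsns:
        return None
    return max(children_branch_lsns.values())


# Hidden timeline 2 with a single child, timeline 3, branched at b (here 20).
keep_up_to = trim_lsn({"timeline-3": 20})
# With several children, the latest branch point wins.
keep_up_to_many = trim_lsn({"timeline-3": 20, "timeline-5": 35, "timeline-6": 12})
```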

Mar 06 '24 19:03 ololobus

If we want to keep such hidden timelines, then we don't really need the proposed merge API, I suppose.

We still need it: if someone uses the reset API repeatedly we must avoid a situation where they end up with a deeply nested tree of branches.

Keeping the old timeline is only useful if we want to keep child branches that have a branch point after the LSN we reset to, and avoid the storage cost of having two totally separate histories (the "totally separate histories" was part of the original RFC proposal, on the basis that this would be a transient state and the user would soon delete their old branches).

I think we have a choice between:
A) a more general solution (abandoned branches), where a user can arbitrarily select which branches are unwanted, and have our system do the merging/truncating as needed.
B) a more special-case solution (promoting a specific branch to "new main", which makes it standalone and makes it take ownership of any children before its branch point).

The branch-merging (aka "ancestor branch deletion", since merging is an overloaded term) is common work between them: that's the thing I think it makes sense to proceed with building, while the discussion about how higher layers will expose these concepts to users continues.

Mar 07 '24 14:03 jcsp

Failures on Postgres 14

test_ancestor_detach_branched_from[True-False-earlier]: debug

Funnily enough, this is another instance of #7830, being fixed already.

Jul 09 '24 14:07 koivunej
