neon
neon copied to clipboard
Logical replication subscriptions are copied into a new branch
Problem
We drop replication slots at branch creation time. That's relatively easy, because they are stored as files in Postgres and AUX storage in pageserver. Yet, pg_subscription is a normal catalog table, so it's subjected to copy-on-write, and all subscriptions are handed over to the new branch
This opens room for races and data inconsistency, as two subscribers will race for the same slot.
Also, if you just drop the subscription in the new branch, that will drop the slot on the publisher, so replication in the main branch will stop working. There is a workaround to do in the new branch
alter subscription pgbench disable;
alter subscription pgbench set (slot_name = NONE);
drop subscription pgbench;
But race is still there
More context here: https://neondb.slack.com/archives/C06M2T69TSP/p1724069353709079
Solution
It's hard to tell what's the desired behavior here. Probably, from the UX PoV it's better to just drop or at least disable all subscriptions at branching time.
One tricky part here is that if we do this at compute startup time, we cannot do this immediately, so there is still a race. A new branch can kinda kick off the main branch subscriber and advance the slot, so the main branch will miss some changes when it's back again
Not sure, maybe we can start Postgres in the single-user mode or set max_logical_replication_workers to 0? Then disable subscriptions and start normally. Any other thoughts?
dup w/ https://github.com/neondatabase/neon/issues/8757, I will close the previous one
Ups, sorry
@ololobus i suggest we elevate this to the Epic level
also want to confirm if we have the same POV:
- this issue means broken experience when creating a child of a branch that has running inbound on it
do we agree here?
perhaps we should even forbid branching from a branch that has incoming LR?
what's our path forward? do we have an idea about how to implement drop of the slots when branching? how much effort is that?
this issue means broken experience when creating a child of a branch that has running inbound on it do we agree here?
Not completely broken, as there is a workaround, but yes, it's a pretty bad UX, so I agree
perhaps we should even forbid branching from a branch that has incoming LR?
That's tricky to implement, as at the branching time there could be no compute running to check that. Also, when we branch from the point in time -- we do not know whether there are some active subscribers there. This means that we always need to start a static RO on parent just to check this condition. Caching this in cplane is not viable as well because of time travel
what's our path forward? do we have an idea about how to implement drop of the slots when branching? how much effort is that?
I put some thoughts into the solution section. I haven't tried anything yet, though. Overall, it will likely be a joint cplane + compute effort
@hlinnaka @knizhnik @jcsp can we fix this on the pageserver side? In a similar way, we did that for logical replication slots, i.e., drop all subscriptions at branch creation time. Is it viable at all?
That would be great because it's completely race-free, so when we start a compute, we already know than none of subscriptions will activate.
I understand it's a different case -- aux files vs. normal Postgres relation. Yet, iiuc, pg_subscription has a fixed oid, so it should be possible to filter out specific keys. One thing I'd immediately worry about is that there could be some implicit dependencies to other relations, so if we drop all pages of pg_subscription (+ indexes, + toast?) there could be something left, which may confuse the Postgres and will look like data corruption.
Dropping a whole OID is straightforward. The logic in get_vectored_reconstruct_data can be extended where we do this:
// Do not descend into the ancestor timeline for aux files.
// We don't return a blanket [`GetVectoredError::MissingKey`] to avoid
// stalling compaction.
keyspace.remove_overlapping_with(&KeySpace {
ranges: vec![NON_INHERITED_RANGE, NON_INHERITED_SPARSE_RANGE],
});
Ideally we would start with a test case that fails today and then passes with the change.
I'm also not sure how postgres will react to this, if it has metadata that expects pg_subscription to exist and then gets zeros when reading it.
We might need some extra logic to deal with it, I suspect the key range for inbound LR is in the normal relational data range. Anyways, I'd like someone to figure out the key range (oid?) for the data, and I can do some experiments.
@hlinnaka @knizhnik @jcsp can we fix this on the pageserver side? In a similar way, we did that for logical replication slots, i.e., drop all subscriptions at branch creation time. Is it viable at all?
I do no think that it is good solution.
First of all small technical problem: pg_subscription has two indexes:
"pg_subscription_oid_index" PRIMARY KEY, btree (oid), tablespace "pg_global"
"pg_subscription_subname_index" UNIQUE CONSTRAINT, btree (subdbid, subname), tablespace "pg_global"
which should be excluded from lookup also. Please also notice that it still should be possible to locate entry with relation size.It seems to be non-trivial: we should not just prevent lookup in parent timeline, but we should return valid zero length entry! It seems to be very dirty hack.
But what seems to be even more important, I still do not understand what is the "right" behaviour fro the user's point of view. It seems too be ok if you create developer's branch and do mot receive LR updates. But branching can be also used as recovery mechanism, as some replacement of backups. Consider that someone is creating new branch once per day, assuming that if something is going wrong, he can "restore" database state with this branch. Most likely that in this case user expect that all subscriptions active at the moment of branch creating are preserved. The only problem is that will not work - because slot on publisher already move forward and there is no way to rewind it back.
I thinking more about two possible solutions:
- Prohibit branching if there are active subscriptions (it doesn't work for creating branch in the past).
- Somehow make slots unique to timeline (hook
create replicationstatement and provide explicitly created slot for it). But I failed to find way to avoid duplicates or lost updates in this schema.
May be I am missing something, but there seems to be there principle problem here: LR is not able actually to sync data at subscriber and publisher. What is called "initial data sync" is actually "initial copy". So if there is already some data in subscriber's database (and it is actually present when we create a branch), then attempt to establish LR will failed with duplicate error. In theory the problem can be solved by avoiding copy data and specifying start_lsn at the moment of branch creation. But it seems to be also no possible, because it is not possible to start replication at the arbitrary LSN. So looks like there is no way to continue LR in branch. You can only truncate the table and start LR from scratch.
There are three possible scenario after branch creation:
- LR is active only in parent
- LR inactive only in child
- LR is active in both of them And with any poroposed solution we should be able to address all this 3 cases. So assume we somehow managed to disable LR on branch creation on PS side. Then we still have to propose a way how to resume LR in child.
Another interesting case https://neondb.slack.com/archives/C03QLRH7PPD/p1726488093540349
Looks like Postgres can create some aux slots like pg_27482_sync_25719_7413721407049075071, or is it some RDS specifics?
Prohibit branching if there are active subscriptions (it doesn't work for creating branch in the past).
Even keeping aside that this would require tricky synchronization between the compute and control plane, I don't think it's a feasible approach.
One of the workflows we recommend to our customers is
- Keep your main production DB in some other service (if customer cannot migrate to Neon right now for whatever reason, or just want to give Neon a try for a longer period)
- Setup LR from their DBaaS to Neon project
- Use Neon project for workflows, testing, etc. -> this means branching
You can see a dogfooding example of how I set it up inside Neon staging here https://neondb.slack.com/archives/C079RBJ28F6/p1726224971805179