neon icon indicating copy to clipboard operation
neon copied to clipboard

Logical replication subscriptions are copied into a new branch

Open ololobus opened this issue 1 year ago • 4 comments
trafficstars

Problem

We drop replication slots at branch creation time. That's relatively easy, because they are stored as files in Postgres and AUX storage in pageserver. Yet, pg_subscription is a normal catalog table, so it's subjected to copy-on-write, and all subscriptions are handed over to the new branch

This opens room for races and data inconsistency, as two subscribers will race for the same slot.

Also, if you just drop the subscription in the new branch, that will drop the slot on the publisher, so replication in the main branch will stop working. There is a workaround to do in the new branch

alter subscription pgbench disable;
alter subscription pgbench set (slot_name = NONE);
drop subscription pgbench;

But race is still there

More context here: https://neondb.slack.com/archives/C06M2T69TSP/p1724069353709079

Solution

It's hard to tell what's the desired behavior here. Probably, from the UX PoV it's better to just drop or at least disable all subscriptions at branching time.

One tricky part here is that if we do this at compute startup time, we cannot do this immediately, so there is still a race. A new branch can kinda kick off the main branch subscriber and advance the slot, so the main branch will miss some changes when it's back again

Not sure, maybe we can start Postgres in the single-user mode or set max_logical_replication_workers to 0? Then disable subscriptions and start normally. Any other thoughts?

ololobus avatar Aug 21 '24 15:08 ololobus

dup w/ https://github.com/neondatabase/neon/issues/8757, I will close the previous one

skyzh avatar Aug 21 '24 19:08 skyzh

Ups, sorry

ololobus avatar Aug 22 '24 10:08 ololobus

@ololobus i suggest we elevate this to the Epic level

also want to confirm if we have the same POV:

  • this issue means broken experience when creating a child of a branch that has running inbound on it

do we agree here?

perhaps we should even forbid branching from a branch that has incoming LR?

what's our path forward? do we have an idea about how to implement drop of the slots when branching? how much effort is that?

stepashka avatar Aug 28 '24 13:08 stepashka

this issue means broken experience when creating a child of a branch that has running inbound on it do we agree here?

Not completely broken, as there is a workaround, but yes, it's a pretty bad UX, so I agree

perhaps we should even forbid branching from a branch that has incoming LR?

That's tricky to implement, as at the branching time there could be no compute running to check that. Also, when we branch from the point in time -- we do not know whether there are some active subscribers there. This means that we always need to start a static RO on parent just to check this condition. Caching this in cplane is not viable as well because of time travel

what's our path forward? do we have an idea about how to implement drop of the slots when branching? how much effort is that?

I put some thoughts into the solution section. I haven't tried anything yet, though. Overall, it will likely be a joint cplane + compute effort

ololobus avatar Aug 28 '24 15:08 ololobus

@hlinnaka @knizhnik @jcsp can we fix this on the pageserver side? In a similar way, we did that for logical replication slots, i.e., drop all subscriptions at branch creation time. Is it viable at all?

That would be great because it's completely race-free, so when we start a compute, we already know than none of subscriptions will activate.

I understand it's a different case -- aux files vs. normal Postgres relation. Yet, iiuc, pg_subscription has a fixed oid, so it should be possible to filter out specific keys. One thing I'd immediately worry about is that there could be some implicit dependencies to other relations, so if we drop all pages of pg_subscription (+ indexes, + toast?) there could be something left, which may confuse the Postgres and will look like data corruption.

ololobus avatar Sep 06 '24 12:09 ololobus

Dropping a whole OID is straightforward. The logic in get_vectored_reconstruct_data can be extended where we do this:

            // Do not descend into the ancestor timeline for aux files.
            // We don't return a blanket [`GetVectoredError::MissingKey`] to avoid
            // stalling compaction.
            keyspace.remove_overlapping_with(&KeySpace {
                ranges: vec![NON_INHERITED_RANGE, NON_INHERITED_SPARSE_RANGE],
            });

Ideally we would start with a test case that fails today and then passes with the change.

I'm also not sure how postgres will react to this, if it has metadata that expects pg_subscription to exist and then gets zeros when reading it.

jcsp avatar Sep 06 '24 13:09 jcsp

We might need some extra logic to deal with it, I suspect the key range for inbound LR is in the normal relational data range. Anyways, I'd like someone to figure out the key range (oid?) for the data, and I can do some experiments.

skyzh avatar Sep 06 '24 15:09 skyzh

@hlinnaka @knizhnik @jcsp can we fix this on the pageserver side? In a similar way, we did that for logical replication slots, i.e., drop all subscriptions at branch creation time. Is it viable at all?

I do no think that it is good solution. First of all small technical problem: pg_subscription has two indexes:

    "pg_subscription_oid_index" PRIMARY KEY, btree (oid), tablespace "pg_global"
    "pg_subscription_subname_index" UNIQUE CONSTRAINT, btree (subdbid, subname), tablespace "pg_global"

which should be excluded from lookup also. Please also notice that it still should be possible to locate entry with relation size.It seems to be non-trivial: we should not just prevent lookup in parent timeline, but we should return valid zero length entry! It seems to be very dirty hack.

But what seems to be even more important, I still do not understand what is the "right" behaviour fro the user's point of view. It seems too be ok if you create developer's branch and do mot receive LR updates. But branching can be also used as recovery mechanism, as some replacement of backups. Consider that someone is creating new branch once per day, assuming that if something is going wrong, he can "restore" database state with this branch. Most likely that in this case user expect that all subscriptions active at the moment of branch creating are preserved. The only problem is that will not work - because slot on publisher already move forward and there is no way to rewind it back.

I thinking more about two possible solutions:

  1. Prohibit branching if there are active subscriptions (it doesn't work for creating branch in the past).
  2. Somehow make slots unique to timeline (hook create replication statement and provide explicitly created slot for it). But I failed to find way to avoid duplicates or lost updates in this schema.

May be I am missing something, but there seems to be there principle problem here: LR is not able actually to sync data at subscriber and publisher. What is called "initial data sync" is actually "initial copy". So if there is already some data in subscriber's database (and it is actually present when we create a branch), then attempt to establish LR will failed with duplicate error. In theory the problem can be solved by avoiding copy data and specifying start_lsn at the moment of branch creation. But it seems to be also no possible, because it is not possible to start replication at the arbitrary LSN. So looks like there is no way to continue LR in branch. You can only truncate the table and start LR from scratch.

knizhnik avatar Sep 07 '24 06:09 knizhnik

There are three possible scenario after branch creation:

  1. LR is active only in parent
  2. LR inactive only in child
  3. LR is active in both of them And with any poroposed solution we should be able to address all this 3 cases. So assume we somehow managed to disable LR on branch creation on PS side. Then we still have to propose a way how to resume LR in child.

knizhnik avatar Sep 07 '24 06:09 knizhnik

Another interesting case https://neondb.slack.com/archives/C03QLRH7PPD/p1726488093540349

Looks like Postgres can create some aux slots like pg_27482_sync_25719_7413721407049075071, or is it some RDS specifics?

ololobus avatar Sep 16 '24 12:09 ololobus

Prohibit branching if there are active subscriptions (it doesn't work for creating branch in the past).

Even keeping aside that this would require tricky synchronization between the compute and control plane, I don't think it's a feasible approach.

One of the workflows we recommend to our customers is

  1. Keep your main production DB in some other service (if customer cannot migrate to Neon right now for whatever reason, or just want to give Neon a try for a longer period)
  2. Setup LR from their DBaaS to Neon project
  3. Use Neon project for workflows, testing, etc. -> this means branching

You can see a dogfooding example of how I set it up inside Neon staging here https://neondb.slack.com/archives/C079RBJ28F6/p1726224971805179

ololobus avatar Sep 16 '24 12:09 ololobus