
logical replication prevent compute from scaling down to zero

Open knizhnik opened this issue 1 year ago • 18 comments

Steps to reproduce

We now suspend computes with active connections if there have been no requests for some time. So if we have an idle master with a logical replication subscription, it will indeed be suspended after a while. But then (depending on the particular subscriber) the subscriber will most likely detect the connection failure and try to re-establish it; at least this is what happens with a standard postgres-to-postgres logical subscription. As a result the master is immediately reactivated. So we never scale to zero and at the same time do the useless work of restarting the compute (a conceptual sketch of this reconnect cycle is included below).

The only solution I see at the moment is to keep the connection open on the replica (subscriber) side and establish the connection with the compute on demand. This could be useful not only for logical replication: it would allow permanent connections with Neon even when the compute is suspended. Certainly there are a number of problems that would have to be solved:

  * How to prevent exhaustion of the available port space
  * How to store the authentication information needed to re-establish the connection with the compute
  * In the case of logical replication, the connection has to be reactivated from the other end (from compute to proxy)
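For illustration only, a conceptual sketch of the reconnect cycle mentioned above, modelled with a bare TCP client. The real subscriber is a Postgres apply worker speaking the replication protocol; the address handling and the 5-second pause here are illustrative, not Postgres's actual retry logic.

```rust
// Why the publisher keeps waking up: the subscriber retries the connection
// whenever it drops, and every retry goes through the proxy and reactivates
// the suspended compute.
use std::time::Duration;
use tokio::{net::TcpStream, time::sleep};

async fn subscriber_retry_loop(publisher_addr: &str) {
    loop {
        match TcpStream::connect(publisher_addr).await {
            Ok(conn) => {
                // ...stream changes until the publisher is suspended and the
                // connection drops...
                drop(conn);
            }
            Err(_) => {
                // Connection refused or timed out while the compute restarts.
            }
        }
        sleep(Duration::from_secs(5)).await;
    }
}
```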

Expected result

Compute is suspended when there is no activity for some time.

Actual result

Compute is never scaled to zero. Hypothesis: it is reactivated by the logical replication subscriber when it detects a connection failure with the suspended compute.

knizhnik avatar Jan 02 '24 19:01 knizhnik

a recent conversation in https://neondb.slack.com/archives/C06B8AKJVD3/p1704280788306989?thread_ts=1704280688.731699&cid=C06B8AKJVD3

stepashka avatar Jan 05 '24 11:01 stepashka

Done by @arssher https://github.com/neondatabase/neon/commit/a41c4122e39ab151f709262329576f9a9b5a2db7

vadim2404 avatar Jan 12 '24 09:01 vadim2404

Just want to note that the original problem is not solved. @arssher's fix removes the wasteful constant reactivation of a suspended publisher by prohibiting its suspension! But the original concern (the compute is not scaled down to zero if a logical replication subscription exists) was not addressed.

As I mentioned in Slack, there are two possible solutions to the problem:

  1. Maintain a live connection from the subscriber to the proxy and establish the publisher<->proxy connection when the publisher is activated
  2. Prohibit replication connections while the publisher is suspended

Both approaches have to be implemented at proxy.

knizhnik avatar Jan 12 '24 09:01 knizhnik

There are two similar-sounding issues: #6258 and #6257

Which should we keep?

vadim2404 avatar Jan 12 '24 09:01 vadim2404

This one

knizhnik avatar Jan 12 '24 09:01 knizhnik

@neondatabase/proxy WDYT?

vadim2404 avatar Jan 12 '24 09:01 vadim2404

It's the first I've heard of this issue. Will need to research the problem/proposals.

conradludgate avatar Jan 12 '24 10:01 conradludgate

I think doing 1 is fine. I suspect it's time to start sniffing the packets as we send them. As I understand it, all we need to detect is a simple query with the text START_REPLICATION SLOT _ LOGICAL; at that point, proxy hijacks the connection and does the magic.

While there is a publisher active, proxy initiates its own logical replication connection to the publisher and passes on all bytes to the subscriber. When the publisher detaches, proxy holds the connection open with the subscriber and asks CPlane for a notification when the publisher is back online. At that point, proxy recreates the logrep connection and continues passing data to the subscriber.
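For illustration, a minimal sketch of that handoff (not the actual Neon proxy code), assuming a hypothetical control-plane interface with connect_to_publisher and wait_until_online, and eliding authentication and the replay of the START_REPLICATION command after each reconnect:

```rust
use std::io;
use tokio::io::{copy_bidirectional, AsyncRead, AsyncWrite};
use tokio::net::TcpStream;

// Hypothetical control-plane interface; the names are for illustration only.
#[allow(async_fn_in_trait)]
trait ControlPlane {
    async fn connect_to_publisher(&self, endpoint_id: &str) -> io::Result<TcpStream>;
    async fn wait_until_online(&self, endpoint_id: &str) -> io::Result<()>;
}

async fn relay_logical_replication<S, C>(
    subscriber: &mut S,
    cplane: &C,
    endpoint_id: &str,
) -> io::Result<()>
where
    S: AsyncRead + AsyncWrite + Unpin,
    C: ControlPlane,
{
    loop {
        // While the publisher is up, pump bytes in both directions.
        // (A real implementation would have to replay authentication and the
        // START_REPLICATION command after each reconnect; elided here.)
        let mut publisher = cplane.connect_to_publisher(endpoint_id).await?;
        let _ = copy_bidirectional(subscriber, &mut publisher).await;

        // The publisher went away (e.g. the compute was suspended): keep the
        // subscriber side open and wait for the control plane to tell us the
        // endpoint is back online, then reconnect and continue streaming.
        cplane.wait_until_online(endpoint_id).await?;
    }
}
```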

Regarding port exhaustion, this is already a concern for all proxy connections 😅 At some point we need to implement idle connection GC, proxy scaling and NLB availability checks so we can actually do something about that.

conradludgate avatar Jan 12 '24 15:01 conradludgate

  2. Prohibit replication connections while the publisher is suspended

I don't think this is possible without some trickery or latency cost

  1. We can only know the connection is a replication connection once the query comes in
  2. The client never sends the query until compute/proxy responds with the database parameters.

proxy does not yet know exactly what connection parameters a database will respond with, so proxy must wake_compute beforehand. We would need some way for cplane to know the params ahead of time, and for cplane to send them to proxy without adding additional latency.

conradludgate avatar Jan 12 '24 16:01 conradludgate

I think maybe it's better to have a dedicated endpoint/proxy instance for replication so we can ignore startup latency for those connections. This would require disallowing logrep on regular proxy (which is a breaking change)

conradludgate avatar Jan 12 '24 16:01 conradludgate

I think maybe it's better to have a dedicated endpoint/proxy instance for replication so we can ignore startup latency for those connections. This would require disallowing logrep on regular proxy (which is a breaking change)

Hmmm... I do not understand your proposal. The problem is that the endpoint is not suspended, and you are suggesting launching one more endpoint. Introducing yet another specialised proxy just for logical replication is possible... but IMHO it is awful.

I do not see much trouble in complicating the proxy code. Actually, IMHO it is not good that we now have two proxies: our proxy and pgbouncer. I think that eventually we should implement connection pooling inside the proxy.

Another "must have" features is permanent connections - suspending of compute should not break connection with client: it should be transparent to client. And it should be also done at proxy level.

Please note that all these things need to be handled only at connect time; there is no overhead during normal work (forwarding messages from client to compute and back).

knizhnik avatar Jan 12 '24 16:01 knizhnik

I do not see much trouble in complicating the proxy code. Actually, IMHO it is not good that we now have two proxies: our proxy and pgbouncer. I think that eventually we should implement connection pooling inside the proxy.

I'm with you on that one. My suggestion for a possible second proxy would not be in front of the current proxy, but separate entirely. So no double proxying.

Please note that all these things need to be handled only at connect time

This is true. I guess connection latencies shouldn't matter much except for HTTP/WS workflows. For regular TCP flows I think we would be OK with adding a little overhead on startup.

conradludgate avatar Jan 12 '24 16:01 conradludgate

There is no need to look for some particular query, i.e. START_REPLICATION SLOT _ LOGICAL. There should be a replication parameter in the connection string.

knizhnik avatar Jan 12 '24 16:01 knizhnik

I did see "replication": "database" in the startup parameters, but I can't find it documented as being required. Do you have a doc that confirms it must be given by a replication subscriber?

conradludgate avatar Jan 12 '24 16:01 conradludgate

Ah, found it: https://www.postgresql.org/docs/16/protocol-replication.html#PROTOCOL-REPLICATION

Passing database as the value for the replication parameter instructs the backend to go into logical replication walsender mode, connecting to the database specified in the dbname parameter. In logical replication walsender mode, the replication commands shown below as well as normal SQL commands can be issued.
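For illustration, a small sketch (not the proxy's actual code) of classifying a connection purely from the already-parsed startup-packet parameters, following the rule quoted above. The docs list true/false/database as values; the extra boolean spellings below mirror Postgres's boolean parsing and are an assumption of this sketch.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum ConnectionKind {
    Normal,
    PhysicalReplication,
    LogicalReplication,
}

// `params` stands for the key/value pairs already parsed out of the
// StartupMessage; the wire-format parsing itself is elided.
fn classify(params: &HashMap<String, String>) -> ConnectionKind {
    match params.get("replication").map(String::as_str) {
        // "database" switches the backend into logical walsender mode.
        Some("database") => ConnectionKind::LogicalReplication,
        // true/on/yes/1 request a physical walsender connection.
        Some("true" | "on" | "yes" | "1") => ConnectionKind::PhysicalReplication,
        // Absent, or false/off/no/0: an ordinary connection.
        _ => ConnectionKind::Normal,
    }
}
```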

conradludgate avatar Jan 12 '24 16:01 conradludgate

One idea on how to implement this:

When the compute shuts down, we keep the subscriber client -> proxy connection open. When the compute wakes up again (because of some other connection), we kill the client -> proxy connection. The subscriber should then reconnect.
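A rough sketch of this "parking" behaviour (hypothetical, not the actual proxy code), where compute_woke stands in for however the proxy learns that the compute is running again:

```rust
use std::sync::Arc;
use tokio::{net::TcpStream, sync::Notify};

// Park an idle subscriber connection while the compute is suspended, and
// drop it as soon as the compute is started again so that the subscriber's
// own retry logic reconnects through the normal path.
async fn park_subscriber(subscriber: TcpStream, compute_woke: Arc<Notify>) {
    // We deliberately do not read from or write to the socket while waiting.
    compute_woke.notified().await;

    // Dropping the stream closes the TCP connection; the subscriber sees a
    // connection failure and reconnects, this time reaching a live compute.
    drop(subscriber);
}
```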

hlinnaka avatar Feb 27 '24 09:02 hlinnaka

One idea on how to implement this:

When the compute shuts down, we keep the subscriber client -> proxy connection open. When the compute wakes up again (because of some other connection), we kill the client -> proxy connection. The subscriber should then reconnect.

It can indeed be done this way, and it seems to be the simplest approach.

An alternative is simply to prohibit "replication" connections from waking up the compute. A replication connection can easily be recognised by the database URL. The potential problem with this approach is that the subscriber may immediately try to reconnect. To avoid overloading the network and the server with such unsuccessful connection attempts, we could send the error with some delay. But if the subscriber has a limit on the maximum number of reconnects after which it gives up, users will not like such behaviour.
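A small sketch of the "error with some delay" variant (illustrative only; the 30-second delay, the SQLSTATE, and the helper are assumptions of this sketch, not Neon's implementation):

```rust
use std::time::Duration;
use tokio::{io::AsyncWriteExt, net::TcpStream, time::sleep};

// Reject a replication connection to a suspended publisher, but only after
// a pause, so an aggressively retrying subscriber does not hammer the proxy.
async fn reject_replication_while_suspended(mut client: TcpStream) -> std::io::Result<()> {
    sleep(Duration::from_secs(30)).await;
    client
        .write_all(&error_response("publisher endpoint is suspended"))
        .await?;
    client.shutdown().await
}

// Encode a Postgres ErrorResponse ('E') message with Severity, SQLSTATE and
// Message fields, followed by the field terminator.
fn error_response(message: &str) -> Vec<u8> {
    let mut body = Vec::new();
    for (code, value) in [(b'S', "ERROR"), (b'C', "57P01"), (b'M', message)] {
        body.push(code);
        body.extend_from_slice(value.as_bytes());
        body.push(0);
    }
    body.push(0); // end of fields

    let mut msg = vec![b'E'];
    msg.extend_from_slice(&((body.len() as u32 + 4).to_be_bytes()));
    msg.extend_from_slice(&body);
    msg
}
```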

The ideal solution IMHO is to provide a transparent connection: when the compute comes back, the proxy should just re-establish the connection with it and continue forwarding traffic between client and compute as if nothing had happened.

knizhnik avatar Feb 27 '24 15:02 knizhnik

Discussion result from the compute meeting: not being able to scale down to zero should not prevent us from releasing logical replication.

andreasscherbaum avatar Feb 27 '24 17:02 andreasscherbaum