node-pg-native
node-pg-native copied to clipboard
Support and bug fix for streaming replication protocol
Found a bug while trying to use copy-stream.js
to do logical decoding based on PostgreSQL streaming replication protocol:
Bug:
- While using
node-pg-native
and itscopy-stream.js
lib to interact with PostgreSQL's streaming replication protocol, when executing aSTART_REPLICATION (...)
async query to start streaming replication mode, an infinite loop happens since the read cycle doesn't stop upon receiving thePGRES_COPY_BOTH
message used in replication mode, so:- The condition to break the cycle must be changed to add a test for this message type (line 114)
- Before:
-
if(pq.resultStatus() == 'PGRES_COPY_OUT') break;
-
- After:
-
if(pq.resultStatus() == 'PGRES_COPY_OUT' || pq.resultStatus() == 'PGRES_COPY_BOTH') break;
-
- Before:
- The following
switch
block also needs to deal with this same message type, so its result can also be emitted (after line 124. The code that needs to be added after thePGRES_COPY_OUT
case clause is this:-
case 'PGRES_COPY_BOTH':
-
- WORKAROUND: We can use sync mode to get the START_REPLICATION to successfully start.
- The condition to break the cycle must be changed to add a test for this message type (line 114)
@brianc I'm not sure if something else needs to be changed or added to ensure proper streaming replication support. For what I've seen and tested, most probably isn't but I wanted to confirm with you that nothing else is necessary and if you agree with this.
Improvement suggestion
Concerns flushing the CopyStream in streaming replication mode:
- Libpq's
flush()
function could be exposed in thecopy-stream.js
lib or doing this implicitly in the_write
function to be applied after a successful write (this considering we can identify that a CopyStream's instance is being used in streaming replication mode, maybe adding a flag variable to the constructor can help for that purpose?):- When in replication mode, the stream should always be flushed after a successful write from the client side or else the server won't acknowledge the responses and will disconnect the logical decoding client within 60 seconds (default), causing an abort on the client's process without any stack trace of the error being output, just a short
abort (core dumped)
message. - Since the
flush()
method is inside thelibpq
, consideringcopyStream
is my CopyStream instance that we get from thegetCopyStream()
function, we then need to call it explicitly through itspq
value usingcopyStream.pq.flush()
. - PostgreSQL's
pg_recvlogical
tool source code is useful to understand this behaviour, checking thesendfeedback()
function we can see how the data is written and then flushed.
- When in replication mode, the stream should always be flushed after a successful write from the client side or else the server won't acknowledge the responses and will disconnect the logical decoding client within 60 seconds (default), causing an abort on the client's process without any stack trace of the error being output, just a short
- Add a section in
node-pg-native
's documentation on how to use the CopyStream
Useful references in PostgreSQL doc regarding this:
I've created a PR on #45 to fix this, I hope it helps.