node-pg-native icon indicating copy to clipboard operation
node-pg-native copied to clipboard

Support and bug fix for streaming replication protocol

Open sigmascord opened this issue 9 years ago • 1 comments

Found a bug while trying to use copy-stream.js to do logical decoding based on PostgreSQL streaming replication protocol:

Bug:

  1. While using node-pg-native and its copy-stream.js lib to interact with PostgreSQL's streaming replication protocol, when executing a START_REPLICATION (...) async query to start streaming replication mode, an infinite loop happens since the read cycle doesn't stop upon receiving the PGRES_COPY_BOTH message used in replication mode, so:
    • The condition to break the cycle must be changed to add a test for this message type (line 114)
      • Before:
        • if(pq.resultStatus() == 'PGRES_COPY_OUT') break;
      • After:
        • if(pq.resultStatus() == 'PGRES_COPY_OUT' || pq.resultStatus() == 'PGRES_COPY_BOTH') break;
    • The following switch block also needs to deal with this same message type, so its result can also be emitted (after line 124. The code that needs to be added after the PGRES_COPY_OUT case clause is this:
      • case 'PGRES_COPY_BOTH':
    • WORKAROUND: We can use sync mode to get the START_REPLICATION to successfully start.

@brianc I'm not sure if something else needs to be changed or added to ensure proper streaming replication support. For what I've seen and tested, most probably isn't but I wanted to confirm with you that nothing else is necessary and if you agree with this.

Improvement suggestion

Concerns flushing the CopyStream in streaming replication mode:

  1. Libpq's flush() function could be exposed in the copy-stream.js lib or doing this implicitly in the _write function to be applied after a successful write (this considering we can identify that a CopyStream's instance is being used in streaming replication mode, maybe adding a flag variable to the constructor can help for that purpose?):
    • When in replication mode, the stream should always be flushed after a successful write from the client side or else the server won't acknowledge the responses and will disconnect the logical decoding client within 60 seconds (default), causing an abort on the client's process without any stack trace of the error being output, just a short abort (core dumped) message.
    • Since the flush() method is inside the libpq, considering copyStream is my CopyStream instance that we get from the getCopyStream() function, we then need to call it explicitly through its pq value using copyStream.pq.flush().
    • PostgreSQL's pg_recvlogical tool source code is useful to understand this behaviour, checking the sendfeedback() function we can see how the data is written and then flushed.
  2. Add a section in node-pg-native's documentation on how to use the CopyStream

Useful references in PostgreSQL doc regarding this:

sigmascord avatar Feb 04 '16 18:02 sigmascord

I've created a PR on #45 to fix this, I hope it helps.

sigmascord avatar Mar 22 '16 16:03 sigmascord