node-postgres

Race Condition Issue in PostgreSQL SSL Connection Process

Open edwardyoon2 opened this issue 9 months ago • 4 comments

I have recently been encountering "Error: Client network socket disconnected before secure TLS connection was established" errors with no apparent cause when connecting to PostgreSQL through RDS Proxy. The errors are intermittent, which makes them difficult to reproduce. After examining the code in the pg module's lib/connection.js file, I suspect there is a race condition between the TCP connection and the TLS connection process. The connection flow appears problematic:

  • Client initiates TCP connection (stream.connect(port, host))
  • When TCP connection succeeds, a 'connect' event is emitted
  • However, the code for SSL/TLS connection is not directly linked to the 'connect' event
  • Instead, the logic that waits for the server's SSL support response is set up separately (stream.once('data', function (buffer) {...}))
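For reference, the flow described in the list above can be sketched as follows. This is a simplified illustration of the PostgreSQL SSLRequest handshake, not the actual code in lib/connection.js. The names `buildSslRequest`, `classifySslResponse`, and `connectWithSsl` are hypothetical. The sketch assumes the server replies to the SSLRequest with a single byte ('S' or 'N'), as the PostgreSQL wire protocol specifies.

```javascript
const net = require('node:net');
const tls = require('node:tls');

// PostgreSQL SSLRequest message: Int32 length (8), then Int32 request code 80877103.
function buildSslRequest() {
  const buf = Buffer.alloc(8);
  buf.writeInt32BE(8, 0);
  buf.writeInt32BE(80877103, 4);
  return buf;
}

// The server answers with one byte: 'S' (SSL supported) or 'N' (not supported).
// Anything else is treated as an error in this sketch.
function classifySslResponse(buffer) {
  const code = buffer.toString('utf8', 0, 1);
  if (code === 'S') return 'supported';
  if (code === 'N') return 'unsupported';
  return 'error';
}

// Hypothetical helper: resolves with a TLS socket layered over the same TCP socket.
// Pass `servername` in tlsOptions if the server needs SNI.
function connectWithSsl({ host, port, tlsOptions = {} }) {
  return new Promise((resolve, reject) => {
    const socket = net.connect({ host, port });
    socket.once('error', reject);
    socket.once('connect', () => {
      // Register the response listener before writing the request,
      // so the single-byte reply cannot arrive unobserved.
      socket.once('data', (buffer) => {
        const result = classifySslResponse(buffer);
        if (result !== 'supported') {
          socket.end();
          reject(new Error(`SSL negotiation failed: ${result}`));
          return;
        }
        // Upgrade the existing socket rather than opening a new connection.
        const secure = tls.connect({ socket, ...tlsOptions });
        secure.once('error', reject);
        secure.once('secureConnect', () => resolve(secure));
      });
      socket.write(buildSslRequest());
    });
  });
}
```

In this sketch, if the peer (for example a proxy) closes the TCP socket after replying 'S' but before the handshake finishes, the TLS layer reports the failure as an error on the secure socket, which is the stage at which the message quoted above would appear.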

This design creates potential timing issues because:

  • There's no explicit synchronization between the TCP connection establishment and when requestSsl() is called
  • The SSL handshake process depends on event listeners that operate asynchronously
  • When working with RDS Proxy as an intermediary layer, these timing sensitivities may become more problematic

I've tested direct connections to PostgreSQL without RDS Proxy, and they work consistently. The issue only manifests when connecting through the proxy layer. Please check whether my understanding of this potential race condition is correct, and whether there are any recommended approaches to mitigate it.

edwardyoon2 avatar Mar 13 '25 06:03 edwardyoon2

Note to self: https://github.com/brianc/node-postgres/issues/3346 might require changing this logic anyway

saper avatar Mar 13 '25 10:03 saper

I am not sure exactly what you mean by a race condition here. All network messaging happens asynchronously for handshake-style things. The same socket is reused when transitioning to SSL; it's just upgraded. So an error about being disconnected before the TLS connection was established probably isn't related to a race condition, since the library does not explicitly disconnect anything between establishing the first, non-TLS connection to the backend and upgrading that same connection once the SSL support response is received.

brianc avatar Mar 15 '25 15:03 brianc

I suspect there might be an issue with RDS Proxy when using multiple database connections simultaneously through TypeOrmModule.forRootAsync() with the @InjectRepository decorator.

The error logs simply show intermittent occurrences of:

  • 'Client network socket disconnected before secure TLS connection was established'
  • 'Connection terminated unexpectedly'
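As a stopgap while the cause is unknown, one option is to retry connection attempts that fail with these two messages. The following is a minimal sketch, not a recommendation from the maintainers. Treating these messages as retryable is an assumption, and the names `isTransientConnectError` and `connectWithRetry`, along with the default attempt count and delay, are hypothetical.

```javascript
// Messages copied from the error logs above. Treating them as retryable is an assumption.
const TRANSIENT_MESSAGES = [
  'Client network socket disconnected before secure TLS connection was established',
  'Connection terminated unexpectedly',
];

// Returns true when the error message contains one of the messages listed above.
function isTransientConnectError(err) {
  const message = err && typeof err.message === 'string' ? err.message : '';
  return TRANSIENT_MESSAGES.some((m) => message.includes(m));
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// connectFn is any caller-supplied async function that opens a fresh connection.
// Retries only transient errors, with exponential backoff between attempts.
async function connectWithRetry(connectFn, { attempts = 3, baseDelayMs = 100 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await connectFn();
    } catch (err) {
      lastError = err;
      if (!isTransientConnectError(err) || attempt === attempts) break;
      await sleep(baseDelayMs * 2 ** (attempt - 1));
    }
  }
  throw lastError;
}
```

With pg, `connectFn` would need to construct a new client on every attempt, since a client whose connection attempt failed cannot be reused. A retry like this only hides the symptom and does not explain why the socket is being closed.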

edwardyoon2 avatar Mar 17 '25 00:03 edwardyoon2

We are intermittently seeing the same error, a few thousand times a day (which is a very, very tiny fraction of total DB connections), and have speculated (perhaps chiefly by failing to think of a better explanation…) that it might be related to delays caused by the Node event loop. We first noticed this with AWS RDS Proxy, but on further inspection it turns out to happen with the main RDS instance as well (at a frequency that is either similar or too close to tell apart).
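One way to test the event loop hypothesis would be to record event loop delay and compare it against the timestamps of the failures. A minimal sketch using Node's built-in `perf_hooks.monitorEventLoopDelay` follows. The helper name `startEventLoopDelayMonitor` and the 20 ms sampling resolution are assumptions for illustration.

```javascript
const { monitorEventLoopDelay } = require('node:perf_hooks');

// Starts sampling event loop delay and returns helpers to read and stop it.
function startEventLoopDelayMonitor({ resolutionMs = 20 } = {}) {
  const histogram = monitorEventLoopDelay({ resolution: resolutionMs });
  histogram.enable();
  // The histogram reports nanoseconds; convert to milliseconds for logging.
  const toMs = (ns) => ns / 1e6;
  return {
    // Returns the stats gathered since the previous snapshot, then resets them.
    snapshot() {
      const stats = {
        meanMs: toMs(histogram.mean),
        p99Ms: toMs(histogram.percentile(99)),
        maxMs: toMs(histogram.max),
      };
      histogram.reset();
      return stats;
    },
    stop() {
      histogram.disable();
    },
  };
}
```

Logging `snapshot()` at a fixed interval, and again whenever the TLS error is caught, would show whether the failures line up with spikes in `maxMs` or `p99Ms`.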

haggholm avatar Jun 17 '25 21:06 haggholm