pgx Detect stuck TCP connections

Is your feature request related to a problem? Please describe.

When we failover PostgreSQL server to another primary, old connections can remain stuck: for whatever reason they are not closed and attempt to use them has end up with timeout:

write failed: write tcp 10.32.14.13:36102->10.36.54.61:5432: i/o timeout

we see these entries minutes after failover with all connections in connection pool

Describe the solution you'd like

It seems that detecting hung TCP connection is not trivial. TCP keepalive won't help, because with outstanding writes connection is marked as active and no keepalives are set.

One way I can think of is if TCP IO timeout was somehow decoupled from Query context timeout. net.Conn has SetWriteDeadline(t time.Time) error method which on the surface should do the trick: configurable short timeout of 1-3 seconds will be enough to detect IO error and take connection out of the pool still allowing Query context have arbitrary timeout.

Describe alternatives you've considered

Have custom Dialer which sets TCP_USER_TIMEOUT (not unlike tcp_user_timeout connection string param in libpq) with raw sys calls, but it is not portable.

Another options is pgxpool connection health check. It seems to check connection health on Acquire, which underneath executes -- ping statement, but context passed for that check is from Acquire call which in turn itself is from pgxpool.Query so it suffers from the same problem of stuck connection: context timeout is tuned for query execution time overall. If health check was executed with a smaller context timeout derived from parent context then it would detect problematic connections more reliably. This is not a preferred option because it doesn't handle stuck connection detection for already Acquired connections.

Nov 12 '24 16:11 redbaron

As you say, detecting broken TCP connections is non-trivial.

I wonder if a write timeout would even work reliably? As I understand it, from Go's point of view, the write is successful when sent (maybe even only in the OS write buffer), not when acked in some manner. I think a write can succeed even when the underlying connection is broken. That's why the pool health check uses a -- ping statement. AFAIK, nothing lower level is sufficient to guarantee the connection is live.

Nov 16 '24 13:11 jackc

I wonder if a write timeout would even work reliably?

I believe it would cover at least our case. Error line seems to failing on write and error itself seems to be from net stack, lowering write timeout should do the trick. Detecting stuck connection in some of the cases is an improvement still.

if going with this approach, we need to be careful with large batch writes, which I suspect do expect to be blocked on write for long.

That's why the pool health check

I think pool healtcheck should use derived context with much shorter timeout regardless of decision made how to proceed with stuck connection detection. Probably that timeout should be configurable, because it differs whether connection is to PG or pgxpool . Our systems use direct connection to PG, so I'd configure it to 1s.

As for the purpose of this issue, pool health check won't help us much: all connections are in constant reuse and pool health check triggers on 1s of inactivity. If TCP connection becomes "stuck" only small subset of pool connections will be lucky enough to hit it at right moment and right state for pool check to trigger AND to catch issue with connection.

Nov 18 '24 09:11 redbaron

I'm open to some new setting for a write timeout. Not sure exactly what the interface would be though. Not sure how directly setting SetWriteDeadline would interact with the existing context system.

Nov 29 '24 00:11 jackc

See #2185. I proposed something there that might have some overlap with this.

Dec 06 '24 13:12 jackc

Context is useful when app should be able to dynamically pass piece of configuration or state down the call stack. Either health check ping length or write deadline look fairly static params to me, so just a config option will suffice.

Dec 06 '24 18:12 redbaron

You can set (on some operating systems) a TCP send timeout, as described in RFC-5482. For Linux, that would be a setsockopt syscall for IPPROTO_TCP, TCP_USER_TIMEOUT. It triggers a socket error (ETIMEDOUT) if outstanding data is not ACK'ed by the other side of the TCP connection within the specified timeout.

Note that if you want to timeout the TCP connect phase, you need something else, but golang already handles that with deadlines, AFAIK.

Mar 17 '25 19:03 hmh