Connection timeout wrongly applied to non-preferred hosts
Describe the bug
(Related to the testing I mentioned in #2171, but this appears to be a straightforward bug.)
When connecting with a target_session_attrs preference that cannot be satisfied, and the attempt on the final host times out, the fallback connection to the non-preferred host is never really attempted: it fails immediately with a timeout.
To Reproduce
- Create a connection string with two hosts and a preference for the second; e.g. a primary and a standby with target_session_attrs=prefer-standby.
- Block network traffic from the client to the standby.
- Configure a connection timeout of e.g. 1 second.
- Execute a query.
Expected behavior
After 1 second the connection to the standby times out, the primary is chosen instead, and the query executes.
Actual behavior
After 1 second the query returns a timeout.
Version
- go version go1.23.2 linux/amd64
- PostgreSQL: PostgreSQL 14.13 (Ubuntu 14.13-1.pgdg22.04+1) on aarch64-unknown-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit
- pgx: v5.7.1
Additional context
From my understanding of the code, connectPreferred resets the timeout for each new host as it iterates over the possible targets. However, once it decides to fall back to fallbackConnectOneConfig, it reuses whatever ctx was left over from the last attempt of the initial pass. If that context has already timed out, the new connection starts in a timed-out state. It should probably instead iterate over the entire list again with fresh contexts derived from octx. Or, as an optimization, the first successful non-preferred connection could be kept (rather than just its config), then returned directly if no preferred host is found, or closed if one is.
We just ran into the same bug; we use prefer-standby with a list of hosts too. @jackc would you be open to an MR fixing this?
@LKaemmerling Yes, if it can be cleanly resolved I'm interested. However, there might be a bit of a mismatch between how pgx uses fallback configs and ideal connect_timeout and target_session_attrs behavior. So it might not be very easy.