ConnectionPool sometimes immediately drops connection and establishes a new one
We are using okhttp (v4.12.0) in a latency-sensitive scenario and want to keep idle connections alive for longer periods of time (e.g. 1 hour). We control both the client and the server, so we don't have to worry too much about too many connections being opened, but the connections that are open should be able to handle requests without incurring a TLS handshake.
We noticed that sometimes a connection seems to be taken from the ConnectionPool, immediately gets returned to the pool, and a new connection is established instead. We installed an EventListener and this is the output:
2025-04-23T19:06:02.445051193Z 0018 https://my-url.com/rpc
2025-04-23T19:06:02.445535237Z 0018 0.000 callStart
2025-04-23T19:06:02.445674751Z 0018 0.001 Connection reused: /1.2.3.4:443
2025-04-23T19:06:02.446980894Z 0018 0.001 requestHeadersStart
2025-04-23T19:06:02.447442174Z 0018 0.003 requestHeadersEnd
2025-04-23T19:06:02.447661337Z 0018 0.003 requestBodyStart
2025-04-23T19:06:02.449092246Z 0018 0.004 requestBodyEnd
2025-04-23T19:06:02.481962548Z 0018 0.037 responseHeadersStart
2025-04-23T19:06:02.482055071Z 0018 0.037 responseHeadersEnd
2025-04-23T19:06:02.482092757Z 0018 0.037 responseBodyStart
2025-04-23T19:06:02.482123833Z 0018 0.037 responseBodyEnd
2025-04-23T19:06:02.482820818Z 0018 0.038 Connection released: /1.2.3.4:443
2025-04-23T19:06:02.483200652Z 0018 0.038 callEnd
2025-04-23T19:06:02.484523673Z 0019 https://my-url.com/rpc
2025-04-23T19:06:02.484973743Z 0019 0.000 callStart
2025-04-23T19:06:02.485965898Z 0019 0.001 Connection reused: /1.2.3.4:443
2025-04-23T19:06:02.486016868Z 0019 0.001 Connection released: /1.2.3.4:443
2025-04-23T19:06:02.486051017Z 0019 0.001 proxySelectStart
2025-04-23T19:06:02.487130032Z 0019 0.002 proxySelectEnd
2025-04-23T19:06:02.487308996Z 0019 0.003 connectStart
2025-04-23T19:06:02.511241745Z 0019 0.026 secureConnectStart
2025-04-23T19:06:02.545458089Z 0019 0.060 secureConnectEnd
2025-04-23T19:06:02.545766112Z 0019 0.061 New connection established: /1.2.3.4:443
2025-04-23T19:06:02.545857281Z 0019 0.061 Connection reused: /1.2.3.4:443
2025-04-23T19:06:02.546091630Z 0019 0.061 requestHeadersStart
2025-04-23T19:06:02.546429571Z 0019 0.062 requestHeadersEnd
2025-04-23T19:06:02.546531342Z 0019 0.062 requestBodyStart
2025-04-23T19:06:02.547075670Z 0019 0.062 requestBodyEnd
2025-04-23T19:06:02.630127344Z 0019 0.145 responseHeadersStart
2025-04-23T19:06:02.630624969Z 0019 0.146 responseHeadersEnd
2025-04-23T19:06:02.631339107Z 0019 0.147 responseBodyStart
2025-04-23T19:06:02.631407407Z 0019 0.147 responseBodyEnd
2025-04-23T19:06:02.632111855Z 0019 0.147 Connection released: /1.2.3.4:443
2025-04-23T19:06:02.632629538Z 0019 0.148 callEnd
2025-04-23T19:06:02.725177907Z 0020 https://my-url.com/rpc
2025-04-23T19:06:02.725500402Z 0020 0.000 callStart
2025-04-23T19:06:02.726004022Z 0020 0.001 Connection reused: /1.2.3.4:443
2025-04-23T19:06:02.726494378Z 0020 0.001 requestHeadersStart
2025-04-23T19:06:02.727563300Z 0020 0.002 requestHeadersEnd
2025-04-23T19:06:02.727662019Z 0020 0.002 requestBodyStart
2025-04-23T19:06:02.727694406Z 0020 0.002 requestBodyEnd
2025-04-23T19:06:02.755305382Z 0020 0.030 responseHeadersStart
2025-04-23T19:06:02.755397157Z 0020 0.030 responseHeadersEnd
2025-04-23T19:06:02.755999533Z 0020 0.031 responseBodyStart
2025-04-23T19:06:02.756070025Z 0020 0.031 responseBodyEnd
2025-04-23T19:06:02.757690321Z 0020 0.031 Connection released: /1.2.3.4:443
2025-04-23T19:06:02.757756861Z 0020 0.031 callEnd
- We do 3 network calls in quick succession here (not interleaved, i.e. they are done one after another), and all go to the exact same URL (which resolves to the same IP, i.e. it is not behind a DNS load balancer or anything similar).
- These are HTTP/2 requests.
- The first call (18) takes an existing connection from the pool, gets a valid response and releases the connection back to the pool, so everything works fine here.
- The next call (19), however, takes the connection from the pool but immediately returns it and then goes on to establish a new connection, which takes ~60ms. Interestingly, the actual call on the new connection afterwards also takes ~80ms instead of the usual ~30ms (is there a reason why the first call on a new connection might take longer?). Note that call 19 does briefly acquire the existing connection, i.e. the same one call 18 just released, before giving it up.
- The final call (20) correctly takes the connection from the pool again and uses it.
How can we find out why the connection gets returned to the pool immediately during call 19? Is there anything else we should do to keep connections alive, in addition to setting the keepAliveDuration of the ConnectionPool to 60 minutes?
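For reference, the setup described above boils down to roughly the following sketch. The listener is a simplified stand-in for the one that produced the trace, and the maxIdleConnections value of 5 is just illustrative:

import java.net.InetSocketAddress
import java.net.Proxy
import java.util.concurrent.TimeUnit
import okhttp3.Call
import okhttp3.Connection
import okhttp3.ConnectionPool
import okhttp3.EventListener
import okhttp3.OkHttpClient
import okhttp3.Protocol

// Simplified stand-in for the listener that produced the trace above.
class PoolEventListener : EventListener() {
  override fun connectionAcquired(call: Call, connection: Connection) {
    // The real listener distinguishes reused vs. new connections; simplified here.
    println("Connection acquired: ${connection.socket().remoteSocketAddress}")
  }

  override fun connectionReleased(call: Call, connection: Connection) {
    println("Connection released: ${connection.socket().remoteSocketAddress}")
  }

  override fun connectEnd(call: Call, inetSocketAddress: InetSocketAddress, proxy: Proxy, protocol: Protocol?) {
    println("New connection established: $inetSocketAddress ($protocol)")
  }
}

// Keep up to 5 idle connections (illustrative) alive for 60 minutes, as described above.
val client = OkHttpClient.Builder()
  .connectionPool(ConnectionPool(5, 60, TimeUnit.MINUTES))
  .eventListener(PoolEventListener())
  .build()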
This is most likely caused by nginx's default of keepalive_time 1h, which closes a connection after 1 hour while processing the next request. This is also why our first HTTP call gets properly routed through the existing connection, but the very next call fails (presumably because the connection had already been closed in the underlying HTTP stack). It would be great if there were some indication of this in the okhttp logs though (maybe there already is a way to get this info?), since without control of the server (and verbose logging enabled there) it is almost impossible to figure this out from the client side alone.
I'll keep this open so the maintainers can respond on whether this is something they are interested in adding, or whether it is already available, but the issue itself is solved and was not caused by okhttp. Sorry for the noise.
Yeah, we should include such logging in the 5.x version. Let me have a think about how to achieve that.
There is precedent for logging info like retry decisions: https://github.com/square/okhttp/commit/3319f24bf72ba2af05cad41a3ad8d89de513d328
The ConnectionListener in 5.x should be receiving an event, but we should consider what to add to make it useful.
/**
 * Listener for connection events. Extend this class to monitor the new connections and closes.
 *
 * All event methods must execute fast, without external locking, cannot throw exceptions,
 * attempt to mutate the event parameters, or be reentrant back into the client.
 * Any IO - writing to files or network should be done asynchronously.
 */
@ExperimentalOkHttpApi
abstract class ConnectionListener {
  ...

  /**
   * Invoked when a call is assigned a particular connection.
   */
  open fun connectionAcquired(
    connection: Connection,
    call: Call,
  ) {}

  /**
   * Invoked when a call no longer uses a connection.
   */
  open fun connectionReleased(
    connection: Connection,
    call: Call,
  ) {}

  /**
   * Invoked when a connection is marked for no new exchanges.
   */
  open fun noNewExchanges(connection: Connection) {}
}
Events would look something like
"ConnectionAcquired",
"NoNewExchanges", <-- failed health check
"ConnectionReleased",
"ConnectionClosed",
"ConnectStart",
Reasons for an HTTP/2 connection to not be healthy:
https://github.com/square/okhttp/blob/master/okhttp/src/commonJvmAndroid/kotlin/okhttp3/internal/connection/RealConnection.kt#L322
https://github.com/square/okhttp/blob/master/okhttp/src/commonJvmAndroid/kotlin/okhttp3/internal/-UtilJvm.kt#L207
https://github.com/square/okhttp/blob/master/okhttp/src/commonJvmAndroid/kotlin/okhttp3/internal/http2/Http2Connection.kt#L537
- Socket is closed
- Socket read fails
- Missed deadline for pong
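For context, the linked check boils down to roughly the following sketch (a paraphrase, not the exact okhttp source; the HTTP/2 ping-deadline part is left out): the socket must still be open in both directions, and a 1 ms read must time out rather than observe EOF or fail.

import java.io.IOException
import java.net.Socket
import java.net.SocketTimeoutException
import okio.BufferedSource

// Paraphrase of the linked check: a pooled socket is considered usable if it is
// still open and a near-zero-timeout read neither hits EOF nor fails outright.
fun isLikelyHealthy(socket: Socket, source: BufferedSource): Boolean {
  if (socket.isClosed || socket.isInputShutdown || socket.isOutputShutdown) return false

  return try {
    val previousTimeout = socket.soTimeout
    try {
      socket.soTimeout = 1
      !source.exhausted() // exhausted() == true means the peer already sent EOF.
    } finally {
      socket.soTimeout = previousTimeout
    }
  } catch (_: SocketTimeoutException) {
    true // Nothing to read within 1 ms: the connection is idle but alive.
  } catch (_: IOException) {
    false // The read (or the socket itself) failed.
  }
}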
Likely a real issue, but needs info to be investigated. Ideally a way to reproduce.