blaze icon indicating copy to clipboard operation
blaze copied to clipboard

BlazeClient's idleTimeout works differently than BlazeServer's idleTimeout

Open RafalSumislawski opened this issue 3 years ago • 10 comments

Both BlazeClient and BlazeServer offer an idleTimeout configuration option. The option has the same name in both, and shares implementation (IdleTimeoutStage) but from user's perspective it behaves differently.

In BlazeClient the idleTimeout is active only when a request is in-flight. It may kick in if the server takes too long to respond (unless other timeout does it first) or when the server starts sending a response and then pauses for too long. An idle (not borrowed from the pool) connection will remain open indefinitly as the idleTimeout is deactivated when a connection is returned to the pool. This may contradict users assumptions about the meaning of the word "idle" in the context of an HTTP client.

In BlazeServer the idleTimeout is always active, and it may terminate connections that have an in-flight request but nothing happened for too long, as well as connections which are kept open but unused.

I see this inconsistency as a bad developer experience. In addition to that I see the lack of mechanism for evicting unused connections from the client's connection pool, as a missing feature and potential resource leak.

RafalSumislawski avatar Apr 30 '21 15:04 RafalSumislawski

By way of comparison, there is a readTimeout and a pooledConnectionIdleTimeout in AHC. Jetty client has just an idleTimeout, and it's not clear which definition it means.

rossabaker avatar May 02 '21 04:05 rossabaker

http4s/http4s#3700 seems to be related. Maybe the can be solved together.

RafalSumislawski avatar May 04 '21 20:05 RafalSumislawski

http4s/blaze#675 also seems related

RafalSumislawski avatar May 09 '21 18:05 RafalSumislawski

For reference akka-http has a similar problem, and there's a PR adding a second kind of timeout to http client: https://github.com/akka/akka-http/pull/3816

RafalSumislawski avatar Jun 05 '21 18:06 RafalSumislawski

So if we added a keepAliveTimeout and made idleTimeout behave like blaze-client, it seems we'd be in a good place?

rossabaker avatar Jul 09 '21 03:07 rossabaker

I'm not convinced what we should do with the blaze-server idleTimeout (or if we should do anything with it).

What I am currently convinced to is that the changes in akka-http-client go in the right direction and we should do the same in blaze-client. It should have two timeouts:

  • an idleTimeout that is active while the connection is being used. It's main purpose being to prevent getting stuck in this state. The current implementation does exactly that.
  • and a new keepAliveTimeout which would be active while the connection sits in the pool.

While this doesn't solve the user experience problem I stated in the title of this issue "BlazeClient's idleTimeout works differently than BlazeServer's idleTimeout". I think it solves the the more serious technical problems of:

  • being unable to evict connections from the pool before server closes them, and therefore risking a race between client using the connection and server closing it due to timeout.
  • being able to evict unused connection from the pool in order to free up resources

P.S. I feel a need to on more client-side timeout, sort of a maximumLifespan, that would be useful in forcing a rebalancing of connections though a load-balancer. But that's a story for another ticket.

RafalSumislawski avatar Jul 10 '21 16:07 RafalSumislawski

We had an issue at $WORK with a server that used round-robin DNS interacting badly with the JVM's DNS caching. We filled the pool against one cached instance. It was proposed a max lifespan would help that, but I think it would expire those connections at once and then overload another instance as it replenished the connections within a new cache window. A maximum lifespan with jitter could have helped us here after the first wave. But that's probably solving the wrong problem: we needed a different load balancing strategy.

I like your ideas here.

rossabaker avatar Jul 11 '21 15:07 rossabaker

is there a plan to make this timeout available and exposed? we were getting org.http4s.blaze.pipeline.Command$EOF$: EOF regularly on 0.21.28 and 0.22.4, we updated to 0.22.6 now, but I haven't seen any change, which would suggest that internal pool gets refreshed or cleaned up before server closes connections.

mr-git avatar Oct 08 '21 12:10 mr-git

No, there's not yet anything like the keepAliveTimeout implemented, but I'm sure that would be exposed and configurable once implemented.

rossabaker avatar Oct 26 '21 03:10 rossabaker

 /** Time a connection can be idle and still be borrowed.  Helps deal
    * with connections that are closed while idling in the pool for an
    * extended period.  `Duration.Inf` means no timeout.
    */
  def withMaxIdleDuration(maxIdleDuration: Duration = maxIdleDuration): BlazeClientBuilder[F] =
    copy(maxIdleDuration = maxIdleDuration)

Is this the new and correct timeout, which should remove connections from pool?

mr-git avatar Aug 09 '22 11:08 mr-git