
Add a graceful shutdown timeout before closing the default `ClientFactory`

Open ikhoon opened this issue 1 year ago • 1 comments

Motivation:

The default ClientFactory closes immediately when a JVM shuts down. The closed ClientFactory prevents all clients using the default factory from processing requests.

This is not a problem when the client is used alone. However, many clients are used in a server to fetch or deliver data when the server receives a request. If the ClientFactory is closed while the server is still in graceful shutdown mode, all requests will fail. This behavior cannot be called a graceful shutdown.

In this case, HealthCheckedEndpointGroup raises an EndpointSelectionTimeoutException because the default ClientFactory used by HttpHealthChecker has already been terminated.

com.linecorp.armeria.client.UnprocessedRequestException: com.linecorp.armeria.client.endpoint.EndpointSelectionTimeoutException:
  Failed to select within 6400 ms an endpoint from: HealthCheckedEndpointGroup{endpoints=[], numEndpoints=0, candidates=[Endpoint{example.com, ipAddr=x.x.x.x, weight=1000}, ...], numCandidates=8, ...,
  initialized=true, initialSelectionTimeoutMillis=10000, selectionTimeoutMillis=6400, contextGroupChain=[]}
    at com.linecorp.armeria.client.UnprocessedRequestException.of(UnprocessedRequestException.java:45)
    at com.linecorp.armeria.client.HttpClientDelegate.earlyFailedResponse(HttpClientDelegate.java:228)
...
Caused by: com.linecorp.armeria.client.endpoint.EndpointSelectionTimeoutException:
  Failed to select within 6400 ms an endpoint from: HealthCheckedEndpointGroup{endpoints=[], numEndpoints=0, candidates=[Endpoint{example.com, ipAddr=x.x.x.x, weight=1000}, ...], numCandidates=8, ...,
  initialized=true, initialSelectionTimeoutMillis=10000, selectionTimeoutMillis=6400, contextGroupChain=[]}
    at com.linecorp.armeria.client.endpoint.EndpointSelectionTimeoutException.get(EndpointSelectionTimeoutException.java:48)
    at com.linecorp.armeria.client.endpoint.AbstractEndpointSelector.lambda$select$0(AbstractEndpointSelector.java:117)
    at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98)
    at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:153)
    ...
    8 common frames omitted

I propose adding a delay before closing the default ClientFactory so that the server can keep handling requests during graceful shutdown.

Modifications:

  • Add Flags.defaultClientFactoryGracefulShutdownTimeoutMillis() that indicates the default time to wait before closing the default ClientFactory.
  • Add TestFlagsProvider to override defaultClientFactoryGracefulShutdownTimeoutMillis for rapid iterative testing.

Result:

  • HealthCheckedEndpointGroup no longer raises EndpointSelectionTimeoutException when a server is stopped by the JVM shutdown hook.
  • You can now set a delay before the default ClientFactory shuts down via Flags.defaultClientFactoryGracefulShutdownTimeoutMillis(). If not set, 10 seconds is used by default.
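
The idea of delaying the close can be illustrated with a minimal, self-contained sketch. It is not the Armeria implementation: `DelayedCloser`, its methods, and the `example.gracefulShutdownTimeoutMillis` system property are hypothetical names chosen for illustration, and the sketch simply sleeps for the whole timeout before closing. Only the 10-second default is taken from the description above.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical helper, not an Armeria class: closes a resource only after
// a grace period so that in-flight work can still use it during shutdown.
final class DelayedCloser {
    // Mirrors the 10-second default mentioned in the PR description.
    static final long DEFAULT_TIMEOUT_MILLIS = 10_000;

    private final AutoCloseable resource;
    private final long timeoutMillis;
    private final AtomicBoolean closing = new AtomicBoolean();
    private volatile boolean closed;

    DelayedCloser(AutoCloseable resource, long timeoutMillis) {
        if (timeoutMillis < 0) {
            throw new IllegalArgumentException("timeoutMillis: " + timeoutMillis);
        }
        this.resource = resource;
        this.timeoutMillis = timeoutMillis;
    }

    // Reads the timeout from a made-up system property, falling back to the default.
    static long timeoutFromSystemProperty() {
        return Long.getLong("example.gracefulShutdownTimeoutMillis", DEFAULT_TIMEOUT_MILLIS);
    }

    boolean isClosed() {
        return closed;
    }

    // Waits for the grace period, then closes the resource.
    // Returns the elapsed time in milliseconds, or 0 if a close was already started.
    long closeGracefully() throws Exception {
        if (!closing.compareAndSet(false, true)) {
            return 0;
        }
        final long startNanos = System.nanoTime();
        try {
            Thread.sleep(timeoutMillis);
        } finally {
            // Close even if the wait was interrupted.
            resource.close();
            closed = true;
        }
        return (System.nanoTime() - startNanos) / 1_000_000;
    }

    // Registers a JVM shutdown hook that performs the delayed close.
    void registerShutdownHook() {
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            try {
                closeGracefully();
            } catch (Exception e) {
                // Nothing more can be done this late in JVM shutdown.
            }
        }, "delayed-closer"));
    }
}
```

With this shape, a test can pass a very small timeout, for example `new DelayedCloser(resource, 50)`, so it does not have to wait the full default. That is the same motivation given above for overriding the flag through TestFlagsProvider.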

ikhoon avatar Jun 03 '24 10:06 ikhoon

🔍 Build Scan® (commit: eb426fd2fa8bba0c481450bdd18826d60c7d30c2)

Job name Status Build Scan®
build-windows-latest-jdk-21 https://ge.armeria.dev/s/6hoa36mzfgt3o
build-self-hosted-unsafe-jdk-8 https://ge.armeria.dev/s/66jp6e4anxpui
build-self-hosted-unsafe-jdk-21-snapshot-blockhound https://ge.armeria.dev/s/2a7sam2jezsva
build-self-hosted-unsafe-jdk-17-min-java-17-coverage https://ge.armeria.dev/s/rpmavoixu3rou
build-self-hosted-unsafe-jdk-17-min-java-11 https://ge.armeria.dev/s/3kmsa7z5omkly
build-self-hosted-unsafe-jdk-17-leak https://ge.armeria.dev/s/ttlfy4igega2w
build-self-hosted-unsafe-jdk-11 https://ge.armeria.dev/s/xfleb32trwndw
build-macos-12-jdk-21 https://ge.armeria.dev/s/tbhra3avxdnxs

github-actions[bot] avatar Jun 03 '24 11:06 github-actions[bot]