Add a graceful shutdown timeout before closing the default `ClientFactory`
Motivation:
The default ClientFactory closes immediately when a JVM shuts down. The closed ClientFactory prevents all clients using the default factory from processing requests.
This is not a problem when a client is used on its own. However, a server often uses clients to fetch or deliver data while it handles a request. If the default ClientFactory is closed while the server is still in graceful shutdown mode, every request that depends on those clients fails, so the shutdown is not actually graceful.
In this case, HealthCheckedEndpointGroup raises an EndpointSelectionTimeoutException because the default ClientFactory used by HttpHealthChecker has already been terminated:
com.linecorp.armeria.client.UnprocessedRequestException: com.linecorp.armeria.client.endpoint.EndpointSelectionTimeoutException:
Failed to select within 6400 ms an endpoint from: HealthCheckedEndpointGroup{endpoints=[], numEndpoints=0, candidates=[Endpoint{example.com, ipAddr=x.x.x.x, weight=1000}, ...], numCandidates=8, ...,
initialized=true, initialSelectionTimeoutMillis=10000, selectionTimeoutMillis=6400, contextGroupChain=[]}
at com.linecorp.armeria.client.UnprocessedRequestException.of(UnprocessedRequestException.java:45)
at com.linecorp.armeria.client.HttpClientDelegate.earlyFailedResponse(HttpClientDelegate.java:228)
...
Caused by: com.linecorp.armeria.client.endpoint.EndpointSelectionTimeoutException:
Failed to select within 6400 ms an endpoint from: HealthCheckedEndpointGroup{endpoints=[], numEndpoints=0, candidates=[Endpoint{example.com, ipAddr=x.x.x.x, weight=1000}, ...], numCandidates=8, ...,
initialized=true, initialSelectionTimeoutMillis=10000, selectionTimeoutMillis=6400, contextGroupChain=[]}
at com.linecorp.armeria.client.endpoint.EndpointSelectionTimeoutException.get(EndpointSelectionTimeoutException.java:48)
at com.linecorp.armeria.client.endpoint.AbstractEndpointSelector.lambda$select$0(AbstractEndpointSelector.java:117)
at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98)
at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:153)
...
8 common frames omitted
I propose adding a delay before closing the default ClientFactory so that the server can keep handling requests during graceful shutdown.
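
The idea of delaying the close can be illustrated with a minimal, self-contained sketch. `DelayedCloser` and its methods are hypothetical names used only for illustration; this is not Armeria's `ClientFactory` implementation, just one way a shutdown hook could wait for a grace period before releasing a shared resource.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for a shared resource that is closed on JVM shutdown. */
final class DelayedCloser {
    private final AtomicBoolean closed = new AtomicBoolean();
    private final long gracefulTimeoutMillis;

    DelayedCloser(long gracefulTimeoutMillis) {
        if (gracefulTimeoutMillis < 0) {
            throw new IllegalArgumentException("gracefulTimeoutMillis must be >= 0");
        }
        this.gracefulTimeoutMillis = gracefulTimeoutMillis;
    }

    boolean isClosed() {
        return closed.get();
    }

    /** Waits for the grace period, then marks the resource as closed. */
    void closeAfterDelay() {
        try {
            Thread.sleep(gracefulTimeoutMillis);
        } catch (InterruptedException e) {
            // Restore the interrupt flag and fall through to close right away.
            Thread.currentThread().interrupt();
        }
        closed.set(true);
    }

    /** Registers a JVM shutdown hook that closes the resource after the delay. */
    void closeOnJvmShutdown() {
        Runtime.getRuntime().addShutdownHook(
                new Thread(this::closeAfterDelay, "delayed-closer-shutdown-hook"));
    }
}
```

During the grace period `isClosed()` stays `false`, so a server that is still draining requests can keep using the resource.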
Modifications:
- Add `Flags.defaultClientFactoryGracefulShutdownTimeoutMillis()` that indicates the default time to wait before closing the default `ClientFactory`.
- Add `TestFlagsProvider` to override `defaultClientFactoryGracefulShutdownTimeoutMillis` for rapid iterative testing.
Result:
- `HealthCheckedEndpointGroup` no longer raises `EndpointSelectionTimeoutException` when a server is stopped by the JVM shutdown hook.
- You can now set a delay before the default `ClientFactory` shuts down via `Flags.defaultClientFactoryGracefulShutdownTimeoutMillis()`. If not set, 10 seconds is used by default.
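
As a rough illustration of the "10 seconds if not set" behavior, the sketch below resolves a timeout from a raw string value. `GracefulTimeoutResolver` is a hypothetical helper, and the property name is an assumption that follows the usual `com.linecorp.armeria.<flagName>` convention rather than something stated in this PR. Falling back to the default on invalid input is a choice made for this sketch and may differ from how `Flags` validates values.

```java
/** Hypothetical helper; not part of Armeria. */
final class GracefulTimeoutResolver {
    // Assumed property name, derived from the flag name in this PR.
    static final String PROPERTY_NAME =
            "com.linecorp.armeria.defaultClientFactoryGracefulShutdownTimeoutMillis";

    // 10 seconds, the default stated in this PR.
    static final long DEFAULT_MILLIS = 10_000L;

    private GracefulTimeoutResolver() {}

    /** Returns the parsed value, or the default when it is missing or invalid. */
    static long resolve(String rawValue) {
        if (rawValue == null) {
            return DEFAULT_MILLIS;
        }
        try {
            long millis = Long.parseLong(rawValue.trim());
            return millis >= 0 ? millis : DEFAULT_MILLIS;
        } catch (NumberFormatException e) {
            return DEFAULT_MILLIS;
        }
    }

    /** Reads the assumed system property and resolves it. */
    static long fromSystemProperties() {
        return resolve(System.getProperty(PROPERTY_NAME));
    }
}
```

Under that assumption, starting the JVM with `-Dcom.linecorp.armeria.defaultClientFactoryGracefulShutdownTimeoutMillis=3000` would make `fromSystemProperties()` return 3000.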
🔍 Build Scan® (commit: eb426fd2fa8bba0c481450bdd18826d60c7d30c2)
| Job name | Status | Build Scan® |
|---|---|---|
| build-windows-latest-jdk-21 | ✅ | https://ge.armeria.dev/s/6hoa36mzfgt3o |
| build-self-hosted-unsafe-jdk-8 | ✅ | https://ge.armeria.dev/s/66jp6e4anxpui |
| build-self-hosted-unsafe-jdk-21-snapshot-blockhound | ✅ | https://ge.armeria.dev/s/2a7sam2jezsva |
| build-self-hosted-unsafe-jdk-17-min-java-17-coverage | ✅ | https://ge.armeria.dev/s/rpmavoixu3rou |
| build-self-hosted-unsafe-jdk-17-min-java-11 | ✅ | https://ge.armeria.dev/s/3kmsa7z5omkly |
| build-self-hosted-unsafe-jdk-17-leak | ✅ | https://ge.armeria.dev/s/ttlfy4igega2w |
| build-self-hosted-unsafe-jdk-11 | ✅ | https://ge.armeria.dev/s/xfleb32trwndw |
| build-macos-12-jdk-21 | ✅ | https://ge.armeria.dev/s/tbhra3avxdnxs |