[ISSUE-5470] Add feature to gracefully shutdown connections in HttpClientFactory
Motivation:
Add graceful shutdown feature in HttpClientFactory. The implementation has been referenced from the server side graceful shutdown.
Modifications:
- Add
clientConnectionDrainDurationMilliswhich is used to set client-side drain settings - When
drainDurationMillis > 0inHttpClientFactory, graceful shutdown is applied.
Result:
- Closes #<5470>. (If this resolves the issue.)
- By allowing a grace period for ongoing requests to finish before closing connections, users can ensure that their requests complete
I wonder how to close the doors for new request(that client should send), when graceful shutdown is in progress.
๐ Build Scanยฎ (commit: 6bd9f78e8eca370a7a8d04850b5f17d168e8c4f9)
| Job name | Status | Build Scanยฎ |
|---|---|---|
| build-windows-latest-jdk-19 | โ | https://ge.armeria.dev/s/3ajyfh2oetchm |
| build-self-hosted-unsafe-jdk-8 | โ | https://ge.armeria.dev/s/5vsozbnl7ffey |
| build-self-hosted-unsafe-jdk-19-snapshot-blockhound | โ | https://ge.armeria.dev/s/mvwdi4prsrhsk |
| build-self-hosted-unsafe-jdk-17-min-java-17-coverage | โ | https://ge.armeria.dev/s/eqqxnlrxxwsnq |
| build-self-hosted-unsafe-jdk-17-min-java-11 | โ | https://ge.armeria.dev/s/lpztp6j6tqp6q |
| build-self-hosted-unsafe-jdk-17-leak | โ | https://ge.armeria.dev/s/ii5qoqcoeegsy |
| build-self-hosted-unsafe-jdk-11 | โ | https://ge.armeria.dev/s/pdlnnkgtiahp4 |
| build-macos-12-jdk-19 | โ | https://ge.armeria.dev/s/n2aeobtodnylu |
Codecov Report
Attention: Patch coverage is 96.22642% with 2 lines in your changes are missing coverage. Please review.
Project coverage is 74.10%. Comparing base (
b8eb810) to head (f7b91d7). Report is 208 commits behind head on main.
:exclamation: Current head f7b91d7 differs from pull request most recent head 6bd9f78
Please upload reports for the commit 6bd9f78 to get more accurate results.
Additional details and impacted files
@@ Coverage Diff @@
## main #5489 +/- ##
============================================
+ Coverage 73.95% 74.10% +0.15%
- Complexity 20115 21023 +908
============================================
Files 1730 1819 +89
Lines 74161 77442 +3281
Branches 9465 9889 +424
============================================
+ Hits 54847 57390 +2543
- Misses 14837 15392 +555
- Partials 4477 4660 +183
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
I wonder how to close the doors for new request(that client should send), when graceful shutdown is in progress.
I believe we can add a boolean flag closing to the factory and raise an exception if the factory is used.
Hi @minwoox, please check if I've understood correctly.
I believe we can add a boolean flag closing to the factory and raise an exception if the factory is used.
Does it mean that we should add a closing flag to the ClientFactory and set it to true when draining starts? ๐ค
raise an exception if the factory is used.
I'm not sure what method is being called when the ClientFactory starts to be used(Is it WebClient's execute(...) method?). Need some help ๐
Does it mean that we should add a closing flag to the ClientFactory and set it to true when draining starts? ๐ค
Oops, I realized that we don't have to add a flag. We can just use isClosing() method.
https://github.com/line/armeria/blob/c2519e47fd691be2f8f2cbf38f31c1ce49363eb6/core/src/main/java/com/linecorp/armeria/client/DefaultClientFactory.java#L223
I'm not sure what method is being called when the ClientFactory starts to be used
I believe that we can add the logic to DefaultClientFactory.acquireEventLoop,DefaultClientFactory.newClient and HttpClientFactory.pool() methods.
HttpChannelPool pool(EventLoop eventLoop) {
if (isClosing()) {
throw new IllegalStateException("...");
}
...
Please ensure that the exception is propagated to the callers when thread rescheduling is involved.
@minwoox , thank you for the detailed explanation. ๐
Oops, I realized that we don't have to add a flag. We can just use isClosing() method.
I've added cheking logic which uses isClosing() on above mentioned methods.
Please ensure that the exception is propagated to the callers when thread rescheduling is involved.
I think I've not understood above statement well. Please correct me if I'm wrong ๐
- Thread rescheduling can occur when
DefaultClientFactory.acquireEventLoop,DefaultClientFactory.newClientandHttpClientFactory.pool()are called. - Exceptions are not propagated when rescheduling occurs(because thread changes)
So, should we wrap every method that calls above method(with CompletableFutures or whatever that can propagate exceptions even though thread changes) like below and handle exceptions?
CompletableFuture.supplyAsync(() -> methodsToConsiderThreadRescheduling(...))
.completeExceptionally(ex- > handleException(ex));
So, should we wrap every method that calls above method
Apologize for the unclear comment. ๐ I've rather imagined something like this:
private void acquireConnectionAndExecute0(ClientRequestContext ctx, Endpoint endpoint,
HttpRequest req, DecodedHttpResponse res,
ClientConnectionTimingsBuilder timingsBuilder,
ProxyConfig proxyConfig) {
final PoolKey key = new PoolKey(endpoint, proxyConfig);
final HttpChannelPool pool;
try {
pool = factory.pool(ctx.eventLoop().withoutContext());
} catch (Throwable t) {
final UnprocessedRequestException wrapped = UnprocessedRequestException.of(t);
handleEarlyRequestException(ctx, req, wrapped);
res.close(wrapped);
return;
}
...
@minwoox, I always appreciate the clear and concise explanation from you ๐. I've applied the exception cheking on above mentioned methods.
@minwoox I've added tests to check whether connection draining is working as expected.
I'm curious, what kind of test failures did you get?
Below three tests fails when DEFAULT_CLIENT_CONNECTION_DRAIN_DURATION_MICROS is set other than 0. ๐
As users can enable the feature anytime, tests should be performed when the value is enabled and disabled. Let me check why it fails. ๐ค
@ikhoon It seems like client side Http2GracefulConnectionShutdownHandler is sending GOAWAY frames twice
- first GO_AWAY frame to signal clients that shutdown is imminent
- second GO_AWAY frame to close the channel
The tests in Http2GoAwayTest seems to pass when I add logic to read the GO_AWAY frame twice like below
// Read a GOAWAY frame.
assertThat(readFrame(in).getByte(3)).isEqualTo(Http2FrameTypes.GO_AWAY);
assertThat(readFrame(in).getByte(3)).isEqualTo(Http2FrameTypes.GO_AWAY);
I think we should consider the client is sending GO_AWAY frame twice for graceful shutdown. Maybe we can check the values associated with the GO_AWAY frame. If the value of GO_AWAY frame is equal to Integer.MAX_VALUE, then it means indicates imminent shutdown and there will be a following GO_AWAY frame. And when the value is set tot the last stream ID, we can refer that as it's the last GO_AWAY frame.
first GO_AWAY frame to signal clients that shutdown is imminent
I don't see we need to send GO_AWAY with Integer.MAX_VALUE for clients until we support server push. Integer.MAX_VALUE indicates that the remote peer should not open new streams. If server push is not supported, it would be a waste of traffic.
Let me discuss double GO_AWAY frames on the client side with other maintainers.
@seonwoo960000 Had a chat with the team and it was not a good idea for sending double GOAWAY. ๐ Instead, we can just use Netty's graceful shutdown timeout. I pushed a commit that:
- Rename
connectionDrainDurationMircorstohttp2GracefulShutdownTimeoutMillis - Set the value to
clientConnectionHandler.gracefulShutdownTimeoutMillis()PTAL. ๐
@seonwoo960000 Could you also review the changes as well? ๐
@seonWKim ๐ ๐ ๐