armeria icon indicating copy to clipboard operation
armeria copied to clipboard

Close an HTTP/1.1 connection after delay

Open dachshu opened this issue 1 year ago โ€ข 6 comments

Motivation:

  • Closes https://github.com/line/armeria/issues/4849
  • To suppress a server socket from remaining in the TIME_WAIT state instead of CLOSED when a connection is closed.

Modifications:

  • Remove the force shutdown mode of AbstractHttpResponseHandler that is activated when a user sets the Connection: close HTTP header
  • Create a schedule that waits and closes the connection to wait for the client-side connection to close, rather than closing the HTTP/1.1 connection immediately.

Result:

  • Closes https://github.com/line/armeria/issues/4849
  • When a user wants to close an HTTP/1.1 server connection, the armeria server first waits for the client to close the connection for a while to make the connection state to CLOSED instead of TIME_WAIT.

dachshu avatar Apr 17 '24 02:04 dachshu

CLA assistant check
All committers have signed the CLA.

CLAassistant avatar Apr 17 '24 02:04 CLAassistant

๐Ÿ” Build Scanยฎ (commit: 596b223ae457395390a356bc20b684de119e53df)

Job name Status Build Scanยฎ
build-windows-latest-jdk-21 โœ… https://ge.armeria.dev/s/5waqxdyew7bpw
build-self-hosted-unsafe-jdk-8 โœ… https://ge.armeria.dev/s/invs7rd7eogyo
build-self-hosted-unsafe-jdk-21-snapshot-blockhound โœ… https://ge.armeria.dev/s/nza6his2f7dhy
build-self-hosted-unsafe-jdk-17-min-java-17-coverage โœ… https://ge.armeria.dev/s/ncas2zwrbv67i
build-self-hosted-unsafe-jdk-17-min-java-11 โœ… https://ge.armeria.dev/s/qov5igwrcmcey
build-self-hosted-unsafe-jdk-17-leak โœ… https://ge.armeria.dev/s/nwk6m7sjehp62
build-self-hosted-unsafe-jdk-11 โœ… https://ge.armeria.dev/s/ne7f4532ze7gm
build-macos-12-jdk-21 โœ… https://ge.armeria.dev/s/2spxi2yyxc2uk

github-actions[bot] avatar Apr 17 '24 03:04 github-actions[bot]

https://github.com/line/armeria/pull/5616#issuecomment-2060205298 Please also sign the CLA ๐Ÿ™‡

jrhee17 avatar Apr 17 '24 05:04 jrhee17

Looking into the failing test, I guess there is a case where Connection: close header isn't sent but we set needsDisconnection() somewhere.

jrhee17 avatar Apr 30 '24 08:04 jrhee17

Went through the code, and I think the issue is the following:

  • The test failure occurs when testTooLargeContentToNonExistentService is run, and then testStreamingResponseWithSlowClient is run with HTTP1.1. The two use the same ClientFactory, and hence reuse connections.
    • https://github.com/line/armeria/blob/0a5a3d7a2f74f51a4893270a6f4aeac5f7e3c575/core/src/test/java/com/linecorp/armeria/server/HttpServerStreamingTest.java#L161
    • https://github.com/line/armeria/blob/0a5a3d7a2f74f51a4893270a6f4aeac5f7e3c575/core/src/test/java/com/linecorp/armeria/server/HttpServerStreamingTest.java#L220
  • When testTooLargeContentToNonExistentService is run, a 404 is returned from the FallbackService. This response is subsequently written to the wire without rescheduling.
    • https://github.com/line/armeria/blob/0a5a3d7a2f74f51a4893270a6f4aeac5f7e3c575/core/src/main/java/com/linecorp/armeria/server/FallbackService.java#L46
  • However, afterwards the request is continuously consumed and reaches maxRequestLength. At this point, we set disconnectWhenFinished although the response has already been written. Hence, we don't have a chance to send a Connection: close header to the client.
    • https://github.com/line/armeria/blob/0a5a3d7a2f74f51a4893270a6f4aeac5f7e3c575/core/src/main/java/com/linecorp/armeria/server/Http1RequestDecoder.java#L324-L324
  • Since we don't close the connection immediately, the client believes the connection is healthy and tries to reuse this connection. However, the discarding flag is already set and ensuing requests are silently ignored
    • https://github.com/line/armeria/blob/0a5a3d7a2f74f51a4893270a6f4aeac5f7e3c575/core/src/main/java/com/linecorp/armeria/server/Http1RequestDecoder.java#L320
  • Lastly, the idle timeout kicks in which fails the test

Overall, I think it's probably fine to just close the connection immediately if we can't write the 413 response headers. This isn't a perfect solution, but at least preserves backwards compatibility.

  • We can check encoder.isResponseHeadersSent(id, 1) at the following location, and close the channel immediately if the response headers have already been written.
    • https://github.com/line/armeria/blob/0a5a3d7a2f74f51a4893270a6f4aeac5f7e3c575/core/src/main/java/com/linecorp/armeria/server/Http1RequestDecoder.java#L324-L325
  • Otherwise the 413 with Connection: close will be written, so we can preserve the previous behavior.

jrhee17 avatar Apr 30 '24 10:04 jrhee17

https://github.com/line/armeria/pull/5616#issuecomment-2084886389

When also I analysed the reason for the failure of this test, I found that the response is sent immediately at 404. And when the content size exceeds maxRequestLength while reading the rest of the request content, the Connetion: close header is not sent and the server closes the connection. The problem was that the connection discarding flag was set and the will be closed soon was reused in the next test. Because this PR does not check if a response has already been sent, but just waits for the set time before closing the connection, the client didn't know that the connection was about to be closed and reused it causing the connection to be closed during the test. To solve this, I was trying to figure out how to make it fail fast, as I think the correct behaviour is to ignore the content if it's a 404. If I knew the length of the content, it shouldn't be a problem. But I thought there might be an issue with using chunked transfer encoding. I also wondered if it would be better to just set discarding to true and read the request to the end and send the response, since even in the 404, the server reads all the content sent by the client anyway. When I thought about the method you advised, it might be a problem when it takes a long time to reach the content max length so the same phenomenon can occur like in the test. It would look like the connection is suddenly closed from the client's point of view. However, I will modify it to ensure backwards compatibility as you advised and consider a better way as another issue. Thanks for your advice ๐Ÿ™‡

dachshu avatar May 06 '24 13:05 dachshu

However, I will modify it to ensure backwards compatibility as you advised and consider a better way as another issue.

Thanks! Let us know if you would like to try handling this as a follow-up issue.

jrhee17 avatar May 08 '24 01:05 jrhee17

However, I will modify it to ensure backwards compatibility as you advised and consider a better way as another issue.

Thanks! Let us know if you would like to try handling this as a follow-up issue.

Yes, I would like to!

dachshu avatar May 09 '24 12:05 dachshu