jdk icon indicating copy to clipboard operation
jdk copied to clipboard

8371903: HttpClient: improve handling of HTTP/2 GOAWAY frames with error code

Open ehs208 opened this issue 1 month ago • 24 comments

Problem

When the HTTP/2 client receives a GOAWAY frame with a non-zero error code, the current implementation discards both the error code and debug data. Users only see generic "Connection closed by peer" errors without any information about why the server terminated the connection.

Solution

Per RFC 9113 §5.4.1, a GOAWAY frame with a non-zero error code indicates a connection error requiring immediate closure. This fix:

  1. Distinguishes graceful shutdown (NO_ERROR) from connection errors
  2. Preserves error code and debug data in exception messages
  3. Categorizes streams based on lastStreamId:
    • Streams with ID > lastStreamId: Marked as unprocessed for automatic retry
    • Streams with ID ≤ lastStreamId: Failed with detailed error information

Changes

Core Implementation (Http2Connection.java):

  • Modified handleGoAway() to check error code and route appropriately
  • Added handleGoAwayWithError() method that:
    • Extracts error code and debug data from GOAWAY frame
    • Creates meaningful error messages with error name, hex code, and debug data
    • Properly categorizes streams for retry or failure

Test Infrastructure:

  • Http2TestServerConnection.sendGoAway(int, int, byte[]): Supports custom error codes
  • Http2TestExchangeImpl.getServerConnection(): Accessor for test handlers
  • GoAwayWithErrorTest: Verifies proper error propagation

Example

Before: IOException: Connection closed by peer

After: IOException: Received GOAWAY with error code Protocol error (0x1): Invalid HEADERS frame

Testing

  • New GoAwayWithErrorTest passes
  • Existing HTTP/2 tests unaffected (NO_ERROR path unchanged)
  • Backward compatible (no public API changes)

Progress

  • [x] Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • [x] Change must not contain extraneous whitespace
  • [x] Commit message must refer to an issue

Issue

  • JDK-8371903: HttpClient: improve handling of HTTP/2 GOAWAY frames with error code (Bug - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/28632/head:pull/28632
$ git checkout pull/28632

Update a local copy of the PR:
$ git checkout pull/28632
$ git pull https://git.openjdk.org/jdk.git pull/28632/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 28632

View PR using the GUI difftool:
$ git pr show -t 28632

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/28632.diff

Using Webrev

Link to Webrev Comment

ehs208 avatar Dec 03 '25 12:12 ehs208

:wave: Welcome back ehs208! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

bridgekeeper[bot] avatar Dec 03 '25 12:12 bridgekeeper[bot]

@ehs208 This change is no longer ready for integration - check the PR body for details.

openjdk[bot] avatar Dec 03 '25 12:12 openjdk[bot]

@ehs208 The following label will be automatically applied to this pull request:

  • net

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

openjdk[bot] avatar Dec 03 '25 12:12 openjdk[bot]

@djelinski
Thanks for the feedback! Made all the changes:

  • Now using cause.getCloseCause() for consistency
  • Merged sendGoAway methods as suggested
  • Test now verifies unprocessed stream retries with lastStreamId=1

Let me know if anything else needs adjustment!

ehs208 avatar Dec 04 '25 10:12 ehs208

I'd really like to see some retried requests in the test. Please:

  • change the requests to POST; idempotent requests like GET might be retried for other reasons, POST is only retried when unprocessed,
  • send all 3 requests at once, so that there's at least a chance that they will be in progress when GOAWAY is sent,
  • check that exactly one request fails. It doesn't have to be the first one, the client sometimes reorders the requests internally.

djelinski avatar Dec 04 '25 12:12 djelinski

  • Modified handleGoAway() to check error code and route appropriately

  • Added handleGoAwayWithError() method that:

    • Extracts error code and debug data from GOAWAY frame
    • Creates meaningful error messages with error name, hex code, and debug data
    • Properly categorizes streams for retry or failure

This looks quite similar to something I have proposed here:

  • https://github.com/laeubi/java-http-client/issues/8

As mentioned there, I think having a generic Exception with a STRING(!) message is not really something useful. I would suggest to add an own exception type so the client code is able to extract further details in a semantic way.

laeubi avatar Dec 05 '25 06:12 laeubi

Hi @laeubi, yes it's related to your report. The point of this PR is to bring HTTP2 on par with HTTP3, which throws an exception with a string message. Specialized exception can be handled as a future enhancement, but I'd like to see some evidence that it would be useful to users.

djelinski avatar Dec 05 '25 09:12 djelinski

@djelinski Updated per your suggestions:

  • Switched to POST requests (only retried when unprocessed)
  • Sending all 3 requests at once to test race conditions
  • Dynamic lastStreamId based on first arriving request
  • Validates exactly 1 failure and 2 successes regardless of order

Test now correctly verifies unprocessed stream retry behavior.

ehs208 avatar Dec 06 '25 11:12 ehs208

@djelinski Fixed the intermittent test failures:

  • Removed close() call after GOAWAY to avoid connection reset errors
  • Server now relies on Http2TestServer for connection cleanup
  • Tested 10 times locally, all passed

Should be stable now. Please review when you get a chance

ehs208 avatar Dec 10 '25 10:12 ehs208

@ehs208 thanks for that, the test looks stable now.

Looking at the test logs, I realized why the goAwaySentLatch.await was where you originally had it; now that it's after all 3 requests, the requests are sent on 3 distinct connections, which is not what we want. Please move it back to its original position (after sending first request). Sorry for my confusion.

djelinski avatar Dec 10 '25 13:12 djelinski

@djelinski I moved the goAwaySentLatch.await() back to its original position (after the first request, before second/third).

However, after testing locally, I'm seeing the same behavior - all 3 requests create separate connections. I believe this is expected because:

  1. First request uses connection 1
  2. Server sends GOAWAY, connection 1 receives it
  3. Second and third requests are retried on new connections (2 and 3)

Since requests 2 and 3 are retries after GOAWAY, they naturally go to new connections. The await position doesn't change this - it just controls when the retries are sent.

Could you clarify what connection behavior you're expecting? The test is passing and correctly verifies that:

  • Exactly 1 request fails with GOAWAY error details
  • Exactly 2 requests succeed after retry

Let me know if I'm missing something!

ehs208 avatar Dec 10 '25 14:12 ehs208

@djelinski Oops, you're right! I moved goAwaySentLatch.await() back to its original position and now it works as expected - only 1 connection is created initially, and the other 2 are retries. Tested locally and all passed.

ehs208 avatar Dec 10 '25 14:12 ehs208

Right, we only create one connection initially, and requests 2 and 3 are usually sent over the first connection before they are retried on new connections. This can be observed in the server logs as resetting stream 3 as REFUSED_STREAM. Ideally the server should not send anything after sending the GOAWAY frame with error, but I think what we have is good enough,

djelinski avatar Dec 10 '25 14:12 djelinski

@djelinski Thanks for the explanation. That makes sense. If there’s an opportunity in the future, I’d be happy to help clean that part up as well. Appreciate your guidance.

ehs208 avatar Dec 10 '25 14:12 ehs208

/integrate

ehs208 avatar Dec 10 '25 14:12 ehs208

@ehs208 Your change (at version 6dc77e1b266e30aa2e0b4c0bcf0067e06b65df0f) is now ready to be sponsored by a Committer.

openjdk[bot] avatar Dec 10 '25 14:12 openjdk[bot]

@djelinski Oops, you're right! I moved goAwaySentLatch.await() back to its original position and now it works as expected - only 1 connection is created initially, and the other 2 are retries. Tested locally and all passed.

FWIW - if the request succeeds (no exception thrown) you can check/assert whether two requests were sent on the same or different connections by comparing the value returned by HttpResponse::connectionLabel(); Though the result is an Optional our implementation will never return an empty optional.

dfuch avatar Dec 10 '25 16:12 dfuch

@dfuch Thank you for the thorough review. I've applied Utils.adjustTimeout() to both timeout values as you suggested.

Also, thanks for the tip about HttpResponse::connectionLabel() - that's really helpful to know for verifying connection reuse in the future. For this test, I'll keep the current approach since it's already validating the retry behavior correctly, but I'll definitely keep that in mind for future tests.

ehs208 avatar Dec 10 '25 23:12 ehs208

/integrate

ehs208 avatar Dec 11 '25 11:12 ehs208

Thanks for the updates. LGTM.

Thank you!

ehs208 avatar Dec 11 '25 11:12 ehs208

@ehs208 Your change (at version 6068c1f9710bde3545c91faa5045b3a650bdba38) is now ready to be sponsored by a Committer.

openjdk[bot] avatar Dec 11 '25 11:12 openjdk[bot]

@jaikiran Thank you for the detailed review! I've addressed all your feedback:

  1. Http2Connection.java: Removed the else block so close(cause)doTerminate() handles processed streams. This ensures connection closure cause and stream termination cause are consistent.

  2. GoAwayWithErrorTest.java:

  • Using URIBuilder with server's actual address instead of "localhost"
  • Removed timeout from goAwaySentLatch.await()
  • Removed entire allOf().orTimeout().join() block
  • Replaced success counting with direct assertEquals(200, response.statusCode())

ehs208 avatar Dec 13 '25 04:12 ehs208

Thank you @ehs208 for these updates. The changes look good to me. I'll run the latest changes in our CI to make sure this continues to pass.

On a related note, have you run tier2 locally with these changes? The GitHub actions job only run tier1 and networking tests aren't covered in that. It would be good to run tier2 on your local setup too - that way any future contributions in this area can be tested locally to verify that changes don't break unexpected existing tests.

jaikiran avatar Dec 15 '25 13:12 jaikiran

@jaikiran I ran jdk_net tests locally. Results: 1047 passed, 1 failed.

The failed test is HTTPSetAuthenticatorTest (HTTP/1.1 Authenticator test) with SocketException: Connection reset, which appears to be a flaky test unrelated to this PR's HTTP/2 GOAWAY changes. All HTTP/2 related tests passed successfully.

ehs208 avatar Dec 16 '25 10:12 ehs208

/integrate

ehs208 avatar Dec 16 '25 10:12 ehs208

@ehs208 Your change (at version 566dfe235ae29ae3028c639d55b6c90886c298c5) is now ready to be sponsored by a Committer.

openjdk[bot] avatar Dec 16 '25 10:12 openjdk[bot]

Hello @ehs208, I'm investigating a intermittent test failure of the newly added test in our CI. I think the test will need a few adjustments. I haven't had a chance to go over the failure in its entirety. I will add some details here when I am done looking at that failure.

jaikiran avatar Dec 16 '25 10:12 jaikiran

@jaikiran Hello, Just checking in on the intermittent test failure investigation. No rush, just wanted to see if there's any update or if there's anything I can help with. Thanks

ehs208 avatar Dec 18 '25 15:12 ehs208

Hello @ehs208, I'm still looking into it (with help from Daniel too), but mostly it's blocked on me. I will need some more time to get back to this, please. It's still on my priority list and I should have something soon.

jaikiran avatar Dec 18 '25 15:12 jaikiran