Jetty 12.0.8 seems to leak connections when it encounters earlyEOF
Jetty version(s): 12.0.8
Jetty Environment: ee10
Java version/vendor (use: java -version):
java 21.0.2 2024-01-16 LTS
Java(TM) SE Runtime Environment Oracle GraalVM 21.0.2+13.1 (build 21.0.2+13-LTS-jvmci-23.1-b30)
Java HotSpot(TM) 64-Bit Server VM Oracle GraalVM 21.0.2+13.1 (build 21.0.2+13-LTS-jvmci-23.1-b30, mixed mode, sharing)
OS type/version: Ubuntu 22.04
Description
We have seen a clear increase in requests from mobile clients where Jetty fails to handle/parse the request body, causing HttpConnection$RequestHandler.earlyEOF to trigger. That in itself is perhaps not alarming, but it seems to cause Jetty to leak requests and connections: we can see a steady increase in QoSHandler's active requests in our metrics. Eventually the QoSHandler hits its maximum active requests and starts rejecting requests.
We have a very strong correlation between these errors and leaking connections/requests.
On other sites, where we do not see these errors, we do not see any leakage.
How to reproduce?
I am afraid I have not been able to make Jetty trigger HttpParser.earlyEOF; I am not sure how to simulate a bad client :(
Message: "Error while closing the output stream in order to commit response"
Stacktrace:
org.glassfish.jersey.server.ServerRuntime$Responder: org.eclipse.jetty.http.BadMessageException: 400: Early EOF
at o.e.j.s.internal.HttpConnection$RequestHandler.earlyEOF(HttpConnection.java:1051)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:1729)
at org.eclipse.jetty.server.internal.HttpConnection.parseRequestBuffer(HttpConnection.java:568)
at org.eclipse.jetty.server.internal.HttpConnection.parseAndFillForContent(HttpConnection.java:493)
at o.e.j.s.i.HttpConnection$HttpStreamOverHTTP1.read(HttpConnection.java:1346)
at org.eclipse.jetty.server.HttpStream$Wrapper.read(HttpStream.java:161)
at o.e.j.s.internal.HttpChannelState$ChannelRequest.read(HttpChannelState.java:888)
at org.eclipse.jetty.server.Request$Wrapper.read(Request.java:791)
at o.e.j.server.handler.EventsHandler$EventsRequest.read(EventsHandler.java:376)
at org.eclipse.jetty.server.Request$Wrapper.read(Request.java:791)
at o.eclipse.jetty.ee10.servlet.AsyncContentProducer.readChunk(AsyncContentProducer.java:319)
at o.eclipse.jetty.ee10.servlet.AsyncContentProducer.produceChunk(AsyncContentProducer.java:305)
at o.eclipse.jetty.ee10.servlet.AsyncContentProducer.isReady(AsyncContentProducer.java:243)
at o.e.jetty.ee10.servlet.BlockingContentProducer.nextChunk(BlockingContentProducer.java:108)
at org.eclipse.jetty.ee10.servlet.HttpInput.read(HttpInput.java:244)
at org.eclipse.jetty.ee10.servlet.HttpInput.read(HttpInput.java:225)
at o.g.jersey.message.internal.EntityInputStream.read(EntityInputStream.java:79)
at o.g.j.m.i.ReaderInterceptorExecutor$UnCloseableInputStream.read(ReaderInterceptorExecutor.java:276)
at c.f.jackson.core.json.ByteSourceJsonBootstrapper.ensureLoaded(ByteSourceJsonBootstrapper.java:547)
at c.f.jackson.core.json.ByteSourceJsonBootstrapper.detectEncoding(ByteSourceJsonBootstrapper.java:137)
at c.f.jackson.core.json.ByteSourceJsonBootstrapper.constructParser(ByteSourceJsonBootstrapper.java:266)
at com.fasterxml.jackson.core.JsonFactory._createParser(JsonFactory.java:1863)
at com.fasterxml.jackson.core.JsonFactory.createParser(JsonFactory.java:1262)
at o.g.j.j.internal.jackson.jaxrs.base.ProviderBase._createParser(ProviderBase.java:843)
at o.g.j.j.internal.jackson.jaxrs.base.ProviderBase.readFrom(ProviderBase.java:787)
at o.g.j.m.i.ReaderInterceptorExecutor$TerminalReaderInterceptor.invokeReadFrom(ReaderInterceptorExecutor.java:235)
at o.g.j.m.i.ReaderInterceptorExecutor$TerminalReaderInterceptor.aroundReadFrom(ReaderInterceptorExecutor.java:214)
at o.g.j.message.internal.ReaderInterceptorExecutor.proceed(ReaderInterceptorExecutor.java:134)
at o.g.j.s.i.MappableExceptionWrapperInterceptor.aroundReadFrom(MappableExceptionWrapperInterceptor.java:49)
at o.g.j.message.internal.ReaderInterceptorExecutor.proceed(ReaderInterceptorExecutor.java:134)
at o.g.jersey.message.internal.MessageBodyFactory.readFrom(MessageBodyFactory.java:1072)
at o.g.jersey.message.internal.InboundMessageContext.readEntity(InboundMessageContext.java:657)
at org.glassfish.jersey.server.ContainerRequest.readEntity(ContainerRequest.java:290)
at o.g.j.s.i.i.EntityParamValueParamProvider$EntityValueSupplier.apply(EntityParamValueParamProvider.java:73)
at o.g.j.s.i.i.EntityParamValueParamProvider$EntityValueSupplier.apply(EntityParamValueParamProvider.java:56)
at o.g.j.s.spi.internal.ParamValueFactoryWithSource.apply(ParamValueFactoryWithSource.java:50)
at o.g.j.server.spi.internal.ParameterValueHelper.getParameterValues(ParameterValueHelper.java:68)
at o.g.j.s.m.i.JavaResourceMethodDispatcherProvider$AbstractMethodParamInvoker.getParamValues(JavaResourceMethodDispatcherProvider.java:109)
at o.g.j.s.m.i.JavaResourceMethodDispatcherProvider$VoidOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:159)
at o.g.j.s.m.i.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:93)
at o.g.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:478)
at o.g.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:400)
at o.g.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:81)
at org.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:263)
at org.glassfish.jersey.internal.Errors$1.call(Errors.java:248)
at org.glassfish.jersey.internal.Errors$1.call(Errors.java:244)
at org.glassfish.jersey.internal.Errors.process(Errors.java:292)
at org.glassfish.jersey.internal.Errors.process(Errors.java:274)
at org.glassfish.jersey.internal.Errors.process(Errors.java:244)
at o.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:266)
at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:242)
at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:697)
at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:394)
at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:346)
at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:358)
at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:312)
at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:205)
at org.eclipse.jetty.ee10.servlet.ServletHolder.handle(ServletHolder.java:736)
at o.e.jetty.ee10.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1614)
at c.k.p.f.rest.filter.RequestContextBindingFilter.doFilter(RequestContextBindingFilter.java:63)
at org.eclipse.jetty.ee10.servlet.FilterHolder.doFilter(FilterHolder.java:205)
at o.eclipse.jetty.ee10.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1586)
at o.e.j.ee10.servlet.ServletHandler$MappedServlet.handle(ServletHandler.java:1547)
at org.eclipse.jetty.ee10.servlet.ServletChannel.dispatch(ServletChannel.java:819)
at org.eclipse.jetty.ee10.servlet.ServletChannel.handle(ServletChannel.java:431)
at org.eclipse.jetty.ee10.servlet.ServletHandler.handle(ServletHandler.java:464)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:575)
at org.eclipse.jetty.server.handler.ContextHandler.handle(ContextHandler.java:851)
at org.eclipse.jetty.server.Handler$Wrapper.handle(Handler.java:740)
at org.eclipse.jetty.server.handler.ResourceHandler.handle(ResourceHandler.java:164)
at org.eclipse.jetty.server.Handler$Wrapper.handle(Handler.java:740)
at o.eclipse.jetty.server.handler.CrossOriginHandler.handle(CrossOriginHandler.java:375)
at org.eclipse.jetty.server.Handler$Wrapper.handle(Handler.java:740)
at o.eclipse.jetty.server.handler.ConditionalHandler.nextHandler(ConditionalHandler.java:421)
at org.eclipse.jetty.server.handler.QoSHandler.handleWithPermit(QoSHandler.java:266)
at org.eclipse.jetty.server.handler.QoSHandler.onConditionsMet(QoSHandler.java:191)
at o.eclipse.jetty.server.handler.ConditionalHandler.handle(ConditionalHandler.java:378)
at org.eclipse.jetty.server.Handler$Wrapper.handle(Handler.java:740)
at c.k.p.frontend.ratelimiter.PunterThrottlingHandler.handle(PunterThrottlingHandler.java:36)
at org.eclipse.jetty.server.Handler$Wrapper.handle(Handler.java:740)
at c.k.k.jetty.KelogsTracingHandler.handle(KelogsTracingHandler.java:53)
at org.eclipse.jetty.server.Handler$Wrapper.handle(Handler.java:740)
at org.eclipse.jetty.server.handler.EventsHandler.handle(EventsHandler.java:81)
at org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:597)
at org.eclipse.jetty.server.Server.handle(Server.java:179)
at o.e.j.s.internal.HttpChannelState$HandlerInvoker.run(HttpChannelState.java:619)
at org.eclipse.jetty.server.internal.HttpConnection.onFillable(HttpConnection.java:411)
at o.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:322)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:99)
at org.eclipse.jetty.io.SelectableChannelEndPoint$1.run(SelectableChannelEndPoint.java:53)
at j.util.concurrent.ThreadPerTaskExecutor$TaskRunner.run(ThreadPerTaskExecutor.java:314)
at java.lang.VirtualThread.run(VirtualThread.java:309)
Just to add to what I found: from a heap dump, Eclipse MAT reports the following (this report is from heap-hero, which seems to be built on MAT):
2,971 instances of "org.eclipse.jetty.io.SocketChannelEndPoint", loaded by "jdk.internal.loader.ClassLoaders$AppClassLoader @ 0x6ac006a40" occupy 361,225,312 (75.72%) bytes. These instances are referenced from one instance of "java.util.concurrent.ConcurrentHashMap$Node[]", loaded by "
At the time of the heap dump there were certainly not that many real active requests.
@LarsKrogJensen can you share the heap dump?
I don't think so, as it contains various sensitive data.
If I could figure out a way to reproduce earlyEOF in a separate standalone app, sharing that would of course be fine.
How do you verify this scenario? Is there any tooling? I am not sure sharing screenshots would help.
@LarsKrogJensen
If you can write a reproducer, that will be great.
Otherwise, please detail exactly how you configured your application: what Handlers, in what order, their configuration, etc., so that we can try to reproduce the issue as closely as possible to your configuration.
I will for sure continue trying to create a reproducer; I have already spent many hours without success. I raised this issue because I was hoping you would be able to spot something in the code that explains it.
@sbordet Do you have any unit/integration test that simulates the earlyEOF that I can learn from?
The application uses Jetty and Jersey to provide a REST API; otherwise it is framework-free™. It uses virtual threads as described in https://webtide.com/jetty-12-virtual-threads-support/
Handler chain obtained from Server.dump():
- GzipHandler
- StatisticsHandler
- CustomTracingHandler (a simple Handler.Wrapper implementation)
- CrossOriginHandler
- CustomRateLimitHandler (a simple Handler.Wrapper using Bucket4j to rate limit, with Redis as backend)
- QoSHandler (max requests 4k, max suspend 10 sec)
- ServletContextHandler
- ConstraintSecurityHandler (limits allowed methods)
- ServletHandler (running the Jersey ServletContainer)
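For reference, a rough sketch of how such a chain is wired with Jetty 12's Handler.Wrapper API, using only the standard handlers (the custom handlers, the security handler, and the servlet registration are omitted; the port is illustrative, the QoSHandler values come from the description above):

```java
import java.time.Duration;

import org.eclipse.jetty.ee10.servlet.ServletContextHandler;
import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.server.handler.CrossOriginHandler;
import org.eclipse.jetty.server.handler.QoSHandler;
import org.eclipse.jetty.server.handler.StatisticsHandler;
import org.eclipse.jetty.server.handler.gzip.GzipHandler;

public class HandlerChainSketch
{
    public static void main(String[] args) throws Exception
    {
        Server server = new Server(8080);

        // Outermost to innermost, matching the Server.dump() order above.
        GzipHandler gzip = new GzipHandler();
        StatisticsHandler statistics = new StatisticsHandler();
        CrossOriginHandler crossOrigin = new CrossOriginHandler();
        QoSHandler qos = new QoSHandler();
        qos.setMaxRequestCount(4000);
        qos.setMaxSuspend(Duration.ofSeconds(10));
        ServletContextHandler context = new ServletContextHandler();
        context.setContextPath("/");

        gzip.setHandler(statistics);
        statistics.setHandler(crossOrigin);
        crossOrigin.setHandler(qos);
        qos.setHandler(context);

        server.setHandler(gzip);
        server.start();
    }
}
```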
@sbordet I sanitized the heap dump with https://github.com/paypal/heap-dump-tool and got clearance to share it with you privately. The dump is 500 MB, and zipped it is 50 MB.
@LarsKrogJensen email it to [email protected], let's see if it's not too big.
@LarsKrogJensen I wrote a test case with Jersey that tries to reproduce your issue.
I get a stack trace that is almost identical to yours, but everything works fine: the connection is closed on the server, and the server endPoint removed.
I analyzed the heap dump, and there are a few things I noted:
- The idle timeout is 30 minutes. This is typically too long, and is the reason those connections stay around for such a long time (see the sketch after this list for lowering it).
- Many of the Jetty objects that manage responses are in a state where they have not written any response.
- You are using virtual threads, and 3 of them are pinned in NIO select(). This is not good.
- The other 4 virtual thread instances in the heap dump have no carrier, and they are blocked on some lock.
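For reference, a minimal sketch of lowering the connector idle timeout (the port and the 30-second value are illustrative, not taken from this application):

```java
import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.server.ServerConnector;

public class IdleTimeoutSketch
{
    public static void main(String[] args) throws Exception
    {
        Server server = new Server();
        ServerConnector connector = new ServerConnector(server);
        connector.setPort(8080);
        // Close idle connections after 30 seconds instead of 30 minutes.
        connector.setIdleTimeout(30_000L);
        server.addConnector(connector);
        server.start();
    }
}
```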
We suspect your problems have to do with virtual threads. We apparently cannot reproduce them because it's not your full system, and you may make calls to code that is not virtual thread safe (and pins the carriers).
That would explain why the QoSHandler would not decrement the permits: the virtual thread entered to serve the request, got blocked, and was never resumed (which also explains why the Jetty output objects have not written any response).
Considering that 3 carriers are pinned, and you seem to have configured a parallelism of 4, you have only 1 carrier left (if any at all -- maybe we just did not find where it is pinned).
If you want to get to the bottom of this issue, you should take a thread dump when your system is again in a state similar to the one captured with the heap dump, and let's see what it shows. Then it's going to be a chase to understand what the virtual threads are doing, as explained in the "Observing virtual threads" section of JEP 444: set jdk.trackAllThreads=true and use jcmd <pid> Thread.dump_to_file -format=json <file>.
Otherwise, I suggest that you disable virtual threads, try your system again, and see if you have the same problems.
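For reference, a rough sketch of how virtual threads are typically enabled on Jetty's thread pool, following the approach in the webtide article linked above; commenting out the setVirtualThreadsExecutor call reverts to platform threads (port and pool settings are illustrative):

```java
import java.util.concurrent.Executors;

import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.server.ServerConnector;
import org.eclipse.jetty.util.thread.QueuedThreadPool;

public class VirtualThreadsToggleSketch
{
    public static void main(String[] args) throws Exception
    {
        QueuedThreadPool threadPool = new QueuedThreadPool();
        // Comment out the next line to run request handling on platform threads only.
        threadPool.setVirtualThreadsExecutor(Executors.newVirtualThreadPerTaskExecutor());

        Server server = new Server(threadPool);
        ServerConnector connector = new ServerConnector(server);
        connector.setPort(8080);
        server.addConnector(connector);
        server.start();
    }
}
```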
Let us know. We are very interested in knowing interactions of Jetty with virtual threads, which sometimes are very unexpected.
Thanks for the thorough explanations, I will continue to try to create a standalone reproducer on my side.
VThread pinning is indeed a pita; I have been monitoring it with JFR recordings, examining any pinnings > 20 ms, and found nothing. So I am a bit surprised by this finding and need to dig deeper.
I did a jcmd thread dump under a similar scenario, but not at the same time as the heap dump. Will send it to you.
Will keep this issue open for a few more days so that I can do a bit more investigation.
Forgot to mention: we have been running with Jetty 12 and virtual threads for a couple of months now and it has been performing really, really well, even under really heavy load.
we have been running with Jetty 12 and virtual threads for a couple of months now and it has been performing really, really well, even under really heavy load
Do you have a comparison with platform threads? Do platform threads also run really, really well?
Can you detail what you mean by "well"? Less latency, more throughput, less memory used, less CPU, etc.? What exactly is better with virtual threads?
I don't have any direct comparison with platform threads; the feedback here is that Jetty works really, really well with virtual threads. One observation is that the total number of JVM threads decreased a lot, i.e. much less context switching for the CPU. I know from your blogs that you have your doubts about virtual-thread benefits.
Thanks for the feedback. If you have more in the future, we are very interested in hearing Jetty users' experiences with virtual threads.
I analyzed the thread dump, and unfortunately it does not have much information.
For example, VirtualThread.toString() prints useful information about thread state and carrier, but it's not used in the dump 😠
Also, there is no locking information, so it is not possible to understand if a virtual thread is pinned by a call to a library.
Surely the virtual threads that call Kafka are pinned in EPoll.wait() -- we know that for sure. You must definitely avoid doing that.
The thread dump shows 23 virtual threads parked, mostly in CompletableFuture.get(), often due to HttpClient calls you are waiting on.
This would be completely normal, unless something in the stack wraps the next call into a synchronized block that would pin the carrier -- unfortunately we do not know.
Under what conditions did you take this thread dump?
Was the QoSHandler already maxed out at 4k?
It could be that this virtual-thread analysis is a red herring and not the actual problem.
However, the heap dump evidence seems to indicate that many responses are not even written, which hints at blocked threads, of which there is evidence in the thread dump (although only 23).
Would you be able to write a "timeout" Handler that starts a timer when a thread enters, and cancels it when the thread exits? With a short timeout (say 5 seconds) that fires if the request is not handled in a timely fashion?
If the timeout triggers, then it's a thread blocked problem. Otherwise, it's something else and we need to dig deeper.
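A rough sketch of what such a "timeout" Handler could look like (the class name and the logging are illustrative; this is not an existing Jetty handler):

```java
import java.time.Duration;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

import org.eclipse.jetty.server.Handler;
import org.eclipse.jetty.server.Request;
import org.eclipse.jetty.server.Response;
import org.eclipse.jetty.util.Callback;

// Logs a warning if a request is not completed within the given timeout,
// which would indicate a thread blocked while handling it.
public class HandlingTimeoutHandler extends Handler.Wrapper
{
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    private final Duration timeout;

    public HandlingTimeoutHandler(Duration timeout)
    {
        this.timeout = timeout;
    }

    @Override
    public boolean handle(Request request, Response response, Callback callback) throws Exception
    {
        // Arm an alarm that fires if the request is not completed in time.
        ScheduledFuture<?> alarm = scheduler.schedule(
            () -> System.err.printf("Request %s not completed within %s%n", request.getHttpURI(), timeout),
            timeout.toMillis(), TimeUnit.MILLISECONDS);
        // Cancel the alarm when the response completes, successfully or not.
        return super.handle(request, response, Callback.from(callback, () -> alarm.cancel(false)));
    }
}
```

Wrapping the handler below the QoSHandler with something like `new HandlingTimeoutHandler(Duration.ofSeconds(5))` might narrow down whether requests get stuck after the permit is acquired.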
I would still also try without virtual threads and see how things go.
Was the QoSHandler maxed out at 4k already?
No, it was not at the time of either dump.
A timer is an interesting idea. I was looking for any out-of-the-box handlers that could terminate requests not responded to within a certain time; the closest match was the DoSFilter, which does a lot of other things as well.
I will explore replacing virtual threads when running locally on my development laptop (12 cores) and see if it makes any difference. I can now reproduce it every time I try with my reproducer, and there is obviously no load at all.
So far no luck with a standalone reproducer, though.
Sharing that reproducer in case anyone reads this in the future:
```java
public void testEarlyEOF() throws Exception {
    final String body = """
        {
        "a":"b",
        }
        """;
    try (Socket socket = new Socket("localhost", 15030)) {
        var eoln = System.lineSeparator();
        // Deliberately misbehaving request: the Content-Length does not match the
        // body that will actually be sent, to simulate a client that disconnects mid-body.
        var header = BufferUtil.toBuffer(
            "POST /hello HTTP/1.1" + eoln +
            "Content-Type: application/json" + eoln +
            "Content-Length: 170" + eoln +
            "Host: localhost" + eoln);
        var bdy = BufferUtil.toBuffer(eoln + body);
        socket.getOutputStream().write(header.array());
        // Write only the first byte of the body, then shut down the output
        // to trigger the early EOF on the server.
        for (byte b : bdy.array()) {
            socket.getOutputStream().write(b);
            socket.shutdownOutput();
            break;
        }
        HttpTester.Input input = HttpTester.from(socket.getInputStream());
        HttpTester.Response response = HttpTester.parseResponse(input);
        System.err.printf("%s %s %s%n", response.getVersion(), response.getStatus(), response.getReason());
        for (HttpField field : response) {
            System.err.printf("%s: %s%n", field.getName(), field.getValue());
        }
        System.err.printf("%n%s%n", response.getContent());
    }
}
```
@LarsKrogJensen ok the ball is in your court then.
Let us know if you can reproduce the issue, or if you have more information, or if you have more evidence after writing the "timeout" Handler, or if you switched back to platform threads and everything works fine, and we'll happily discuss.
Yep, I have the ball.
Tried with platform threads and that did not help (in local dev setup).
To mitigate the issue in production I added AWS WAF to our ALB, and that seems to handle misbehaving clients well, so I no longer see an increase in active requests (and the log errors are gone). Now I can get some sleep at night again.
Still working on a standalone reproducer, but not able to fully replicate it.
I will reach out when I find more; thanks for the outstanding help.
Tried with platform threads and that did not help (in local dev setup).
What does that mean exactly? Do you still have the leak, and does the QoSHandler max out?
If so, I think your best bet (pun intended!) is now the "timeout" Handler.
I saw many CompletableFuture.get() calls in your thread dump, which may wait forever.
Note that even if there is a shorter idle timeout for connections, for HTTP requests that are processed in async mode (via request.startAsync() or, equivalently, using the JAX-RS @Suspended annotation) the idle timeout will be ignored, as per the Servlet specification.
So if your processing calls some external service via HttpClient, and waits for the response with the infinite CompletableFuture.get(), and that service does not respond, or there is a bug and the CompletableFuture is not completed, then you are going to wait forever.
But this case should be easily detected by the "timeout" Handler.
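For illustration, a minimal sketch of bounding such a wait instead of calling the infinite get(); callRemoteService() is a placeholder, not the application's actual code:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedWaitSketch
{
    public static void main(String[] args) throws Exception
    {
        CompletableFuture<String> future = callRemoteService();
        try
        {
            // Wait at most 5 seconds: a hung service surfaces as a TimeoutException
            // instead of blocking the (virtual) thread forever.
            System.out.println(future.get(5, TimeUnit.SECONDS));
        }
        catch (TimeoutException x)
        {
            future.cancel(true);
            System.err.println("Remote service did not respond in time");
        }
    }

    private static CompletableFuture<String> callRemoteService()
    {
        // Placeholder: completes after a delay to simulate a slow external service.
        return CompletableFuture.supplyAsync(() -> "response",
            CompletableFuture.delayedExecutor(1, TimeUnit.SECONDS));
    }
}
```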
What does that mean exactly? Do you still have the leak, and does the QoSHandler max out?
In my local dev setup I start the app and send a single 'bad' request, and it triggers the issue every time. The stack trace is logged but the request never completes, i.e. the client simply waits. So the QoSHandler is not maxed out, and I have fixed the thread pinning and set the connector idle timeout to 30 seconds.
In my standalone reproducer I can get an almost identical error stack trace, but then Jetty properly returns an error response, so I am trying to figure out the differences.
The request never hits the resource method.
@sbordet Now I have a reproducer and am preparing to share it in a GitHub repo.
The change that made the difference is that I added a Jersey exception mapper handling any exception:
```java
public static class CatchAllExceptionMapper implements ExceptionMapper<Exception> {
    @Override
    public Response toResponse(Exception ex) {
        return Response.status(SERVICE_UNAVAILABLE)
            .entity("service unavailable")
            .type(MediaType.TEXT_PLAIN)
            .build();
    }
}
```
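For completeness, a mapper like this is typically registered with the Jersey ResourceConfig backing the ServletContainer; a minimal sketch, assuming a ResourceConfig-based setup (adjust to however the application actually builds it):

```java
import org.glassfish.jersey.server.ResourceConfig;
import org.glassfish.jersey.servlet.ServletContainer;

public class JerseySetupSketch
{
    // Illustrative factory: registers the CatchAllExceptionMapper shown above.
    public static ServletContainer createServletContainer()
    {
        ResourceConfig config = new ResourceConfig()
            .register(CatchAllExceptionMapper.class);
        return new ServletContainer(config);
    }
}
```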
@sbordet Here is a link to GitHub repo https://github.com/LarsKrogJensen/jetty-issue-11679
Start JettyApp, then run Client, and you should see that the Client hangs while a stack trace is logged in JettyApp.
@sbordet let me know if you have had a chance to reproduce it with the provided repo.
@LarsKrogJensen I went ahead and cleaned up your test project a bit.
https://github.com/joakime/jetty-issue-11679
See https://github.com/joakime/jetty-issue-11679/blob/main/src/test/java/BehaviorTests.java
The setup of your raw request isn't 100% correct - the Content-Length and body are not technically correct (the length is a number of bytes, not the length of a string; the EOL on HTTP/1 is always "\r\n"; and you can use the new Java text blocks to accomplish that). Also, if you look around commit https://github.com/joakime/jetty-issue-11679/commit/6eb9426d65511d5578ffa4c18e8000dd5f26493c you'll see other small improvements (like how to dump the HttpTester.Response details) that you might benefit from in the future.
And I parameterized the different ways to send the request in the unit test.
OK, looks like a nice test improvement, and partial_write seems to still fail.
From the server's point of view, that "early EOF" is 100% correct: the input was closed (see shutdownInput on the SocketChannelEndPoint in the logging output), and the request body was incomplete (per the Content-Length specified), so it has to throw an "early EOF".
Then this is seen by Jersey's implementation of a Reader that is providing content to the Jackson/json layer.
The Jersey ObjectReader sees this and reports it as an IOException to Jackson/json's parse attempt.
Jersey then attempts to process the error to produce an error response.
However, that error response cannot be generated/committed, so Jersey logs from org.glassfish.jersey.server.ServerRuntime$Responder saying "Error while closing the output stream in order to commit response".
Now, as to the reported connection leak, that's still unresearched.
I pushed a few more updates to that test project fork, mostly adding idle timeouts.
I can confirm that the PARTIAL_WRITE_CLOSE_OUTPUT case does show a leaked connection on the server side.
| += ServerConnectorManager@1e530163[keys=1] - STARTED
| | += oeji.ManagedSelector@4088741b{STARTED}[id=0 keys=1 selected=0 updates=0 selection:tot=1/avg=1.00/max=1] - STARTED
| | += AdaptiveExecutionStrategy@54bca971/SelectorProducer@23706db8/PRODUCING/p=0/QueuedThreadPool[qtp502848122]@1df8da7a{STARTED,8<=8<=200,i=6,r=-1,t=54558ms,q=0}[NO_TRY][pc=0,pic=0,pec=1,epc=0]@2024-04-26T08:53:48.936745618-05:00 - STARTED
| | | +- SelectorProducer@23706db8
| | | +~ QueuedThreadPool[qtp502848122]@1df8da7a{STARTED,8<=8<=200,i=6,r=-1,t=54557ms,q=0}[NO_TRY] - STARTED
| | | +- java.util.concurrent.ThreadPerTaskExecutor@205bed61
| | +> updates @ 2024-04-26T08:53:48.93434834-05:00 size=0
| | +> keys @ 2024-04-26T08:53:48.935257164-05:00 size=1
| | +> SelectionKey@4f994ef{i=0}->SocketChannelEndPoint@5fd6dce1[{l=/192.168.1.215:41503,r=/192.168.1.215:48294,ISHUT,fill=-,flush=-,to=1966/3000}{io=0/0,kio=0,kro=1}]->[HttpConnection@3fd39741[p=HttpParser{s=CLOSED,85 of 170},g=HttpGenerator@a40d6c5{s=START}]=>HttpChannelState@77741508{handling=null, handled=true, send=SENDING, completed=false, request=POST@a989d992 http://192.168.1.215:41503/api/hello HTTP/1.1}]
| +- sun.nio.ch.ServerSocketChannelImpl[/[0:0:0:0:0:0:0:0]:41503]
Interestingly, it seems to no longer be updating its idle timeout; it is stuck at to=1966/3000.
Aha, so I might also be facing a Jersey issue :(
The setup of your raw request isn't 100% correct - the Content-Length and body are not technically correct
Yeah, I am aware it's not fully correct, but that's kind of part of the scenario to simulate a misbehaving client; it might be evil or simply a case of an unreliable network.
@LarsKrogJensen we analyzed the issue.
At its core, it is a Jersey issue.
What happens is that Jersey tries to write the response in ServerRuntime.Responder.writeResponse(), which calls response.close() (at about line 750), which calls ContainerResponse.close(), which is:
```java
public void close() {
    if (!closed) {
        closed = true;
        messageContext.close();
        requestContext.getResponseWriter().commit();
        requestContext.setWorkers(null);
    }
}
```
messageContext.close() is OutboundMessageContext.close(), which delegates the close to the entity stream, which eventually wraps Jetty's HttpOutput.
Because we have received an early EOF, trying to write to Jetty's HttpOutput fails and rethrows the BadMessageException produced by the early EOF.
The Jersey code catches IOException, but BadMessageException is a RuntimeException that therefore escapes the close() method and (see above) skips the subsequent 2 calls; in particular it does not perform requestContext.getResponseWriter().commit().
The escaped exception is then logged, but nothing more is done by Jersey; instead, it should have completed the AsyncContext to complete the response.
Failing to complete the AsyncContext causes the response processing to never complete, and idle timeouts are ignored as per the Servlet specification.
Since the processing is not complete, the QoSHandler does not see that request as completed, so it does not release the permit, eventually causing the build-up and leak of requests.
To be fair, we could produce a subclass of IOException instead of BadMessageException in case of early EOF, and that would fix the issue.
We are working on that, but the Jersey bug is there anyway, as RuntimeExceptions can be thrown by close(), so Jersey should catch(Throwable x) and ensure that the AsyncContext is completed.
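A conceptual sketch of the defensive pattern being suggested (this is not actual Jersey code; the helper and its names are illustrative):

```java
import jakarta.servlet.AsyncContext;
import java.io.OutputStream;

// Illustrative helper: any Throwable thrown while closing the response output is
// caught, and the AsyncContext is always completed, so the container can finish
// the exchange, re-arm idle timeouts, and release the QoSHandler permit.
public final class SafeAsyncClose
{
    private SafeAsyncClose()
    {
    }

    public static void closeAndComplete(OutputStream out, AsyncContext asyncContext)
    {
        try
        {
            out.close();
        }
        catch (Throwable x)
        {
            // The response can no longer be written; log and move on.
            System.err.println("Failed to close response output: " + x);
        }
        finally
        {
            asyncContext.complete();
        }
    }
}
```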
@LarsKrogJensen would you like to open a Jersey issue and point it to this discussion? Otherwise we'll do it.