jetty.project
jetty.project copied to clipboard
incorrect http 200 response while shutting down jetty
Jetty version: 9.4.46.v20220331 Java version/vendor: openjdk version "1.8.0_332" OS type/version: CentOs 7
We are using the graceful shutdown feature (in other words, we configured our Jetty server with the StatisticsHandler as advised in the aftermath of [https://github.com/eclipse/jetty.project/issues/8210] and noticed that a small number of requests are now getting an incorrect HTTP 200 response. These requests enter the server just after we asked the server to stop (by calling Server#stop). The vast majority of requests that enter the server after we called stop are terminated with a proper 503. However, sometimes, one or two requests get this 200 response. The response is incorrect because it’s NOT delivered by our REST resources (they return HTTP 204, but never HTTP 200), and even worse, our server-side logic for this particular request was NOT executed. Before we started using StatisticsHandler we never got this.
I did some digging and managed to produce(*) a stack trace that shows that the response seems to be written because the statistics handler flushes the response buffer.
I think this issue is caused by a race condition where the server is not yet shut down in the beginning of StatisticsHandler.handle, but by the time execution reaches the end of this method, the server is shutting down.
at org.eclipse.jetty.server.HttpChannel.sendResponse(HttpChannel.java:998) at org.eclipse.jetty.server.HttpChannel.write(HttpChannel.java:1086) at org.eclipse.jetty.server.HttpOutput.channelWrite(HttpOutput.java:285) at org.eclipse.jetty.server.HttpOutput.channelWrite(HttpOutput.java:269) at org.eclipse.jetty.server.HttpOutput.flush(HttpOutput.java:751) at org.eclipse.jetty.server.Response.flushBuffer(Response.java:1112) at org.eclipse.jetty.server.handler.StatisticsHandler.handle(StatisticsHandler.java:208) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:234) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127) at org.eclipse.jetty.server.Server.handle(Server.java:516) at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:487) at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:732) at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:479) at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277) at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311) at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105) at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104) at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338) at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315) at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173) at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131) at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:409) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883) at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034) at java.lang.Thread.run(Thread.java:748)
Debug logs are attached for a problematic request/response.
Bert
(*) For clarity, no exceptions are thrown. I just dumped the current stack trace at that point. jetty-logs.txt
Initial impression is that it looks like a concurrency issue (race?).
I wonder what the Request._timestamp
is for those requests?
Were they processed before or after the graceful shutdown occurred?
The order of events I see (see also attached logs) is:
(1) we call server.stop() (2) first debug log of request is generated by jetty server (3) server is waiting for other ongoing requests (that started before we call server.stop()) to shut down (4) incorrect response is sent / other requests come in and are rejected (503) (5) ongoing requests are all finished (6) jetty shuts down
If the issue happens, then it's always very very short after we call server.stop(). So that's consistent with the request first seeing nothing about shutdown, then at the end seeing shut down is in progress.
I don't know what Request._timestamp is and where one can see it.
Uploaded slightly wrong file. Here's a better one: jetty-unit-test.zip
FYI, from the code I can see that this is a 9.4.x issue only. In 10.x it is fixed.
@BertRobben jetty-9.4 is at end of community support, so we're really just addressing security issues for it. If it's fixed in jetty-10.0.x and above, then I encourage you to upgrade. If that's not an option for you, and this is a commercial imperative for you, then you have the option to talk to Webtide about commercial support for jetty-9.4.x.
Thanks for the advice. It is indeed already fixed in 10.x. We have upgraded to 10.x already, but our older branches which we still need to support run jetty 9 and those cannot be upgraded.
I addressed the issue in the older branches myself by making a custom extension to the statisticshandler that overrides the problematic method to fix the issue. So we're good now.
Due to company policy, i cannot contribute a patch :-(.
Bert
On Mon, 1 Aug 2022, 06:59 Jan Bartel, @.***> wrote:
@BertRobben https://github.com/BertRobben jetty-9.4 is at end of community support, so we're really just addressing security issues for it. If it's fixed in jetty-10.0.x and above, then I encourage you to upgrade. If that's not an option for you, and this is a commercial imperative for you, then you have the option to talk to Webtide about commercial support for jetty-9.4.x.
— Reply to this email directly, view it on GitHub https://github.com/eclipse/jetty.project/issues/8271#issuecomment-1200707171, or unsubscribe https://github.com/notifications/unsubscribe-auth/AATJ6ZUNN5OOJNW6E2APWQ3VW5KTJANCNFSM525VS4CQ . You are receiving this because you were mentioned.Message ID: @.***>
The order of events I see is:
(1) we call server.stop() (2) first debug log of request is generated by jetty server (3) server is waiting for other ongoing requests (that started before we call server.stop()) to shut down (4) incorrect response is sent / other requests come in and are rejected (503) (5) ongoing requests are all finished (6) jetty shuts down
If the issue happens, then it's always very very short after we call server.stop(). So that's consistent with the request first seeing nothing about shutdown, then at the end seeing shut down is in progress.
On Thu, Jul 7, 2022 at 6:59 PM Joakim Erdfelt @.***> wrote:
Initial impression is that it looks like a concurrency issue (race?). I wonder what the Request._timestamp is for those requests? Were they processed before or after the graceful shutdown occurred?
— Reply to this email directly, view it on GitHub https://github.com/eclipse/jetty.project/issues/8271#issuecomment-1177933892, or unsubscribe https://github.com/notifications/unsubscribe-auth/AATJ6ZXGJJKDA2S3V4KONTLVS4EHVANCNFSM525VS4CQ . You are receiving this because you authored the thread.Message ID: @.***>
Jetty 9.4.x is now at End of Community Support.
- https://github.com/eclipse/jetty.project/issues/7958
Thanks for the extra information. It indeed looks like a race condition that allows the request(s) past the stats handler, but then later decides not handle them, only for the stats handler to commit a 200 response (which is definitely wrong).
However, since 9.4 is end of community support, we will not be fixing this issue in 9.4 unless there is a commercial support request. So I'll close the issue now, but please feel free to add comments if you analyse more, as they may be useful to others.