
Incorrect HTTP 200 response while shutting down Jetty

Open BertRobben opened this issue 2 years ago • 7 comments

Jetty version: 9.4.46.v20220331
Java version/vendor: openjdk version "1.8.0_332"
OS type/version: CentOS 7

We are using the graceful shutdown feature (in other words, we configured our Jetty server with the StatisticsHandler, as advised in the aftermath of https://github.com/eclipse/jetty.project/issues/8210) and noticed that a small number of requests now get an incorrect HTTP 200 response. These requests enter the server just after we ask the server to stop (by calling Server#stop). The vast majority of requests that enter the server after we call stop are terminated with a proper 503. However, sometimes one or two requests get this 200 response. The response is incorrect because it is NOT delivered by our REST resources (they return HTTP 204, never HTTP 200), and even worse, our server-side logic for this particular request was NOT executed. Before we started using StatisticsHandler we never saw this.
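Roughly, the relevant part of our setup looks like the following sketch (simplified and illustrative; the port, timeout value, and handler tree here are placeholders rather than our exact configuration):

```java
import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.server.ServerConnector;
import org.eclipse.jetty.server.handler.ContextHandlerCollection;
import org.eclipse.jetty.server.handler.StatisticsHandler;

public class GracefulServer
{
    public static void main(String[] args) throws Exception
    {
        Server server = new Server();

        ServerConnector connector = new ServerConnector(server);
        connector.setPort(8080); // illustrative port
        server.addConnector(connector);

        // The StatisticsHandler tracks in-flight requests so that a graceful
        // Server.stop() can wait for them to complete before shutting down.
        StatisticsHandler stats = new StatisticsHandler();
        stats.setHandler(new ContextHandlerCollection()); // contexts / REST resources go in here
        server.setHandler(stats);

        // Give in-flight requests up to 30s to finish during stop() (illustrative value).
        server.setStopTimeout(30_000L);

        server.start();
        // ... later, graceful shutdown is triggered with:
        // server.stop();
    }
}
```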

I did some digging and managed to produce (*) a stack trace showing that the response seems to be written because the StatisticsHandler flushes the response buffer.

I think this issue is caused by a race condition: the server has not yet begun shutting down at the start of StatisticsHandler.handle, but by the time execution reaches the end of that method, shutdown is in progress.

at org.eclipse.jetty.server.HttpChannel.sendResponse(HttpChannel.java:998)
at org.eclipse.jetty.server.HttpChannel.write(HttpChannel.java:1086)
at org.eclipse.jetty.server.HttpOutput.channelWrite(HttpOutput.java:285)
at org.eclipse.jetty.server.HttpOutput.channelWrite(HttpOutput.java:269)
at org.eclipse.jetty.server.HttpOutput.flush(HttpOutput.java:751)
at org.eclipse.jetty.server.Response.flushBuffer(Response.java:1112)
at org.eclipse.jetty.server.handler.StatisticsHandler.handle(StatisticsHandler.java:208)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:234)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
at org.eclipse.jetty.server.Server.handle(Server.java:516)
at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:487)
at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:732)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:479)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)
at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)
at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:409)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034)
at java.lang.Thread.run(Thread.java:748)

Debug logs are attached for a problematic request/response.

Bert

(*) For clarity, no exceptions are thrown. I just dumped the current stack trace at that point. jetty-logs.txt

BertRobben avatar Jul 07 '22 14:07 BertRobben

Initial impression is that it looks like a concurrency issue (race?). I wonder what the Request._timestamp is for those requests? Were they processed before or after the graceful shutdown occurred?

joakime avatar Jul 07 '22 16:07 joakime

The order of events I see (see also attached logs) is:

(1) We call server.stop().
(2) The first debug log of the request is generated by the Jetty server.
(3) The server waits for other ongoing requests (that started before we called server.stop()) to finish.
(4) The incorrect response is sent / other requests come in and are rejected (503).
(5) All ongoing requests finish.
(6) Jetty shuts down.

When the issue happens, it is always very shortly after we call server.stop(). That is consistent with the request first seeing no sign of shutdown, and then, at the end, seeing that shutdown is in progress.

I don't know what Request._timestamp is and where one can see it.

BertRobben avatar Jul 08 '22 05:07 BertRobben

Attached is a simple unit test that shows the issue.

jetty-test.zip

BertRobben avatar Jul 11 '22 10:07 BertRobben

Uploaded slightly wrong file. Here's a better one: jetty-unit-test.zip

BertRobben avatar Jul 12 '22 13:07 BertRobben

FYI, from the code I can see that this is a 9.4.x issue only. In 10.x it is fixed.

BertRobben avatar Jul 12 '22 13:07 BertRobben

@BertRobben jetty-9.4 is at end of community support, so we're really just addressing security issues for it. If it's fixed in jetty-10.0.x and above, then I encourage you to upgrade. If that's not an option for you, and this is a commercial imperative for you, then you have the option to talk to Webtide about commercial support for jetty-9.4.x.

janbartel avatar Aug 01 '22 04:08 janbartel

Thanks for the advice. It is indeed already fixed in 10.x. We have already upgraded to 10.x, but the older branches we still need to support run Jetty 9 and cannot be upgraded.

I addressed the issue in the older branches myself with a custom extension of the StatisticsHandler that overrides the problematic method. So we're good now.
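The general idea is roughly along these lines (a simplified, hypothetical sketch rather than the actual code; shown here as a wrapper placed just inside the StatisticsHandler rather than as a StatisticsHandler subclass):

```java
import java.io.IOException;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.eclipse.jetty.server.Request;
import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.server.handler.HandlerWrapper;

/**
 * Hypothetical sketch: if a request falls through the handler chain unhandled
 * while the server is stopping, answer it with an explicit 503 instead of
 * leaving it for the StatisticsHandler to commit as an empty 200.
 */
public class ShutdownGuardHandler extends HandlerWrapper
{
    @Override
    public void handle(String target, Request baseRequest, HttpServletRequest request, HttpServletResponse response)
        throws IOException, ServletException
    {
        super.handle(target, baseRequest, request, response);

        Server server = getServer();
        boolean stopping = server != null && (server.isStopping() || server.isStopped());
        if (stopping && !baseRequest.isHandled() && !response.isCommitted())
        {
            // Nothing downstream produced a response and shutdown is in
            // progress: reject explicitly before the response gets flushed.
            response.sendError(HttpServletResponse.SC_SERVICE_UNAVAILABLE);
            baseRequest.setHandled(true);
        }
    }
}
```

It would sit between the StatisticsHandler and the rest of the handler chain, so the 503 is already set by the time the StatisticsHandler flushes the response.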

Due to company policy, I cannot contribute a patch :-(.

Bert


BertRobben avatar Aug 01 '22 07:08 BertRobben


Jetty 9.4.x is now at End of Community Support.

  • https://github.com/eclipse/jetty.project/issues/7958

joakime avatar Oct 11 '22 10:10 joakime

Thanks for the extra information. It indeed looks like a race condition that lets the request(s) past the stats handler, after which the server decides not to handle them, only for the stats handler to commit a 200 response (which is definitely wrong).

However, since 9.4 is at end of community support, we will not be fixing this issue in 9.4 unless there is a commercial support request. So I'll close the issue now, but please feel free to add comments if you analyse it further, as they may be useful to others.

gregw avatar Oct 11 '22 21:10 gregw