Duplicate handler name: request-body-streamer
Discussed in #1627, but spinning off as it's a separate issue.
Under load, when using the NettyReactiveStreamsBody, I can reliably reproduce the following error:
[info] java.lang.IllegalArgumentException: Duplicate handler name: request-body-streamer
[info] at io.netty.channel.DefaultChannelPipeline.checkDuplicateName(DefaultChannelPipeline.java:1066)
[info] at io.netty.channel.DefaultChannelPipeline.filterName(DefaultChannelPipeline.java:284)
[info] at io.netty.channel.DefaultChannelPipeline.addLast(DefaultChannelPipeline.java:204)
[info] at io.netty.channel.DefaultChannelPipeline.addLast(DefaultChannelPipeline.java:195)
[info] at org.asynchttpclient.netty.request.body.NettyReactiveStreamsBody.write(NettyReactiveStreamsBody.java:59)
[info] at org.asynchttpclient.netty.request.NettyRequestSender.writeRequest(NettyRequestSender.java:426)
[info] at org.asynchttpclient.netty.request.NettyRequestSender.sendRequestWithOpenChannel(NettyRequestSender.java:259)
[info] at org.asynchttpclient.netty.request.NettyRequestSender.sendRequestWithCertainForceConnect(NettyRequestSender.java:141)
[info] at org.asynchttpclient.netty.request.NettyRequestSender.sendRequest(NettyRequestSender.java:113)
[info] at org.asynchttpclient.DefaultAsyncHttpClient.execute(DefaultAsyncHttpClient.java:241)
[info] at org.asynchttpclient.DefaultAsyncHttpClient.executeRequest(DefaultAsyncHttpClient.java:210)
I have a Scala reproduction with http4s. It can be seen with sbt run.
The problem is that AHC offers the channel back to the pool manager as soon as the last response chunk is read, which causes a race condition against the NettySubscriber's completion. If removeFromPipeline is not called before the channel is offered back to the pool and polled out to another reactive streams request, we hit the exception.
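For illustration, the duplicate-name failure can be reproduced in isolation on a Netty EmbeddedChannel (this is just a sketch of the ordering, not AHC code; the plain ChannelOutboundHandlerAdapter instances stand in for the NettySubscriber):

import io.netty.channel.ChannelOutboundHandlerAdapter
import io.netty.channel.embedded.EmbeddedChannel

val channel = new EmbeddedChannel()

// Request A: NettyReactiveStreamsBody.write attaches its subscriber to the pipeline.
channel.pipeline.addLast("request-body-streamer", new ChannelOutboundHandlerAdapter)

// The last response chunk is read and the channel goes back to the pool before the
// subscriber's completion removes the handler. Request B then reuses the channel,
// and the second addLast throws:
channel.pipeline.addLast("request-body-streamer", new ChannelOutboundHandlerAdapter)
// java.lang.IllegalArgumentException: Duplicate handler name: request-body-streamer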
A workaround is to remove the handler from the pipeline in a hook in AsyncHandler:
import java.util.NoSuchElementException
import io.netty.channel.Channel

// Inside an AsyncHandler implementation:
override def onConnectionOffer(channel: Channel): Unit = {
  try channel.pipeline.remove("request-body-streamer")
  catch {
    case _: NoSuchElementException => () // this is just a cleanup of last resort
  }
}
This works, but is bad for a couple reasons:
- It runs regardless of the body type of the request.
- The name of the pipeline handler is a private detail of the NettySubscriber.
Alternatively, NettyReactiveStreamsBody could remove the existing handler if the addLast call fails, and then retry adding the subscriber. This isn't the cleanest solution, but it at least keeps the fix to one class. I'm going with the onConnectionOffer hack for now, but I can work on a PR for this alternative unless there's a better hook that I'm missing.
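For what it's worth, that alternative would look roughly like this; a sketch against the Netty pipeline API (the real change would live inside the Java NettyReactiveStreamsBody, and the helper name here is made up):

import java.util.NoSuchElementException
import io.netty.channel.{ChannelHandler, ChannelPipeline}

def addBodyStreamer(pipeline: ChannelPipeline, subscriber: ChannelHandler): Unit =
  try pipeline.addLast("request-body-streamer", subscriber)
  catch {
    case _: IllegalArgumentException =>
      // A stale handler from an earlier request is still attached: drop it and retry once.
      try pipeline.remove("request-body-streamer")
      catch { case _: NoSuchElementException => () }
      pipeline.addLast("request-body-streamer", subscriber)
  }

Retrying once should be enough, since a pipeline can only hold one handler under that name at a time.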
Honestly, reactive streams support was contributed by people from Lightbend, but nobody has been maintaining it since then. Please contribute if you can.
My solution has revealed a new problem:
[error] java.lang.IllegalStateException: unexpected message type: DefaultHttpRequest, state: 1 (HttpObjectEncoder.java:86)
[error] io.netty.handler.codec.http.HttpObjectEncoder.encode(HttpObjectEncoder.java:86)
[error] io.netty.handler.codec.http.HttpClientCodec$Encoder.encode(HttpClientCodec.java:167)
[error] io.netty.handler.codec.MessageToMessageEncoder.write(MessageToMessageEncoder.java:89)
[error] io.netty.channel.CombinedChannelDuplexHandler.write(CombinedChannelDuplexHandler.java:348)
[error] io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:738)
[error] io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:730)
[error] io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:816)
[error] io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:723)
[error] io.netty.handler.stream.ChunkedWriteHandler.doFlush(ChunkedWriteHandler.java:324)
[error] io.netty.handler.stream.ChunkedWriteHandler.flush(ChunkedWriteHandler.java:135)
[error] io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:776)
[error] io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:768)
[error] io.netty.channel.AbstractChannelHandlerContext.access$1500(AbstractChannelHandlerContext.java:38)
[error] io.netty.channel.AbstractChannelHandlerContext$WriteAndFlushTask.write(AbstractChannelHandlerContext.java:1175)
[error] io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.run(AbstractChannelHandlerContext.java:1098)
[error] io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
[error] io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404)
[error] io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:495)
[error] io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:905)
I'm afraid that my fix is preventing it from writing the LastHttpContent.EMPTY_LAST_CONTENT in the subscriber's complete() handler. I think we are recycling the channel based on receiving the last chunk of the response, but we also need to await the end of the request. Does that sound plausible?
I'm seeing a similar stack trace, deep in the bowels of Netty, when using http4s' AsyncHttpClient client: very frequently on one of six test networks (maybe 1 in 10), but rarely everywhere else, including production networks (maybe 1 in 10,000, which is hard to measure):
Caused by: java.lang.IllegalStateException: unexpected message type: DefaultHttpRequest, state: 1
at io.netty.handler.codec.http.HttpObjectEncoder.encode(HttpObjectEncoder.java:86)
at io.netty.handler.codec.http.HttpClientCodec$Encoder.encode(HttpClientCodec.java:167)
at io.netty.handler.codec.MessageToMessageEncoder.write(MessageToMessageEncoder.java:89)
I have only observed incidents of the problem when sending a request with a body just after sending (and receiving the result of) a request with no body. I think that's my best clue, since the failing code in the Netty library is line 86 in https://github.com/netty/netty/blob/00afb19d7a37de21b35ce4f6cb3fa7f74809f2ab/codec-http/src/main/java/io/netty/handler/codec/http/HttpObjectEncoder.java:
if (state != ST_INIT) {
    throw new IllegalStateException("unexpected message type: " + StringUtil.simpleClassName(msg)
            + ", state: " + state);
}
ST_INIT is from this family of constants:
private static final int ST_INIT = 0;
private static final int ST_CONTENT_NON_CHUNK = 1;
private static final int ST_CONTENT_CHUNK = 2;
private static final int ST_CONTENT_ALWAYS_EMPTY = 3;
1 is ST_CONTENT_NON_CHUNK. I interpret that as either "the encoder really needs to be in ST_INIT" or "ST_CONTENT_NON_CHUNK should be OK, but this check prevents the rest of the code from using it."
I've made changes in our code base to wait a few seconds, then try the HTTP request again. That mitigates the problem, but doesn't fix it, and it doesn't explain the mystery on that one test system. My opening guess is that Netty is trying to reuse some object from the previous interaction, and that object is in the wrong state.
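For reference, the wait-and-retry mitigation is roughly this shape (a sketch assuming cats-effect 2, which http4s 0.21 uses; the helper name and retry policy are ours, not part of http4s or AHC):

import scala.concurrent.duration._
import cats.effect.{IO, Timer}

// Retry an effect a bounded number of times, sleeping between attempts. In practice you
// would probably match only the "unexpected message type" IllegalStateException rather
// than retrying on every error.
def retryWithDelay[A](attempt: IO[A], retriesLeft: Int = 3)(implicit timer: Timer[IO]): IO[A] =
  attempt.handleErrorWith { error =>
    if (retriesLeft > 0) IO.sleep(5.seconds).flatMap(_ => retryWithDelay(attempt, retriesLeft - 1))
    else IO.raiseError(error)
  }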
I think this ticket is related since Ross wrote so much of the http4s code.
Is there something better than "try again in a few seconds" to mitigate the problem?
There's a hint of something similar here: https://github.com/netty/netty/issues/7993 that I was able to follow to https://github.com/reactor/reactor-netty/commit/13fbcdac6b8cef66272eac691a63db68102b3e43 , but I'm a bit out of my depth here.
I'm also observing this under load with:
- http4s 0.21.7
- async-http-client 2.10.5
- fs2 2.4.4
Is this benign or could it cause re-sending / receiving data that has been sent / received before?
Becky - I haven't observed that, but haven't chased it either. Anecdotally I've not observed a request recovering on its own, but haven't looked for that.
(Our data model handles 0-or-more gracefully: 0 is problematic, and 2 or more is wasteful but otherwise fine. We'd have to hunt for that aspect of the problem to observe it. We work around the bug by observing the exception, waiting 5 seconds, then attempting an identical HTTP request. I haven't observed needing more than 3 attempts.)
Not benign; "bounded", maybe. Incidents of the bug tend to clump together. I believe it's a systemic problem: when the system is under pressure it happens frequently; when resources are available we rarely see it.
We've faced that issue too, and I believe I have a fix for it here. @rossabaker Your reproducer is not there anymore; can you check that commit, please?
Sorry, I cleaned up my GitHub a while back, and don't have that repo anymore.
I've seen this issue today. @Squiry, is your fix still working out well for you? Should we raise a PR?
This fix does work in my case. I will create a PR, but I'm not sure about the state of this project, tbh.