akka-http
Finished streams lingering in `HalfClosedRemoteSendingData` state leading to `REFUSED_STREAM`
We might be hitting an issue in akka-http (10.2.7 and 10.2.9) when using akka-grpc: on the server side, some requests seem never to fully finish, and all new requests are refused (`REFUSED_STREAM` because `activeStreamCount() > settings.maxConcurrentStreams`).
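A minimal Java sketch of why a handful of leaked streams is enough to refuse every new request. It assumes, as the report describes, that a stream stops counting as active only once it is removed from the state map. `StreamAdmission`, `tryOpen`, `halfCloseRemote` and `close` are hypothetical names, not akka-http's API, and the exact comparison akka-http uses may differ from the one below.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified admission check; not akka-http code. */
final class StreamAdmission {
    enum State { OPEN, HALF_CLOSED_REMOTE_SENDING_DATA }

    private final int maxConcurrentStreams;
    // Closed streams are removed from this map, so its size is the active count.
    private final Map<Integer, State> streamStates = new HashMap<>();

    StreamAdmission(int maxConcurrentStreams) {
        this.maxConcurrentStreams = maxConcurrentStreams;
    }

    int activeStreamCount() {
        return streamStates.size();
    }

    /** Registers the stream and returns true, or returns false (REFUSED_STREAM) when at capacity. */
    boolean tryOpen(int streamId) {
        // Assumption: the incoming stream is counted before comparing with the limit.
        if (activeStreamCount() + 1 > maxConcurrentStreams) {
            return false;
        }
        streamStates.put(streamId, State.OPEN);
        return true;
    }

    void halfCloseRemote(int streamId) {
        streamStates.replace(streamId, State.HALF_CLOSED_REMOTE_SENDING_DATA);
    }

    /** The transition that never seems to happen in the heap dump: forget the stream. */
    void close(int streamId) {
        streamStates.remove(streamId);
    }
}
```

Streams stuck in `HALF_CLOSED_REMOTE_SENDING_DATA` keep occupying slots, so once enough of them accumulate, every new client-initiated stream is refused until something calls `close`.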
All we have is a heap dump taken at a time when the issue was likely occurring. Analysing the heap dump, we notice that all streams in `Http2StreamHandling#streamStates` are in state `HalfClosedRemoteSendingData`, and each stream's `outStream` seems to indicate that everything is finished (`buffer` is empty, `upstreamClosed` is true, `endStreamSent` is true, `trailer` is set, `maybeinlet` is null).
Looking at the code, I would expect those streams to have transitioned to the `Closed` state and been removed from the `streamStates` map, but they have not, and I don't understand why.
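In HTTP/2 (RFC 7540 §5.1), a stream in the half-closed (remote) state becomes closed once the server sends a frame carrying END_STREAM, so the field values above describe streams that should already be gone. A minimal Java sketch of that invariant as a check one could run over values extracted from the heap dump; `OutStreamSnapshot` and `looksStuck` are hypothetical helpers, and only the three decisive fields are modelled (`trailer` and `maybeinlet` are left out).

```java
/** Hypothetical snapshot of the outStream fields read from a heap dump; not akka-http's class. */
final class OutStreamSnapshot {
    final int bufferedBytes;       // size of `buffer`
    final boolean upstreamClosed;  // `upstreamClosed`
    final boolean endStreamSent;   // `endStreamSent`

    OutStreamSnapshot(int bufferedBytes, boolean upstreamClosed, boolean endStreamSent) {
        this.bufferedBytes = bufferedBytes;
        this.upstreamClosed = upstreamClosed;
        this.endStreamSent = endStreamSent;
    }

    /** Nothing left to send and END_STREAM already written. */
    boolean fullySent() {
        return bufferedBytes == 0 && upstreamClosed && endStreamSent;
    }

    /** Half-closed (remote) plus a fully sent outStream should not coexist per RFC 7540 §5.1. */
    static boolean looksStuck(String stateName, OutStreamSnapshot out) {
        return "HalfClosedRemoteSendingData".equals(stateName) && out.fullySent();
    }
}
```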
We didn't manage to reproduce it in isolation, and I doubt we can afford to run in debug mode for days until the issue pops up again. What would you advise we do to understand the issue better?
Is it possible to go through the following states on the server side: `Idle` -> `OpenReceivingDataFirst` -> `Open` -> `HalfClosedRemoteSendingData`? If so, can we land in `HalfClosedRemoteSendingData` while `outStream` finishes at the same moment, so that the proper transition to `Closed` never happens?
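A deliberately incomplete Java model of the state sequence in the question, showing what the symptom would look like if an outStream completion were handled while the stream is still `Open` and then dropped. The state names come from the report; the events (`RECV_HEADERS`, `RESPONSE_STARTED`, `RECV_END_STREAM`, `SENT_END_STREAM`) and the mapping of events to transitions are assumptions, and this is not a claim about how akka-http actually behaves.

```java
/** State names from the report; everything else is a hypothetical simplification. */
enum StreamState { IDLE, OPEN_RECEIVING_DATA_FIRST, OPEN, HALF_CLOSED_REMOTE_SENDING_DATA, CLOSED }

enum StreamEvent { RECV_HEADERS, RESPONSE_STARTED, RECV_END_STREAM, SENT_END_STREAM }

final class StreamLifecycle {
    private StreamLifecycle() {}

    static StreamState next(StreamState current, StreamEvent event) {
        switch (current) {
            case IDLE:
                return event == StreamEvent.RECV_HEADERS ? StreamState.OPEN_RECEIVING_DATA_FIRST : current;
            case OPEN_RECEIVING_DATA_FIRST:
                return event == StreamEvent.RESPONSE_STARTED ? StreamState.OPEN : current;
            case OPEN:
                // Deliberately incomplete: SENT_END_STREAM is ignored here, so a completion
                // that arrives before the client's END_STREAM is silently dropped.
                return event == StreamEvent.RECV_END_STREAM ? StreamState.HALF_CLOSED_REMOTE_SENDING_DATA : current;
            case HALF_CLOSED_REMOTE_SENDING_DATA:
                // RFC 7540 §5.1: sending END_STREAM from half-closed (remote) closes the stream.
                return event == StreamEvent.SENT_END_STREAM ? StreamState.CLOSED : current;
            default:
                return current;
        }
    }

    /** Applies the events in order, starting from IDLE. */
    static StreamState run(StreamEvent... events) {
        StreamState s = StreamState.IDLE;
        for (StreamEvent e : events) {
            s = next(s, e);
        }
        return s;
    }
}
```

With the events in the expected order the stream reaches `CLOSED`; if `SENT_END_STREAM` is processed while still in `OPEN`, the stream ends up parked in `HALF_CLOSED_REMOTE_SENDING_DATA` with nothing left to send, matching the heap dump.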
Thanks for the great report, @fredfp. I'll have a look.
My pleasure, thanks for having a look! As mentioned on Gitter, I can't share the heap dump, but I'd be happy to do some screen sharing or further data extraction if that could help. Just let me know.
What kind of gRPC requests are involved in that case? Are they streaming or request/response style?
This kind of call stack could be a problem: the state machine is entered twice on the same call stack, so the inner state change is lost.
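A toy Java model of that failure mode, assuming a handler that stores its result with `state = transition(state, event)` while `transition` can synchronously call back into `handle`. Everything in it (`ReentrantStateMachine`, the two events, the `HALF_CLOSED_LOCAL` state, the `deferReentrantEvents` flag) is hypothetical and is not akka-http's code. With deferral off, the nested call's update is overwritten by the outer assignment; with it on, re-entrant events are queued until the outer transition has been stored.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical toy model of a re-entrant state update; not akka-http code. */
final class ReentrantStateMachine {
    enum State { OPEN, HALF_CLOSED_LOCAL, HALF_CLOSED_REMOTE_SENDING_DATA, CLOSED }
    enum Event { RECV_END_STREAM, OUT_STREAM_FINISHED }

    State state = State.OPEN;
    private final boolean deferReentrantEvents;
    private final Deque<Event> pending = new ArrayDeque<>();
    private boolean handling = false;

    ReentrantStateMachine(boolean deferReentrantEvents) {
        this.deferReentrantEvents = deferReentrantEvents;
    }

    void handle(Event event) {
        if (deferReentrantEvents && handling) {
            pending.add(event); // process after the current transition has been stored
            return;
        }
        handling = true;
        try {
            // The right-hand side runs first and may re-enter handle(); whatever the
            // nested call stored in `state` is then overwritten by this assignment.
            state = transition(state, event);
        } finally {
            handling = false;
        }
        while (!pending.isEmpty()) {
            handle(pending.poll());
        }
    }

    private State transition(State current, Event event) {
        switch (current) {
            case OPEN:
                if (event == Event.RECV_END_STREAM) {
                    // Hypothetical side effect: pulling the next frame notices that the
                    // response is complete and reports it synchronously.
                    handle(Event.OUT_STREAM_FINISHED);
                    return State.HALF_CLOSED_REMOTE_SENDING_DATA;
                }
                return event == Event.OUT_STREAM_FINISHED ? State.HALF_CLOSED_LOCAL : current;
            case HALF_CLOSED_REMOTE_SENDING_DATA:
                return event == Event.OUT_STREAM_FINISHED ? State.CLOSED : current;
            case HALF_CLOSED_LOCAL:
                return event == Event.RECV_END_STREAM ? State.CLOSED : current;
            default:
                return current;
        }
    }
}
```

Deferring re-entrant events (a small trampoline) is one common way to make such a state machine safe; whether it fits akka-http's stage design is a separate question.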
We mostly have request/response calls, plus a single request with a streaming response. From the heap dump, I assumed the issue was with a request/response call, given that `maybeinlet` had not been set (and it is never cleared in the code).
Thanks!
The previous call stack would not be a problem, as a gRPC response usually has multiple frames, so `pullNextFrame` would not even try to close the stream immediately.
Next conjecture to test: unexpected timing of incoming `WINDOW_UPDATE` frames, using this call stack:
@fredfp do you know how big your response data is? I can reproduce the issue when the response is bigger than the `INITIAL_WINDOW_SIZE`. What's your client, and does it have any extra configuration for the window size settings?
Our client is akka-grpc, going through a couple of linkerd proxies. We don't touch any setting related to window size. Our responses vary in size, and I'm pretty sure they can be bigger than 64 KB (which IIRC is the default initial window size).
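For reference, RFC 7540 sets the default `SETTINGS_INITIAL_WINDOW_SIZE` to 65,535 bytes (64 KiB minus one byte) and the default maximum DATA frame payload to 16,384 bytes, so any response larger than the initial window cannot complete until the peer sends at least one `WINDOW_UPDATE`. The Java sketch below counts those stalls for a single stream; it ignores the connection-level window and assumes each update grants a fixed, hypothetical `updateIncrement`. `FlowControlSketch` is an illustrative name, not akka-http code.

```java
/** Toy model of HTTP/2 stream-level flow control (RFC 7540 §6.9); not akka-http code. */
final class FlowControlSketch {
    static final int DEFAULT_INITIAL_WINDOW_SIZE = 65_535; // RFC 7540 default
    static final int DEFAULT_MAX_FRAME_SIZE = 16_384;      // RFC 7540 default

    private FlowControlSketch() {}

    /**
     * Sends responseBytes in DATA frames and returns how many times the sender had to
     * wait for a WINDOW_UPDATE. Each (hypothetical) update adds updateIncrement bytes.
     */
    static int stallsWaitingForWindowUpdate(long responseBytes, int initialWindow, int updateIncrement) {
        if (initialWindow < 0 || updateIncrement <= 0) {
            throw new IllegalArgumentException("window must be >= 0 and increment > 0");
        }
        long window = initialWindow;
        long remaining = responseBytes;
        int stalls = 0;
        while (remaining > 0) {
            if (window == 0) {
                stalls++; // blocked: only a WINDOW_UPDATE from the peer can unblock the stream
                window += updateIncrement;
                continue;
            }
            long frame = Math.min(Math.min(remaining, window), DEFAULT_MAX_FRAME_SIZE);
            remaining -= frame;
            window -= frame;
        }
        return stalls;
    }
}
```

With the defaults and a 65,535-byte increment, a 65,535-byte response fits in the initial window, while a 100,000-byte response needs exactly one `WINDOW_UPDATE` before it can finish, which is where the timing conjecture comes into play.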