armeria icon indicating copy to clipboard operation
armeria copied to clipboard

Potential bug: InboundTrafficController applies back-pressure at connection level only, ignoring per-stream windows

Open yzfeng2020 opened this issue 7 months ago • 5 comments

Description

We’ve been investigating an intermittent deadlock where one HTTP/2 stream starves the others under Armeria’s client. It appears that Armeria’s InboundTrafficController suspends inbound channel once its tracking value exceeds the connection-level high-water mark even it's consumed by one DecodedHttpResponse, and does not respect the per-stream window limits.

Setup (similar repo at https://github.com/line/armeria/pull/6258):

  • Client Connection-level initial window = 1 MB

  • Client Stream-level initial window = 1 KB

  • Server endpoint is serving with writing ~1MB data upon request arriving

  • Two GETs on the same connection issued by Armeria WebClient:

    • One is issued slightly earlier than the other
    • The HttpResponse from the client is adopted with res.toInputStream(Identity) to transform into InputStream, but the application layer doesn't read from the InputStream explicitly, e.g. the test doesn't read from the responses explicitly.

Observation

  • The first stream is able to receive 1MB data even at the conditions of
      1. the application layer never reads from the InputStream explicitly and
      1. the stream level window is 1KB

while the second stream couldn't make progress at all.

What we have confirmed

  • Netty H2 flow control doesn't kick in

    • Both H2 connection & per-stream windows remain open, shown in heap dumps as non-zero credit.
  • Armeria’s InboundTrafficController is suspended

    • it hits the high water mark.

What we think happens

  • When first request is sent from armeria client and handled by the server, the server keeps writing data to the stream to 1MB; during that time stream 2 hasn't arrived yet.

  • H2 level flow control doesn’t really take effect for first stream, as Armeria buffers the HttpData in the DecodedHttpResponse, and tells Netty and it has consumes all the bytes

    • The release of H2 flow control window is via onDataRead callback of Http2ResponseDecoder that implements Http2FrameListener.
    • https://github.com/netty/netty/blob/1001008e7599db4a1efcde2fb629f0613f029d1e/codec-http2/src/main/java/io/netty/handler/codec/http2/Http2FrameListener.java#L25-L40
    • since res.tryWrite returns almost immediately, it doesn't track the application consumption
  • The read from the channel stops when first stream hits the InboundTrafficController’s highwatermark, which is exactly the connection level window

  • Hence, the second request, sharing the same connection can never make progress

Expected behavior

We would expect Armeria to:

  • Track buffered bytes per stream, based on each stream’s own window, in addition to connection level window

  • Suspend reading for stream whose buffer has exceeded its stream limit to allow other streams to continue until their own limits are reached.

  • Does not consume additional data until application layer has

Related issues

https://github.com/line/armeria/issues/1040 https://github.com/line/armeria/issues/1487

yzfeng2020 avatar May 25 '25 07:05 yzfeng2020

Thanks for the detailed report! I'm curious if the problem goes away if you remove the capping logic here.

Also, do you have a reproducer we can play with?

trustin avatar May 26 '25 02:05 trustin

@trustin I pushed https://github.com/line/armeria/pull/6258 as a similar repro, ptal, thanks!

yzfeng2020 avatar May 27 '25 23:05 yzfeng2020

InboundTrafficController operates at a connection level, so an unconsumed stream can block the channel and prevent another stream from reading. This form of backpressure may violate the HTTP/2 flow control specification.

A flow-control scheme ensures that streams on the same connection do not destructively interfere with each other.

To make flow control work properly in HTTP/2, I'm going to modify the code to use Http2LocalFlowController.consumeBytes(Http2Stream,int)

ikhoon avatar May 28 '25 13:05 ikhoon

@ikhoon thanks, may I know what's the effort level for this? I am happy to help if the team doesn't have capacity for this issue.

yzfeng2020 avatar May 29 '25 05:05 yzfeng2020

It doesn't seem like fixing the problem requires significant effort. I already shared the direction of the change with the team and received their agreement. I will open a PR within the next few working days.

ikhoon avatar May 29 '25 05:05 ikhoon