trafficserver icon indicating copy to clipboard operation
trafficserver copied to clipboard

Canceled streams are not cleaned up in time

Open mrdg opened this issue 3 years ago • 2 comments

We run a proxy server in front of ATS and noticed this problem when the number of streams on a connections gets close to the maximum, and some streams are canceled by the proxy (generally because the original request was canceled).

I posted some test code and logs of a reproduction case in this gist. I tested this on master with proxy.config.http2.max_concurrent_streams_in set to 5 to make it easier to parse the logs, but we see the issue with much higher limits as well. The test client sends 5 requests, then cancels those, and then sends 5 more, which should be accepted, but it appears that ATS is refusing them. The client code logs this:

2022/11/04 12:38:26 http2: Transport received RST_STREAM stream=11 len=4 ErrCode=REFUSED_STREAM
2022/11/04 12:38:26 http2: Transport received RST_STREAM stream=13 len=4 ErrCode=REFUSED_STREAM
2022/11/04 12:38:26 http2: Transport received RST_STREAM stream=15 len=4 ErrCode=REFUSED_STREAM
2022/11/04 12:38:26 http2: Transport received RST_STREAM stream=17 len=4 ErrCode=REFUSED_STREAM
2022/11/04 12:38:26 http2: Transport received RST_STREAM stream=19 len=4 ErrCode=REFUSED_STREAM

Looking at the ATS logs, I see that the previously canceled streams are deleted after the second batch of requests is rejected:

[Nov  4 12:38:26.285] [ET_NET 1] DEBUG: <Http2CommonSession.cc:250 (do_start_frame_read)> (http2_cs) [1] receiving frame header
[Nov  4 12:38:26.285] [ET_NET 1] DEBUG: <Http2CommonSession.cc:267 (do_start_frame_read)> (http2_cs) [1] frame header length=10, type=1, flags=0x5, streamid=11
[Nov  4 12:38:26.285] [ET_NET 1] DEBUG: <Http2ConnectionState.cc:232 (rcv_headers_frame)> (http2_con) [1] [11] Received HEADERS frame
[Nov  4 12:38:26.286] [ET_NET 1] DEBUG: <Http2ConnectionState.cc:1936 (send_rst_stream_frame)> (http2_con) [1] [11] Send RST_STREAM frame
...
[Nov  4 12:38:26.288] [ET_NET 1] DEBUG: <Http2Stream.cc:68 (~Http2Stream)> (http2_stream) [1] [1] Destroy stream, sent 0 bytes
[Nov  4 12:38:26.288] [ET_NET 1] DEBUG: <Http2ConnectionState.cc:1495 (delete_stream)> (http2_con) [1] [1] Delete stream

We also see this in the logs:

[Nov  4 12:38:26.287] [ET_NET 1] ERROR: HTTP/2 stream error code=0x07 client_ip=127.0.0.1 session_id=1 stream_id=19 recv headers creating inbound stream beyond max_concurrent limit

We also see this compression error (or sometimes a protocol error) when the problem happens:

[Nov  4 12:38:26.288] [ET_NET 1] ERROR: HTTP/2 connection error code=0x09 client_ip=127.0.0.1 session_id=1 stream_id=21 recv headers compression error

mrdg avatar Nov 04 '22 13:11 mrdg

There are two issues.

  • ATS starts processing 6th HEADERS frame before it fully process the 1st RST_STREAM frame because actual stream deletion is done asynchronously and the stream count is not updated until the actual deletion. https://github.com/apache/trafficserver/blob/b9b6ee19e1b32ce12bfb5f2f9495e290e5bf9d41/proxy/http2/Http2Stream.cc#L519-L522
  • Header blocks are not processed when the stream is refused. https://github.com/apache/trafficserver/blob/b9b6ee19e1b32ce12bfb5f2f9495e290e5bf9d41/proxy/http2/Http2ConnectionState.cc#L257-L260 https://github.com/apache/trafficserver/blob/b9b6ee19e1b32ce12bfb5f2f9495e290e5bf9d41/proxy/http2/Http2ConnectionState.cc#L1327-L1336

An endpoint receiving HEADERS, PUSH_PROMISE, or CONTINUATION frames needs to reassemble field blocks and perform decompression even if the frames are to be discarded.

https://www.rfc-editor.org/rfc/rfc9113#name-field-section-compression-a


For the first one, ATS just needs to complete stream deletion before it starts processing next frame. I don't think the stream count should be updated (decremented) until actual deletion completes because that can be used for a DoS attack and that would cause excessive memory utilization.

For the second one, ATS needs to do whatever the spec says, but we also need to ensure that malicious activity doesn't cause excessive CPU utilization. Cancelling 100 streams on a session can be valid, but 100 of immediate cancellation is probably not.

maskit avatar Nov 29 '22 18:11 maskit

This issue has been automatically marked as stale because it has not had recent activity. Marking it stale to flag it for further consideration by the community.

github-actions[bot] avatar Nov 30 '23 01:11 github-actions[bot]