
Changes flow control credit increases to a fixed amount of half the initial max_data

Open Sparika opened this issue 3 years ago • 9 comments

See issue #983

The previous implementation of update_max_data() and related code has an issue that may result in max_data increments becoming smaller over time, increasing the number of MAX_STREAM_DATA frames sent.

Another potential issue is that the difference between max_data and max_data_next could grow over time.

The implemented solution sets a max_data_incr equal to half the initial max_data. max_data_incr is used both to determine when the available credit is nearly exhausted, i.e. len > max_data - max_data_incr, and to set the new limit, i.e. max_data = max_data + max_data_incr. The flow control credit always increases by half the initial max_data, and the unspent flow control credit (max_data - len) is never larger than the initial max_data.
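For readers skimming the PR, here is a minimal, self-contained sketch of that rule. The struct and method names are illustrative, not quiche's actual API.

```rust
// A minimal sketch of the fixed-increment update rule described above.
// Names (`FlowControl`, `should_update`, `update_max_data`) are illustrative.
struct FlowControl {
    max_data: u64,      // current limit advertised to the peer
    len: u64,           // total bytes consumed so far
    max_data_incr: u64, // fixed increment: half of the initial max_data
}

impl FlowControl {
    fn new(initial_max_data: u64) -> Self {
        FlowControl {
            max_data: initial_max_data,
            len: 0,
            max_data_incr: initial_max_data / 2,
        }
    }

    /// An update is due once less than `max_data_incr` credit remains.
    fn should_update(&self) -> bool {
        self.len > self.max_data - self.max_data_incr
    }

    /// Raise the limit by the fixed increment and return the value to advertise.
    fn update_max_data(&mut self) -> u64 {
        self.max_data += self.max_data_incr;
        self.max_data
    }
}

fn main() {
    let mut fc = FlowControl::new(1_000_000);
    fc.len = 600_000; // more than half the initial window has been consumed
    if fc.should_update() {
        println!("advertise MAX_DATA = {}", fc.update_max_data()); // 1_500_000
    }
}
```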

Sparika avatar Jun 30 '21 15:06 Sparika

What is the reason for setting it to initial max_data / 2, which is fixed over the connection lifetime? When the initial max_data is sufficiently small (especially when the current BDP > max_data_incr), the connection would be limited by flow control. It would be useful to have a qlog flow control graph to compare.

P.S. See #529 for dynamically changing this threshold based on the measured BDP; it might be revisited if needed.

junhochoi avatar Jul 06 '21 17:07 junhochoi

max_data_incr is the value by which max_data is incremented regularly, i.e. each time len > max_data - max_data_incr. Because the increment is triggered whenever len exceeds max_data - max_data_incr, and the new limit is set to max_data = max_data + max_data_incr, the available flow control credit never exceeds 2x max_data_incr. More explicitly, at the moment of an update: max_data - max_data_incr < len < max_data < max_data + max_data_incr, so the new limit exceeds len by less than 2x max_data_incr. By setting max_data_incr = [initial] max_data / 2 using the initial value, the code ensures that the available flow control credit never exceeds 2x ([initial] max_data / 2) == [initial] max_data.
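A quick way to sanity-check that bound (illustrative code only, not quiche's): simulate a receiver applying the rule while a sender consumes credit, and assert that the unspent credit never exceeds the initial max_data.

```rust
// Illustrative simulation: apply the fixed-increment rule while a sender
// consumes credit, and check that max_data - len stays within the bound.
fn main() {
    let initial_max_data: u64 = 1_000_000;
    let max_data_incr = initial_max_data / 2;

    let mut max_data = initial_max_data;
    let mut len: u64 = 0;

    for _ in 0..10_000 {
        len += 1_500; // roughly one full-sized packet per step

        if len > max_data - max_data_incr {
            max_data += max_data_incr; // fixed-size increment
        }

        // 2 * max_data_incr == initial max_data, as argued above.
        assert!(max_data - len <= initial_max_data);
    }

    println!("unspent credit never exceeded the initial max_data");
}
```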

Suppose a symmetrical connection with a throughput equal to the initial max_data per RTT. Then after 1/2 RTT the sender has consumed max_data / 2, triggering the increase max_data = max_data + max_data / 2 at the receiver. The MAX_DATA frame arrives 1/2 RTT later at the sender, just in time to refresh the sender's limit. I've not run tests or graphs for that part, but that would surely be interesting.
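To make the timing argument concrete, here is a tiny back-of-the-envelope calculation under the stated assumption (throughput equal to the initial max_data per RTT; the numbers are made up for illustration).

```rust
// Back-of-the-envelope timeline: a sender whose throughput equals the
// initial max_data per RTT, with illustrative numbers.
fn main() {
    let rtt_ms: u64 = 50;
    let initial_max_data: u64 = 1_000_000; // bytes
    let bytes_per_ms = initial_max_data / rtt_ms; // 20_000 B/ms

    // The receiver triggers an update once half the initial window is consumed.
    let t_trigger = (initial_max_data / 2) / bytes_per_ms; // 25 ms
    // The resulting MAX_DATA frame needs another half RTT to reach the sender.
    let t_arrival = t_trigger + rtt_ms / 2; // 50 ms
    // Time at which the sender would exhaust the initial window entirely.
    let t_exhaust = initial_max_data / bytes_per_ms; // 50 ms

    assert_eq!(t_arrival, t_exhaust); // new credit lands just as the old window runs out
    println!("trigger: {t_trigger} ms, arrival: {t_arrival} ms, exhaustion: {t_exhaust} ms");
}
```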

Regarding dynamically changing the flow control limit: my understanding is that the limit ensures the receiver does not over-commit memory, so whatever the BDP or other network properties, the network should not influence the flow control limit. Otherwise the network could force the receiver into committing more memory than intended. As I understand it, this serves as a way to communicate the receiver's buffer limit to the sender while leaving throughput otherwise unlimited (except for congestion control).

Feel free to correct me if I'm wrong about anything. ;)

Sparika avatar Jul 07 '21 00:07 Sparika

> Regarding dynamically changing the flow control limit: my understanding is that the limit ensures the receiver does not over-commit memory, so whatever the BDP or other network properties, the network should not influence the flow control limit

That kind of depends. Autotuning flow control based on network performance is a common practice. To address the concern that you raise, what's needed is some receiver-local implementation control of the maximum the window can scale to. Right now quiche is quite basic: an application states the initial limit and we use that as the upper bound forever more.

Autotuning can indeed help implementations be more efficient or conservative with memory because they can afford to set lower initial limits for "light/slow" connections and allow the "heavy/faster" connections to make use of the freed up resource.
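Purely to illustrate that idea (this is not quiche's API, and as noted below it is out of scope for this PR), an autotuned window with a receiver-local cap could look roughly like this:

```rust
// Illustration only: grow the flow control window when the measured BDP
// suggests it is the bottleneck, but clamp it to a receiver-chosen maximum so
// the network can never force the receiver to commit more memory than intended.
struct AutotunedWindow {
    window: u64,   // currently advertised window
    hard_max: u64, // receiver-local upper bound, never exceeded
}

impl AutotunedWindow {
    fn on_bdp_estimate(&mut self, estimated_bdp: u64) {
        if estimated_bdp > self.window {
            self.window = (self.window * 2).min(self.hard_max);
        }
    }
}

fn main() {
    let mut w = AutotunedWindow { window: 1_000_000, hard_max: 8_000_000 };
    for bdp in [3_125_000u64, 3_125_000, 3_125_000] {
        w.on_bdp_estimate(bdp);
    }
    println!("window autotuned to {} (capped at {})", w.window, w.hard_max);
}
```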

LPardue avatar Jul 07 '21 00:07 LPardue

Let's be clear here though. The point of this PR is to fix a problem with the existing, currently used flow control update mechanism, not to completely change it. That is off topic for this PR.

ghedo avatar Jul 07 '21 09:07 ghedo

I did the same quick-and-dirty test I did in #983, and this time I plotted the max_data and len of the stream. The first graph shows what happens with the master branch, the second shows this PR. In the first graph max_data increases linearly because the value is updated continuously, while in the second graph the staircase pattern shows that the value is updated much less frequently.

[image: max_data and stream len over time — first graph: master branch, second graph: this PR]

Sparika avatar Jul 07 '21 15:07 Sparika

> Suppose a symmetrical connection with a throughput equal to the initial max_data per RTT. Then after 1/2 RTT the sender has consumed max_data / 2, triggering the increase max_data = max_data + max_data / 2 at the receiver. The MAX_DATA frame arrives 1/2 RTT later at the sender, just in time to refresh the sender's limit.

Let's look at the following example. Here is my dev environment:

  • MacOS (as a client) -> Linux VM (Debian Buster) (as a server)
  • server: quiche-server on the Linux VM, built from the master branch, started as follows:
tools/apps/target/release/quiche-server --no-retry --cert tools/apps/src/bin/cert.crt --key tools/apps/src/bin/cert.key --root <dir> --listen '0.0.0.0:8443'
  • set the Linux qdisc as follows, to simulate 1000 Mbit bandwidth / 25 ms RTT / 0% packet loss. Note that this link's BDP is 3125000 bytes (1000 Mbit × 25 ms; see the worked calculation after this list):
tc qdisc add dev enp2s1 root netem delay 25ms 0ms rate 1000Mbit limit 2116 reorder 0 loss 0

  • client: run two different versions, your PR (404f8d1) and master right before your PR (5afe2882)

  • Note that the client's default max_data and max_stream_data are 10000000 and 1000000 (see tools/apps/src/args.rs)
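For reference, the BDP figure quoted in the setup above works out as follows (a standalone arithmetic check, not part of the test tooling):

```rust
// Worked check of the quoted link BDP: 1000 Mbit/s bandwidth, 25 ms RTT.
fn main() {
    let bandwidth_bits_per_sec: u64 = 1_000_000_000; // 1000 Mbit/s
    let rtt_ms: u64 = 25;

    // BDP = bandwidth x RTT, converted to bytes: (bits/s x ms) / 1000 / 8.
    let bdp_bytes = bandwidth_bits_per_sec * rtt_ms / 1000 / 8;
    assert_eq!(bdp_bytes, 3_125_000);

    // The default max_stream_data (1_000_000) is below this BDP; 4_000_000 is above it.
    println!("link BDP = {bdp_bytes} bytes");
}
```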

Now run quiche-client with the following options, downloading a 100 MByte file:

tools/apps/target/release/quiche-client --no-verify --dump-responses ./ --wire-version ff00001d 'https://linux:8443/100m.dat'
  • with default option (--max-stream-data 1000000)
  • with --max-stream-data 4000000 (4000000 is just a higher number than the BDP above).

Here is what I get (total download time in seconds):

  • this PR + default: 6.031s
  • this PR + max-stream-data 4000000: 2.276s
  • master branch + default: 3.666s
  • master branch + max-stream-data 4000000: 2.304s

As I expected, this PR seems to start throttling when the current BDP > initial max_stream_data (max_data is not considered here because its current default is pretty high, 10M, so it has no impact on this test).

On the other hand, we can also see the problem with the current master branch here, but its behavior can be a separate discussion.

junhochoi avatar Jul 07 '21 18:07 junhochoi

@junhochoi in #529 you reference a Chromium document for the flow control algorithm. That document states that the flow control window update is triggered when: available window < max window size / 2, where available window is max_data - len.

This PR actually implements the same logic (without the auto-tuning) as described in the Chromium document:

avail_window < max_window_size / 2
⇔ max_data - len < max_window_size / 2
⇔ max_data - len < [initial] max_data / 2    (renaming max_window_size to the initial max_data)
⇔ -len < [initial] max_data / 2 - max_data
⇔ len > max_data - [initial] max_data / 2
⇔ len > max_data - max_data_incr    (since max_data_incr = [initial] max_data / 2)
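As a quick sanity check of that rearrangement (illustrative code, not quiche's), evaluate both conditions over a grid of sample values and confirm they always agree:

```rust
// Check that the PR's trigger condition matches the Chromium-style condition
// derived above, for a handful of sample values.
fn main() {
    let initial_max_data: u64 = 1_000_000;
    let max_data_incr = initial_max_data / 2;

    for max_data in [1_000_000u64, 1_500_000, 2_000_000] {
        for len in (0..=max_data).step_by(100_000) {
            let chromium_style = max_data - len < initial_max_data / 2;
            let this_pr = len > max_data - max_data_incr;
            assert_eq!(chromium_style, this_pr);
        }
    }
    println!("both conditions agree on all sampled points");
}
```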

The test you ran shows that this logic throttles throughput quite a bit more than the master branch. I believe that is because the master branch frantically sends MAX_STREAM_DATA frames to the sender for every received packet, so the sender is barely throttled at all.

But on the other hand, the test does not show the amount of data avoided in the receiver -> sender direction. In my example, I selected the last 4400 traces (not the full connection sequence; supposedly one trace per packet). With this PR the number of max_stream_data increments is only 12, compared to 4400 (all of them) for the master branch. So the number of MAX_STREAM_DATA frames is divided by roughly 300.

So there may be an impact on throughput if the flow control limit is misconfigured, but it seems to me the benefit is greater.

Sparika avatar Jul 08 '21 15:07 Sparika

@Sparika I understand that sending fewer frames can reduce the number of packets going back to the sender (note that the number of saved frames is greater than the number of saved packets, because we still need to send ACKs and other frames to the sender), but from my perspective it should be balanced against performance as well.

junhochoi avatar Jul 08 '21 17:07 junhochoi

My latest commit implemented changes requested by @ghedo. Is there anything I should change to get the PR accepted?

Sparika avatar Aug 16 '21 15:08 Sparika