Explore pacing-gain to understand early termination

Open gfr10598 opened this issue 4 years ago • 10 comments

We have a lot of ndt7 data now, and should explore the pacing-gain behavior, to understand how well early termination could work based on pacing-gain, and also verify whether there is an earlier fixed termination time that would be effective for 95+ percent of clients.

gfr10598 avatar Jul 29 '20 16:07 gfr10598

https://console.cloud.google.com/bigquery?sq=581276032543:5a15dedd9d1a4133b498357ddc54929f

Looks like pacing gain usually drops below 1.25 in under 5 seconds. Needs more analysis.

gfr10598 avatar Jul 29 '20 17:07 gfr10598

WRONG - bug in SQL

| ok1 | ok2 | ok5 | ok10 | total |
| -- | -- | -- | -- | -- |
| 3988220 | 4233782 | 4254709 | 4275755 | 4275755 |

92% converge within 1 second; 99% converge within 2 seconds; 99.5% converge within 5 seconds; 100% converge within the 9-13 second test duration.

gfr10598 avatar Jul 29 '20 18:07 gfr10598

| ok1 | ok2 | ok5 | ok10 | total |
| -- | -- | -- | -- | -- |
| 2226638 | 3189769 | 3990049 | 4184933 | 4249948 |

Of those that converge: 50% converge within 0.8 seconds; 90% converge within 3.3 seconds; 95% converge within 4.8 seconds; 99% converge within 8.1 seconds.
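To relate the counts in the table to percentages, here is a minimal sketch. It assumes that `ok1`, `ok2`, `ok5`, and `ok10` count tests converging within 1, 2, 5, and 10 seconds, and that `total` counts all tests; the thread does not spell this out. The function name is hypothetical.

```python
from typing import Dict


def cumulative_fractions(counts: Dict[str, int], total: int) -> Dict[str, float]:
    """Divide each cumulative count by the total number of tests."""
    if total <= 0:
        raise ValueError("total must be positive")
    return {name: count / total for name, count in counts.items()}


# Counts copied from the table above; their interpretation is an assumption.
counts = {"ok1": 2226638, "ok2": 3189769, "ok5": 3990049, "ok10": 4184933}
fractions = cumulative_fractions(counts, total=4249948)

for name, fraction in fractions.items():
    print(f"{name}: {fraction:.1%}")
```

Under that reading, roughly half of all tests converge within 1 second and about three quarters within 2 seconds. These fractions are over all tests, so they are not directly comparable to the percentiles "of those that converge".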

gfr10598 avatar Jul 29 '20 22:07 gfr10598

The average convergence time is about 1.4 seconds, and the average BytesAcked at convergence is about 69MB. Very fast tests tend to converge faster, and slower tests tend to converge more slowly. Tests that take more than 1 second to converge do so after, on average, about 50MB of transfer.

Spreadsheet (google.com)

gfr10598 avatar Jul 29 '20 22:07 gfr10598

Found another SQL bug. The spreadsheet has been updated. The average BytesAcked at convergence trends up with convergence time, and averages about 5.5 MBytes (log mean around 1.7 MB).

The worst 5%, with convergence time > 4.8 seconds, average 28 MBytes (log mean 4 MB), and the latest-converging tests average around 40 to 60 MBytes.
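The gap between the plain average and the "log mean" suggests a skewed distribution. As a sketch, assuming "log mean" here means the exponential of the mean of the logarithms (the geometric mean), which the thread does not state explicitly, the two can be compared like this. The sample values are made up for illustration.

```python
import math
from statistics import mean
from typing import Iterable


def log_mean(values: Iterable[float]) -> float:
    """exp(mean(log(x))): assumed meaning of 'log mean' (geometric mean)."""
    logs = [math.log(v) for v in values]  # requires strictly positive values
    return math.exp(mean(logs))


# Hypothetical BytesAcked samples in MBytes; one large value skews the average.
samples = [1.0, 1.0, 1.0, 100.0]
print(mean(samples))      # arithmetic mean, pulled up by the outlier
print(log_mean(samples))  # log mean, much closer to the typical value
```

A few very large transfers pull the arithmetic mean up much more than the log mean, which would be consistent with 5.5 MBytes versus 1.7 MB.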

The BBRInfo.MinRTT averages around 1 msec (log mean 0.88) for the fastest converging tests, and around 200 msec for tests that take 6 seconds or more to converge.

On average, it looks like it takes around 25 to 30 MinRTTs to converge, but sometimes as few as 5 or 10, and sometimes 1000s of MinRTTs.
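The round-trip figure is just convergence time divided by MinRTT. A minimal sketch, with a hypothetical function name and with units assumed to be seconds for convergence time and milliseconds for MinRTT:

```python
def min_rtts_to_converge(convergence_seconds: float, min_rtt_ms: float) -> float:
    """Express a convergence time as a multiple of the connection's MinRTT."""
    if min_rtt_ms <= 0:
        raise ValueError("min_rtt_ms must be positive")
    # Convert seconds to milliseconds before dividing.
    return convergence_seconds * 1000.0 / min_rtt_ms


# A test with a 200 msec MinRTT that converges after 6 seconds.
print(min_rtts_to_converge(6.0, 200.0))  # 30.0
```

With the figures quoted above for slow-converging tests (about 200 msec MinRTT, 6 seconds to converge), this gives 30 MinRTTs, in line with the 25 to 30 average.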

gfr10598 avatar Jul 30 '20 02:07 gfr10598

The median number of round trips to convergence is around 25 for the fastest-converging tests, and up to 50 for the slowest. Query

gfr10598 avatar Jul 30 '20 02:07 gfr10598

I've refined the query to actually look for the first two crossings, from >=1 to <1, and from <1 to >=1. This changes the results slightly, but not dramatically. NOTE: With this convergence metric, only about 3/4 of the ndt7 tests reach convergence. For another 10%, the PacingGain drops below 1.0, but does not cross 1.0 again.
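The two-crossing metric can be sketched outside SQL as follows. This is an illustration, not the query used for the spreadsheet: the `(elapsed seconds, pacing gain)` sample layout and the function name are assumptions, and the gain values in the example are made up.

```python
from typing import Optional, Sequence, Tuple

Sample = Tuple[float, float]  # (elapsed seconds, pacing gain); assumed layout


def convergence_time(samples: Sequence[Sample]) -> Optional[float]:
    """Return the elapsed time of the second crossing, or None.

    First crossing: gain goes from >= 1 to < 1.
    Second crossing: gain goes from < 1 back to >= 1.
    """
    dropped = False
    prev_gain: Optional[float] = None
    for elapsed, gain in samples:
        if prev_gain is not None:
            if not dropped and prev_gain >= 1.0 and gain < 1.0:
                dropped = True  # first crossing seen
            elif dropped and prev_gain < 1.0 and gain >= 1.0:
                return elapsed  # second crossing: treat as converged
        prev_gain = gain
    return None  # never dropped, or dropped but never crossed back


converged = [(0.1, 2.9), (0.5, 2.9), (0.6, 0.35), (0.7, 1.25), (0.8, 0.75)]
dropped_only = [(0.1, 2.9), (0.6, 0.35), (0.9, 0.35)]
print(convergence_time(converged))     # 0.7
print(convergence_time(dropped_only))  # None
```

The `dropped_only` case corresponds to the tests where the PacingGain drops below 1.0 but does not cross 1.0 again. A trace whose first sample is already below 1.0 has no first crossing under this sketch and returns `None`.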

Spreadsheet (google.com only) Query

gfr10598 avatar Jul 30 '20 15:07 gfr10598

BQ connected spreadsheet comparing throughput, BBR_BW at convergence, and BBR_BW at 10 sec.

Convergence vs speed

[Image: Screen Shot 2020-07-30 at 8 32 31 PM]

gfr10598 avatar Jul 31 '20 00:07 gfr10598

Convergence Time CDF

gfr10598 avatar Aug 04 '20 17:08 gfr10598

Needs design discussion with @pboothe

laiyi-ohlsen avatar Aug 10 '20 16:08 laiyi-ohlsen