hls.js

Improve bandwidth estimation and adaptive switching

Open • robwalch opened this pull request 3 years ago • 2 comments

This PR will...

Improve bandwidth estimation and adaptive switching with smaller segments and higher TTFB

Why is this Pull Request needed?

Estimating time-to-first-byte (TTFB) separately from the time used to estimate bandwidth lets us predict more accurately how long segments take to load. This matters most for short segments, whose durations are close to the average round-trip time of a request.
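To show why keeping TTFB separate matters, here is a simplified model. It assumes that load time equals TTFB plus bytes divided by throughput. This is an illustrative sketch, not the PR's code, and the function names are hypothetical. When TTFB is folded into a single "effective" rate, short segments appear to be on a slower link than long ones over the same connection:

```typescript
// Predicted load time when TTFB is tracked separately from throughput.
function predictLoadTimeMs(bytes: number, bandwidthBps: number, ttfbMs: number): number {
  const transferMs = (bytes * 8 * 1000) / bandwidthBps;
  return ttfbMs + transferMs;
}

// The rate you would measure if TTFB were counted as part of the transfer.
function effectiveBandwidthBps(bytes: number, bandwidthBps: number, ttfbMs: number): number {
  return (bytes * 8 * 1000) / predictLoadTimeMs(bytes, bandwidthBps, ttfbMs);
}

// 5 Mbps link, 100 ms TTFB:
// 1 s segment at 2 Mbps (250,000 bytes): 500 ms total, effective rate 4.0 Mbps
// 6 s segment at 2 Mbps (1,500,000 bytes): 2500 ms total, effective rate 4.8 Mbps
```

With the two terms kept apart, one bandwidth value and one TTFB value predict both cases correctly.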

There were also several other issues: problems with loading stats, observation of the combined buffer instead of the main video buffer, and in-flight bandwidth being calculated before any bytes had loaded. Together these caused bad emergency down-switches and corrupted the bandwidth estimate (BWE) in _abandonRulesCheck. Thanks to @Pri12890 for pointing many of these out.
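One of the problems above is in-flight bandwidth being calculated before bytes were loaded. A guard can refuse to produce a value until data has arrived, and can measure from the first byte so that TTFB is not counted as transfer time. This is a minimal sketch. The InflightStats shape and its field names are hypothetical, not hls.js's own loader stats:

```typescript
// Hypothetical snapshot of an in-flight fragment request (times in ms).
interface InflightStats {
  firstByte: number; // timestamp of the first received byte, 0 until it arrives
  loaded: number;    // bytes received so far
}

// Bits per second measured from the first byte, or null when there is nothing to measure yet.
function inflightBandwidth(stats: InflightStats, now: number): number | null {
  if (stats.firstByte === 0 || stats.loaded === 0) return null; // no bytes yet: do not guess
  const elapsedMs = now - stats.firstByte;
  if (elapsedMs <= 0) return null;
  return (stats.loaded * 8 * 1000) / elapsedMs;
}
```

Returning null, rather than a near-zero rate, keeps an abandon check from reading "no data yet" as "the network collapsed" and triggering an emergency down-switch.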

Are there any points in the code the reviewer needs to double check?

There are two new public methods on the player instance that I have left undocumented:

  1. get mainForwardBufferInfo(): BufferInfo | null
     Gives the ABR controller access to the stream controller's media buffer. The ABR controller only deals with main-variant fragment traffic, so this lets it compare that activity against the buffer those fragments are appended to. Otherwise it would compare against the combined buffer, which could be stalled if alt-audio does not keep ahead.
  2. get ttfbEstimate(): number
     Similar to hls.bandwidthEstimate, hls.ttfbEstimate provides the latest time-to-first-byte estimate.
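The sketch below shows how the two getters might be combined to check whether a fragment can finish loading before the main buffer runs dry. AbrPlayerLike and BufferInfoLike are minimal stand-ins written for this example. The assumptions are that bandwidthEstimate is in bits per second, ttfbEstimate is in milliseconds, and the buffer info exposes a len in seconds. fitsInMainBuffer is a hypothetical helper, not part of hls.js:

```typescript
// Stand-ins (assumptions) for the getters described above.
interface BufferInfoLike {
  len: number; // seconds of main media buffered ahead of the playhead
}
interface AbrPlayerLike {
  readonly bandwidthEstimate: number; // bits per second
  readonly ttfbEstimate: number;      // milliseconds
  readonly mainForwardBufferInfo: BufferInfoLike | null;
}

// Hypothetical helper: will a fragment of `bytes` arrive before the main buffer is exhausted?
function fitsInMainBuffer(player: AbrPlayerLike, bytes: number): boolean {
  const buffer = player.mainForwardBufferInfo;
  if (!buffer) return false; // no main media attached or buffered yet
  const expectedSec = player.ttfbEstimate / 1000 + (bytes * 8) / player.bandwidthEstimate;
  return expectedSec < buffer.len;
}
```

Reading the main forward buffer, rather than the combined one, means a lagging alt-audio track cannot make main video fragment loading look like it is about to stall.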

TTFB sampling uses only one EWMA instance, with the same slow half-life as the one used for bandwidth. The default estimate is 100 ms and is not configurable. Samples are weighted on a curve that favors shorter values, so an occasional request RTT hiccup has less impact on the estimate.
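The sketch below illustrates the estimator described above: a single EWMA, a 100 ms fallback when there are no samples, and sample weights that shrink as TTFB grows. It assumes an EWMA whose decay is scaled by each sample's weight and which is bias-corrected for its zero starting value. The Gaussian-shaped weight curve is an illustrative assumption, and the exact curve and half-life used in the PR may differ. The class names are hypothetical:

```typescript
// EWMA whose per-sample decay is scaled by the sample's weight.
class WeightedEwma {
  private readonly alpha: number;
  private estimate = 0;
  private totalWeight = 0;

  constructor(halfLife: number) {
    // alpha^halfLife === 0.5: after `halfLife` units of weight, old samples count half as much.
    this.alpha = Math.exp(Math.log(0.5) / halfLife);
  }

  sample(weight: number, value: number): void {
    const adjAlpha = Math.pow(this.alpha, weight); // low weight means little movement
    this.estimate = value * (1 - adjAlpha) + adjAlpha * this.estimate;
    this.totalWeight += weight;
  }

  // Bias-correct for the zero initial estimate; use `fallback` before any samples.
  getEstimate(fallback: number): number {
    const zeroFactor = 1 - Math.pow(this.alpha, this.totalWeight);
    return zeroFactor > 0 ? this.estimate / zeroFactor : fallback;
  }
}

const DEFAULT_TTFB_MS = 100;

class TtfbEstimator {
  private readonly ewma: WeightedEwma;

  constructor(halfLife: number) {
    this.ewma = new WeightedEwma(halfLife);
  }

  sampleTTFB(ttfbMs: number): void {
    // Assumed weight curve: close to 1 for fast responses, decaying for slow ones,
    // so a rare slow request moves the estimate less than a typical fast one.
    const seconds = ttfbMs / 1000;
    const weight = Math.exp(-(seconds * seconds) / 2);
    this.ewma.sample(weight, ttfbMs);
  }

  get ttfbEstimate(): number {
    return this.ewma.getEstimate(DEFAULT_TTFB_MS);
  }
}
```

For example, with a half-life of 5, ten 50 ms samples followed by one 2000 ms outlier leave this estimator near 100 ms. Giving every sample equal weight would pull the estimate to well over 300 ms.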

These changes will stay open for at least one or two minor releases and will not ship in v1.3. This gives contributors time to review and test them and provide feedback.

Resolves issues:

  • Fixes #3578 (special thanks to @Oleksandr0xB for submitting #4283)
  • Fixes #3563 and closes #3595 (special thanks to @kanongil for early testing and feedback on the Low-Latency HLS implementation)
  • Related to #4291 (_abandonRulesCheck governing whether fragment loading is completed or aborted based on timeouts, not active throughput)

Checklist

  • [x] changes have been done against master branch, and PR does not conflict
  • [ ] new unit / functional tests have been added (whenever applicable)
  • [ ] API or design changes are documented in API.md

robwalch • Aug 04 '22 02:08

It is great that you are trying to tackle the bandwidth estimation issues!

While it makes sense to use request latency as part of the effective bandwidth calculation, this averaging will fail with #3988. With preload hint fetching, TTFB is not correlated with latency. It is correlated with the request start time relative to the latest index update and with the part duration, both of which can change from part to part. For that reason, I would prefer an alternative approach that doesn't require this averaging.

FYI, it's possible to get more accurate transfer timing for a single request using the Resource Timing API, though it is a bit cumbersome to extract and requires server opt-in using the Timing-Allow-Origin header.
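The sketch below shows one way to split a single request's Resource Timing entry into TTFB and transfer time. It uses the standard requestStart, responseStart, responseEnd and transferSize attributes of PerformanceResourceTiming. The ResourceTimingLike interface and splitRequestTiming function are hypothetical names for this example. For cross-origin responses without Timing-Allow-Origin, the detailed timestamps are reported as 0, so the function returns null in that case:

```typescript
// Subset of PerformanceResourceTiming used here (times in ms, transferSize in bytes).
interface ResourceTimingLike {
  requestStart: number;
  responseStart: number;
  responseEnd: number;
  transferSize: number;
}

interface RequestTiming {
  ttfbMs: number;
  transferMs: number;
  bandwidthBps: number | null; // null when there is no usable transfer to measure
}

// Returns null when timing details are restricted (no Timing-Allow-Origin opt-in).
function splitRequestTiming(entry: ResourceTimingLike): RequestTiming | null {
  if (entry.requestStart === 0 || entry.responseStart === 0) return null;
  const ttfbMs = entry.responseStart - entry.requestStart;
  const transferMs = entry.responseEnd - entry.responseStart;
  const bandwidthBps =
    transferMs > 0 && entry.transferSize > 0
      ? (entry.transferSize * 8 * 1000) / transferMs
      : null; // e.g. cache hits report transferSize 0
  return { ttfbMs, transferMs, bandwidthBps };
}
```

In a browser, entries for a fragment URL can be obtained with performance.getEntriesByName(url, "resource"), and because they structurally match, they can be passed in directly. Note that transferSize includes response headers, so the computed rate slightly overstates payload throughput.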

kanongil • Aug 04 '22 14:08

Thanks @kanongil,

I added a note to #3988: "do not sample TTFB for blocked part-hint responses". _abandonRulesCheck will also need to change for part hints.
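The note above can be sketched as a filter in front of the TTFB sampler. A blocked preload-hint response measures how long the server held the request while waiting for the part to be produced, not network latency. This sketch assumes the player can tell whether a part-hint response was blocked. The CompletedLoad shape, its fields, and both functions are hypothetical:

```typescript
// Hypothetical description of a completed request; field names are illustrative only.
interface CompletedLoad {
  ttfbMs: number;
  isPartHint: boolean;            // request was made for an EXT-X-PRELOAD-HINT part
  blockedUntilAvailable: boolean; // server held the response until the part existed
}

// Only latency-bound responses should feed the TTFB estimate.
function shouldSampleTtfb(load: CompletedLoad): boolean {
  return !(load.isPartHint && load.blockedUntilAvailable);
}

function collectTtfbSamples(loads: CompletedLoad[]): number[] {
  return loads.filter(shouldSampleTtfb).map((load) => load.ttfbMs);
}
```

The same flag could be used to keep timeout-based abandonment from firing while a blocked part hint is still waiting for its first byte.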

robwalch • Aug 04 '22 21:08