hls.js Fix bandwidth sampling for small transfers and fast connections

This PR will...

Fix bandwidth sampling weight for small transfers and/or fast connections.

Why is this Pull Request needed?

The current logic is broken for fast transfers (due to small transfer sizes, or from a fast connection). See #3563 for details.

Are there any points in the code the reviewer needs to double check?

This changes the exact meaning of the abrEwma<X> options, which could be considered a breaking change.

I have also had to significantly lower the abrEwmaFastLive default to accommodate LL-HLS level downswitches. With the old value, it takes too long to detect a lowered bitrate, which can cause multiple successive emergency level switches and stalls. Note that the new value could make a temporary connection issue more likely to cause temporary downswitches. It might make sense to only use this low value for LL-HLS content, but that is outside the scope of this patch.

Resolves issues:

Partly fixes #3563. With this PR, the ABR algorithm is more likely to switch up from a low level (still only when there is sufficient bandwidth). Low latency content on high latency links is still unable to measure a suitable bitrate.

Checklist

[x] changes have been done against master branch, and PR does not conflict
[x] new unit / functional tests have been added (whenever applicable)
[x] API or design changes are documented in API.md

Mar 09 '21 13:03 kanongil

I can't accept it based on how it would impact all streams.

That is the point of this PR: to fix the most egregious issue of #3563.

Did you see the detailed note in the commit that gives an example of just how broken the current estimator is?

With 5s segments of size 100,000: Start 10x 5 sec transfers => 160Kbps (reference)

Rate change (old): 1x 0.5 sec transfer (1.6Mbps) => 216Kbps vs 1x 0.05 sec transfer (16Mbps) => 222Kbps (10x faster is only 4% more!!)

Rate change (new): 1x 0.5 sec transfer (1.6Mbps) => 627Kbps vs 1x 0.05 sec transfer (16Mbps) => 5.3Mbps

I.e. a sample with a 10x bandwidth increase can mean just a 1.35x increase in the estimate over a 5 second interval, while a 100x bandwidth increase only makes it 1.39x !!!
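To make the arithmetic above easier to check, here is a minimal sketch of a duration-weighted EWMA with zero-bias correction. It assumes the old estimator weights each sample by its transfer time in seconds, uses half-lives of 3 s (fast) and 9 s (slow), and reports the lower of the two estimates. The class and function names are hypothetical and are not the hls.js API.

```typescript
// Sketch only: hypothetical names, assumed to approximate the old estimator.
class DurationWeightedEwma {
  private readonly alpha: number;
  private estimate = 0;
  private totalWeight = 0;

  constructor(halfLifeSeconds: number) {
    // Decay per unit of weight, so old data counts half after halfLifeSeconds.
    this.alpha = Math.exp(Math.log(0.5) / halfLifeSeconds);
  }

  // weight is the transfer duration in seconds, value is the measured bitrate.
  sample(weight: number, value: number): void {
    const adjAlpha = Math.pow(this.alpha, weight);
    this.estimate = value * (1 - adjAlpha) + adjAlpha * this.estimate;
    this.totalWeight += weight;
  }

  getEstimate(): number {
    // Compensate for the estimate starting at zero.
    const zeroFactor = 1 - Math.pow(this.alpha, this.totalWeight);
    return zeroFactor > 0 ? this.estimate / zeroFactor : 0;
  }
}

// Ten 5 s transfers of 100,000 bytes, then one faster transfer of the same size.
function estimateAfterBurst(burstSeconds: number): number {
  const fast = new DurationWeightedEwma(3); // assumed fast half-life
  const slow = new DurationWeightedEwma(9); // assumed slow half-life
  const bits = 100000 * 8;
  const transfers: number[] = [...new Array<number>(10).fill(5), burstSeconds];
  for (const seconds of transfers) {
    fast.sample(seconds, bits / seconds);
    slow.sample(seconds, bits / seconds);
  }
  // The more conservative of the two estimates wins.
  return Math.min(fast.getEstimate(), slow.getEstimate());
}
```

Under these assumptions, `estimateAfterBurst(0.5)` and `estimateAfterBurst(0.05)` come out at roughly 216 Kbps and 222 Kbps, in line with the "old" figures quoted above. The faster transfer reports a 10x higher bitrate but also gets a 10x smaller weight, so the two effects nearly cancel.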

An estimation rework is essential to ever get LL-HLS to work with ABR switching. This PR fixes the fundamental issue, and high-bandwidth estimation issues with the current implementation.

I really hope you will prioritise a fix before 1.0.0.

Note that the new default abrEwmaFastLive value is not essential to the fix, and could be omitted from the PR. Hls.js will need a mechanism to lower it for smooth near-edge LL-HLS ABR playback, though. Maybe the abrEwmaFastLive value could be capped to 1/2 the current time (in seconds) to buffer exhaustion?
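The capping idea could look something like the sketch below. The function name, the parameters, and the lower bound of 0.1 are all hypothetical; nothing like this exists in the PR.

```typescript
// Sketch only: cap the fast half-life to half the time left before the buffer runs dry.
function cappedFastHalfLife(
  configuredHalfLife: number, // e.g. the abrEwmaFastLive config value
  secondsToBufferExhaustion: number, // buffered media ahead of the playhead, in seconds
  minHalfLife = 0.1 // assumed floor, so the half-life never reaches zero
): number {
  const cap = secondsToBufferExhaustion / 2;
  return Math.max(minHalfLife, Math.min(configuredHalfLife, cap));
}
```

With a configured value of 3, a 10 second buffer leaves the value untouched, while a 2 second buffer lowers it to 1.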

Mar 18 '21 10:03 kanongil

I had another look at the estimation, and found that it can be simplified, and work better, if the weight is always a fixed value for each sample.

So my initial patch tried to use the fragment / part duration for the weight, which makes some sense, and certainly works a lot better than the current logic. However, it meant that the halfLife needed to be quite different for LL-HLS vs normal content.

I came to realise that there are essentially 2 modes when estimating, part loading vs. fragment loading, and both need to adjust for bandwidth changes in sample time. I.e. when close to live edge, both modes have ~2 parts/fragments time to react to a bandwidth change.

Based on this realisation, I changed the abrEwmaFast/Slow values to just represent samples. This means that the same value will work quite nicely for both part and fragment loading, and the implementation can be simplified. I converted from the current fast=3 & slow=9 values using a normalised 6 second fragment duration. Besides improving the estimation responsiveness of part loading, I expect it will also work better for fragment loading for playlists with high/low fragment durations without tweaking abrEwmaFast/Slow.
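A rough sketch of the sample-count approach described above, where every sample carries the same weight and the half-life is expressed in samples. The names are hypothetical, and the conversion (half-life in seconds divided by a 6 second fragment, giving 0.5 and 1.5 samples from the 3 s fast and 9 s slow defaults) is my reading of the description rather than values taken from the patch.

```typescript
// Sketch only: hypothetical names, not the code in this PR.
class SampleCountEwma {
  private readonly alpha: number;
  private estimate = 0;
  private samples = 0;

  constructor(halfLifeSamples: number) {
    // Decay per sample, so old data counts half after halfLifeSamples samples.
    this.alpha = Math.pow(0.5, 1 / halfLifeSamples);
  }

  // Every sample has weight 1, regardless of how long the transfer took.
  sample(value: number): void {
    this.estimate = value * (1 - this.alpha) + this.alpha * this.estimate;
    this.samples += 1;
  }

  getEstimate(): number {
    // Compensate for the estimate starting at zero.
    const zeroFactor = 1 - Math.pow(this.alpha, this.samples);
    return zeroFactor > 0 ? this.estimate / zeroFactor : 0;
  }
}

// Assumed conversion from a half-life in seconds to a half-life in samples.
function secondsToSamples(halfLifeSeconds: number, fragmentSeconds = 6): number {
  return halfLifeSeconds / fragmentSeconds;
}

function estimateFromBitrates(bitrates: number[]): number {
  const fast = new SampleCountEwma(secondsToSamples(3)); // 0.5 samples
  const slow = new SampleCountEwma(secondsToSamples(9)); // 1.5 samples
  for (const bps of bitrates) {
    fast.sample(bps);
    slow.sample(bps);
  }
  // The more conservative of the two estimates wins.
  return Math.min(fast.getEstimate(), slow.getEstimate());
}
```

Because the weight no longer shrinks with the transfer time, a sample that measures a 10x higher bitrate moves the estimate roughly 10x further, whether it came from a part or a full fragment.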

As part of this revision I also removed the Live/VOD distinction of the config values. I did this since the default values were already the same, and because adjusting the values based on the playlist type is a very simplistic approach. It makes much more sense to adjust the values based on the current buffer level. Both live and VOD playback can have low & high buffer levels due to bandwidth conditions. The current values are tuned to a low buffer level, so it would be prudent to detect a high buffer level and raise the values dynamically, to avoid quality dips from a temporary bandwidth burp. This is probably outside of the scope of the current patch, though.

Mar 19 '21 11:03 kanongil

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

Apr 16 '22 15:04 stale[bot]

This issue has been automatically closed because it has not had recent activity. If this issue is still valid, please ping a maintainer and ask them to label it accordingly.

Apr 19 '22 16:04 stale[bot]

This PR has been replaced by #4825

Nov 04 '22 23:11 robwalch