
The end of one segment is the beginning of the next: low latency instability due to suspected timing issues

Open wilaw opened this issue 4 years ago • 4 comments

Environment
  • [x] The MPD passes the DASH-IF Conformance Tool on https://conformance.dashif.org/
  • [x] The stream has correct Access-Control-Allow-Origin headers (CORS)
  • [x] There are no network errors such as 404s in the browser console when trying to play the stream
  • [x] The issue observed is not mentioned on https://github.com/Dash-Industry-Forum/dash.js/wiki/FAQ
  • [x] The issue occurs in the latest reference client on http://reference.dashif.org/dash.js/ and not just on my page
  • Link to playable MPD file: https://cmafref.akamaized.net/cmaf/live-ull/2006351/combined/out.mpd
  • Dash.js version: 3.1
  • Browser name/version: Chrome latest
  • OS name/version: Mac
Steps to reproduce
  1. Use this player http://mediapm.edgesuite.net/will/dash/lowlatency/low-latency-public-variable.html?latency=3&url=https://cmafref.akamaized.net/cmaf/live-ull/2006351/combined/out.mpd
  2. Start playback at 3s latency
  3. Slowly reduce it to 2s, then 1.5s. Verify that chart shows stable buffer
  4. If the chart shows >1s buffer at 1.5s latency, then lower the latency target to 0.5s.
  5. The result may depend upon location, but from my testing location (San Francisco), the player becomes unstable. It oscillates up to 3s and then back down to 0.5s.
  6. During this testing, verify that no segment takes more than 6.05s to be delivered by the CDN by examining the browser network panel.
Observed behaviour

Screen Shot 2020-05-20 at 4 13 39 PM

Screen Shot 2020-05-20 at 4 14 47 PM

The key concern here is that the rebuffering is not in fact caused by the CDN delivering any segment slowly. It must be caused by the player making a late request for the segment. This timing error may always be present (absorbed and hidden by larger buffer levels), or it may be introduced at the sub-second target mark through some hard-coded logic in the player.

At this very low latency, the player can rely upon the fact that receipt of the end of segment N is a signal to immediately request segment N+1. This is important if there is any variability in the segment durations. If the player instead waits until the time at which it estimates the next segment is available, it will either request late and burn its buffer, or request too early and get a 404. The player need only estimate the availability of the first segment it requests at start-up, after a seek, or after a switch. After that, it can simply fetch N+1 as soon as N is completely received.
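The two scheduling strategies contrasted above can be illustrated with a toy model. This is only a sketch, not dash.js code: it assumes each segment is delivered in real time over chunked transfer, so that the end of segment N coincides with segment N+1 becoming available. All function names and the sample durations are hypothetical.

```python
from itertools import accumulate


def segment_start_times(durations_ms):
    """Time (ms) at which each segment starts to become available."""
    ends = list(accumulate(durations_ms))
    return [0] + ends[:-1]


def estimated_request_errors(durations_ms, nominal_ms):
    """Error (ms) when segment i is requested at i * nominal_ms.

    Positive: the request is late, so the buffer drains.
    Negative: the request is early, so it would get a 404.
    """
    starts = segment_start_times(durations_ms)
    return [i * nominal_ms - start for i, start in enumerate(starts)]


def chained_request_errors(durations_ms):
    """Error (ms) when segment N+1 is requested as soon as N is fully received.

    Under the real-time delivery assumption, the end of N is the start of
    N+1, so the error is zero regardless of duration variability.
    """
    starts = segment_start_times(durations_ms)
    ends = list(accumulate(durations_ms))
    request_times = [starts[0]] + ends[:-1]
    return [req - start for req, start in zip(request_times, starts)]


# Hypothetical durations: nominally 6 s, with one long and one short segment.
durations = [6000, 6100, 5900, 6000]
print(estimated_request_errors(durations, 6000))  # [0, 0, -100, 0]
print(chained_request_errors(durations))          # [0, 0, 0, 0]
```

In this model the estimate-based schedule requests the third segment 100 ms before it exists, while the chained schedule never drifts, which is why only the first request needs an availability estimate.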

There is no reason that the dash.js player should not be able to play this stream with a 0.5s target as long as the CDN is smoothly delivering the data. I hope its low-latency logic can be improved, as this rebuffer pattern seems self-induced.

wilaw avatar May 20 '20 23:05 wilaw

Notes: I ran some first tests. For these tests I:

  • Removed audio adaptation set via proxy for easier debugging
  • Set latency to 0.5 seconds.

Settings

  • Chunks are a single frame with a segment duration of 6 seconds:
      Chunk = 6 (segment duration) - 5.967 (ATO) = 0.033 s = 33 ms
      Duration of one frame = 1/(30000/1001) s = 33.36666667 ms
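The arithmetic in the settings above can be reproduced with a short sketch. The helper names are hypothetical; the inputs (6 s segment duration, availabilityTimeOffset of 5.967, frame rate of 30000/1001) are the values quoted in this comment.

```python
def chunk_duration_ms(segment_duration_s, availability_time_offset_s):
    """Chunk duration implied by the segment duration and the ATO, in ms."""
    return (segment_duration_s - availability_time_offset_s) * 1000.0


def frame_duration_ms(fps_numerator, fps_denominator):
    """Duration of one video frame in ms for a rational frame rate."""
    return 1000.0 * fps_denominator / fps_numerator


video_chunk_ms = chunk_duration_ms(6.0, 5.967)
video_frame_ms = frame_duration_ms(30000, 1001)

print(f"chunk: {video_chunk_ms:.3f} ms")  # roughly 33 ms
print(f"frame: {video_frame_ms:.3f} ms")  # roughly 33.367 ms
```

The two values agree to within rounding of the ATO, which is consistent with one frame per chunk.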

Observations

The video element throws a waiting event although roughly 0.7 seconds of data is still available in the buffer:

Waiting: Video Buffer end : 21149.995532, current time: 21149.278364
Diff = 717ms -> Should not stall

Waiting: Video Buffer end : 21150.028899, current time: 21149.999428
Diff = 29.471ms -> Buffer less than one chunk, ok to stall
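The check applied to the two log lines above can be written down as a small sketch. The function names are hypothetical, and the rule "a stall is only expected when less than one chunk is buffered" is the working assumption of this comment, not documented browser behaviour.

```python
VIDEO_CHUNK_MS = 33.0  # one-frame chunk, from the settings above


def buffer_ahead_ms(buffer_end_s, current_time_s):
    """Amount of buffered media ahead of the playhead, in ms."""
    return (buffer_end_s - current_time_s) * 1000.0


def stall_is_expected(ahead_ms, chunk_ms=VIDEO_CHUNK_MS):
    """Assumed rule: a waiting event is plausible only below one chunk."""
    return ahead_ms < chunk_ms


# (buffer end, current time) pairs taken from the two waiting events.
for buffer_end, current_time in [
    (21149.995532, 21149.278364),
    (21150.028899, 21149.999428),
]:
    ahead = buffer_ahead_ms(buffer_end, current_time)
    print(f"{ahead:.3f} ms ahead, stall expected: {stall_is_expected(ahead)}")
```

The first event is the suspicious one: it fails the check by a wide margin, yet the video element still reported waiting.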

Next steps

Check why the player stalls although some data is still available in the buffer. Maybe some MSE setting enforces a minimum buffer?

dsilhavy avatar Jun 15 '20 13:06 dsilhavy

@wilaw Some updates on this. I tested the behavior with video only and with audio only:

Video only

The latency is pretty stable, with no oscillating pattern. Some waiting events can be observed in the console. However, those waiting events only occur when the buffer is below one frame:

Buffer: Number of ranges : 1 start: 4198.027166 , end : 4218.013799, current time: 4217.989424

[Screenshot: video only]

Audio only

Here it gets interesting. When playing audio only I could reproduce the pattern you describe:

[Screenshot: audio only]

What I don't understand at the moment is why the video element is throwing waiting events although there is close to half a second of data in the buffer:

Buffer: Number of ranges : 1 start: 1255.020433 , end : 1278.010066, current time: 1277.615413

Playback stalls until the buffer is around 3 seconds again. This goes on for the rest of the time and causes the oscillating pattern. Looking at the MPD, I don't see a mistake: one audio frame has a duration of 1024/48000 s = 21.3 ms, which matches the MPD, where availabilityTimeOffset is set to 5.979 (6 - 5.979 = 0.021 s).
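To put the audio numbers side by side, here is a rough sketch that expresses the buffer at the stall in units of audio frames. It assumes the audio segments are also 6 s long (as stated for video earlier in the thread); the constant names are hypothetical.

```python
SEGMENT_DURATION_S = 6.0   # assumed to match the video segment duration
AUDIO_ATO_S = 5.979        # availabilityTimeOffset quoted above
SAMPLES_PER_FRAME = 1024
SAMPLE_RATE_HZ = 48000

# Duration of one audio frame, and the chunk duration implied by the ATO.
frame_ms = 1000.0 * SAMPLES_PER_FRAME / SAMPLE_RATE_HZ
chunk_ms = (SEGMENT_DURATION_S - AUDIO_ATO_S) * 1000.0

# Buffer ahead of the playhead when the waiting event fired (log line above).
ahead_ms = (1278.010066 - 1277.615413) * 1000.0
frames_buffered = ahead_ms / frame_ms

print(f"frame: {frame_ms:.2f} ms, chunk from ATO: {chunk_ms:.2f} ms")
print(f"buffered at stall: {ahead_ms:.1f} ms, about {frames_buffered:.1f} frames")
```

Under these assumptions the stall happens with roughly 18 audio frames still buffered, far more than the single-frame margin seen in the video-only case.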

So the next step is to keep testing with audio only in order to understand why playback stalls when enough data is available.

dsilhavy avatar Jun 18 '20 09:06 dsilhavy

Thanks for investigating and documenting this. Is the 3s oscillation tied to the initial 3s target latency when the player first started? If you start with a 4s target, do the subsequent oscillations mirror that? If so, it may be a clue that some parameter set at start-up is not updated when the livedelay target is adjusted post-startup.

wilaw avatar Jun 18 '20 12:06 wilaw

We stumbled upon this behavior during analysis of #3538

The segments referenced by the MPD above ("link to playable MPD") are not available anymore. However, I managed to reproduce the problem with the live-sim content. Starting with a catch-up rate of 50% and a live latency of 1.5, playback is stable. Reducing to 1.3, the oscillation starts.

Starting with a live latency of 1.3 right from the beginning, playback is stable for a short period but starts to oscillate after a while.

So, it does not matter whether the target latency is changed during playback or set beforehand.

Affects v3.2.0, v3.2.1, development. Investigating.

mlasak avatar Mar 18 '21 17:03 mlasak