
Clarify media-source frame rates (frames and framesPerSecond)

Open henbos opened this issue 2 years ago • 6 comments

RTCVideoSourceStats contains frame counters as "frames originating from this source".

But the spec is not clear about the point in the pipeline at which this is measured. If you measure as close to camera capture as possible, you may see more frames than if you measure at the entry to WebRTC, in rare cases such as running out of frame buffers or other errors causing frames to be dropped.

We should clarify the spec to say that...

  1. These are the frames that have reached WebRTC. E.g. the delta between this input fps and the encoder fps would be frames that have been dropped by the WebRTC encoder, not frames that were dropped between the capture process and the renderer process etc.

This matches what libwebrtc is already doing.

  2. Clarify that these are the frames prior to adaptation. Adaptation is an encoder implementation detail, so frames dropped due to adaptation should be part of the delta between input fps and encoder fps.

libwebrtc has a known bug where the frames are counted after adaptation, but this is wrong since it exposes an implementation detail that does not make sense on paper: adaptation is clearly not part of the track's "source"...
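As a sketch (not from the thread), here is how an app could observe the delta described above. It assumes only the standard `media-source` and `outbound-rtp` stats types with their `framesPerSecond` members; the helper name is hypothetical:

```javascript
// Hypothetical helper: given a getStats() report, compute the delta between
// the media-source input fps and the outbound-rtp encode fps. A large delta
// suggests frames dropped by the encoder (including, per this issue, frames
// dropped by adaptation).
function encoderFpsDelta(report) {
  let sourceFps = 0;
  let encodeFps = 0;
  report.forEach((stats) => {
    if (stats.type === 'media-source' && stats.kind === 'video') {
      sourceFps = stats.framesPerSecond ?? 0;
    } else if (stats.type === 'outbound-rtp' && stats.kind === 'video') {
      encodeFps = stats.framesPerSecond ?? 0;
    }
  });
  return sourceFps - encodeFps;
}

// With a real connection: encoderFpsDelta(await pc.getStats());
```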

henbos avatar Nov 15 '22 10:11 henbos

The idea based on different spec discussions (e.g. this) is that we should have a counter for camera fps on the track, and then each frame-consuming use case should have its own frame counter; for example, media-source.frames would be a measurement that happens later than track.framesCaptured().

henbos avatar Nov 15 '22 10:11 henbos

Editorial label because I'm not proposing to change what was, as far as I understand, always the intent of the spec. E.g. we always counted this in WebRTC, and I always considered counting post-adaptation a bug.

henbos avatar Nov 15 '22 10:11 henbos

This makes sense, agree it is editorial. And this is kinda expected: as we start to have more frames* metrics along the send and receive pipelines, we will notice that the definitions become more nuanced.

vr000m avatar Nov 16 '22 12:11 vr000m

Here's another edge case: Chrome has a feature where, to save on performance and bandwidth, screenshare tracks stop capturing new frames when the content is static. But to ensure good QP values and to avoid frames getting lost, the encoder will repeat frames. The repeated frames go through the encoder the same way new frames do, but they were technically not re-captured.

My opinion is that any frame that gets input to the encoder counts as a media-source frame, even if that includes repeated frames.

henbos avatar Dec 13 '22 16:12 henbos

Now that we have MediaStreamTrack Statistics, i.e. track.stats, we do have stats from capture to delivery to sinks.

So I think we should re-think this part:

Clarify that these are the frames prior to adaptation. Adaptation is an encoder implementation detail, so frames dropped due to adaptation should be part of the delta between input fps and encoder fps.

If the frames are prior to adaptation, they're almost the same thing as track.stats.deliveredFrames. Alternatively, if we update the spec to match what the implementation is already doing, i.e. they are frame counters after adaptation, then we have a good measure of what WebRTC is trying to encode with. This is still comparable to what the encode fps actually achieves, since the encoder can drop frames internally, so both measures are still useful.
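As a sketch of the "after adaptation" option (assuming the proposed MediaStreamTrack Statistics shape with a `deliveredFrames` counter, and that media-source counts post-adaptation frames; the helper name is hypothetical):

```javascript
// Hypothetical helper: if media-source.frames counts frames AFTER adaptation,
// then frames dropped by adaptation show up as the difference between the
// track's delivered frames and the media-source frame counter.
function adaptationDroppedFrames(trackStats, mediaSourceStats) {
  // trackStats.deliveredFrames: frames the track delivered to its sinks
  // (pre-adaptation, per the proposed track.stats API).
  // mediaSourceStats.frames: frames input to the encoder (post-adaptation).
  return trackStats.deliveredFrames - mediaSourceStats.frames;
}
```

This is exactly the extra information the app gains if media-source is defined as post-adaptation: with a pre-adaptation definition, the two counters would be nearly identical and the subtraction would tell the app nothing.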

Either option is valid, but since we already have track.stats, it might actually make sense to say that media-source is AFTER adaptation. This gives the app more information and is compatible with what the user agents are already doing.

henbos avatar Oct 23 '23 07:10 henbos

Proposal:

Make media-source reflect input to the sender, NOT the track's stats (track.stats). This means that:

  1. If adaptation is happening, the media source fps can be lower than the track.stats fps.
  2. The media-source is not "per track", since it reflects the outbound-rtp's adaptation. E.g. two senders encoding the same track could still result in two media-source objects with the same trackIdentifier.
  3. This is what Chromium has already implemented.

So media-source and outbound-rtp would map 1:1 if not for the fact that outbound-rtp's lifetime is said to start when the first packet is sent, whereas media-source could be created as soon as pc.addTrack is called (before the first packet is sent).
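Point 2 of the proposal can be illustrated with a sketch (assuming only the standard `media-source` stats type and its `trackIdentifier` member; the helper name is hypothetical): if media-source reflects per-sender input, two senders encoding the same track yield two media-source entries sharing one trackIdentifier.

```javascript
// Hypothetical helper: group media-source stats by trackIdentifier. Under the
// proposal, one track sent by two senders can produce two media-source
// objects in the same group, since each reflects its own sender's adaptation.
function mediaSourcesByTrack(report) {
  const byTrack = new Map();
  report.forEach((stats) => {
    if (stats.type === 'media-source') {
      const list = byTrack.get(stats.trackIdentifier) ?? [];
      list.push(stats);
      byTrack.set(stats.trackIdentifier, list);
    }
  });
  return byTrack;
}
```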

henbos avatar Oct 23 '23 08:10 henbos