webrtc-stats
Clarify media-source frame rates (frames and framesPerSecond)
RTCVideoSourceStats contains frame counters described as "frames originating from this source".
But the spec is not clear about which point in the pipeline this is measured at. If you measure as close to camera capture as possible, you may see more frames than if you measure at the entry to WebRTC, in rare cases such as running out of frame buffers or other errors causing frames to be dropped.
We should clarify the spec to say that...
- These are the frames that have reached WebRTC. E.g. the delta between this input fps and the encoder fps would be frames that have been dropped by the WebRTC encoder, not frames that were dropped between the capture process and the renderer process etc.
This matches what libwebrtc is already doing.
- Clarify that these are the frames prior to adaptation. Adaptation is an encoder implementation detail, so frames dropped due to adaptation should be part of the delta between input fps and encoder fps.
libwebrtc has a known bug where we count the frames after adaptation, but this is wrong since it exposes an implementation detail that does not make sense on paper - adaptation is clearly not part of the track's "source"...
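As a sketch of how an app could use these counters under the clarified definition, the encoder-dropped delta can be derived from a single stats snapshot. The helper below operates on plain objects shaped like RTCVideoSourceStats and RTCOutboundRtpStreamStats entries (frames and framesEncoded are spec'd fields; the helper name and the sample values are illustrative, not from the spec):

```javascript
// Sketch: derive frames dropped inside the WebRTC encoder from one
// getStats() snapshot, assuming media-source.frames counts frames that
// reached WebRTC (pre-adaptation, as this issue proposes to clarify).
function encoderDroppedFrames(mediaSourceStats, outboundRtpStats) {
  // Frames that entered WebRTC but never came out of the encoder.
  return mediaSourceStats.frames - outboundRtpStats.framesEncoded;
}

// Example with plain objects shaped like the spec'd stats entries:
const mediaSource = { type: 'media-source', kind: 'video', frames: 300 };
const outboundRtp = { type: 'outbound-rtp', kind: 'video', framesEncoded: 280 };
console.log(encoderDroppedFrames(mediaSource, outboundRtp)); // 20
```

In a real page the two objects would come from iterating the report returned by `RTCPeerConnection.getStats()`.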
The idea, based on different spec discussions (e.g. this), is that we should have a counter for camera fps on the track, and then each frame-consuming use case should have its own frame counter; for example, media-source.frames being a measurement that happens later than track.framesCaptured().
Editorial label because I'm not proposing to change what was, as far as I understand, always the intent of the spec. E.g. we always counted this in WebRTC, and I always considered counting post-adaptation a bug.
This makes sense, agree it is editorial. And this is kind of expected: as we start to have more frames* metrics along the send and receive pipeline, we will notice that the definitions become more nuanced.
Here's another edge case: Chrome has a feature where, to save on performance and bandwidth, screenshare tracks stop capturing new frames when the content is static. But to ensure good QP values and that frames don't get lost, the encoder will repeat frames. The repeated frames go through the encoder the same way new frames do, but they were technically not re-captured.
My opinion is that any frame that gets input to the encoder should be counted as a media-source frame, even if that includes repeated frames.
Now that we have MediaStreamTrack Statistics, i.e. track.stats, we do have stats from capture to delivery to sinks.
So I think we should re-think this part:
> Clarify that these are the frames prior to adaptation. Adaptation is an encoder implementation detail, so frames dropped due to adaptation should be part of the delta between input fps and encoder fps.
If the frames are prior to adaptation, they're almost the same thing as track.stats.deliveredFrames. Alternatively, if we update the spec to match what the implementation is already doing, i.e. the counters are measured after adaptation, then we have a good measure of what WebRTC is trying to encode with. This is still comparable to what the encode fps actually achieves, since the encoder can drop frames internally, so both measures are still useful.
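To make the trade-off concrete, here is a sketch of what each interpretation lets an app compute, again on plain snapshot objects. deliveredFrames is a field of the MediaStreamTrack Statistics API mentioned above; the helper and the sample numbers are illustrative:

```javascript
// Sketch: frames the track delivered that adaptation then dropped.
// This delta is only meaningful if media-source.frames is counted
// AFTER adaptation (the alternative interpretation discussed here).
function adaptationDroppedFrames(trackStats, mediaSourceStats) {
  return trackStats.deliveredFrames - mediaSourceStats.frames;
}

const trackStats = { deliveredFrames: 300 }; // from track.stats
const postAdaptation = { frames: 240 };      // post-adaptation reading
console.log(adaptationDroppedFrames(trackStats, postAdaptation)); // 60

// Under the pre-adaptation reading the two counters are nearly redundant:
const preAdaptation = { frames: 300 };
console.log(adaptationDroppedFrames(trackStats, preAdaptation)); // 0
```

This is the sense in which the post-adaptation definition "gives the app more information": the pre-adaptation counter would largely duplicate track.stats.deliveredFrames.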
Either option is valid, but since we already have track.stats, it might actually make sense to say that media-source is AFTER adaptation. This gives the app more information and is compatible with what the user agents are already doing.
Proposal:
Make media-source reflect input to the sender, NOT the track's stats (track.stats). This means that:
- If adaptation is happening, the media-source fps can be lower than the track.stats fps.
- The media-source is not "per track", since it reflects the outbound-rtp's adaptation. E.g. two senders encoding the same track could still result in two media-source objects with the same trackIdentifier.
- This is what Chromium has already implemented.
So media-source and outbound-rtp would map 1:1, if not for the fact that the outbound-rtp lifetime is said to start when the first packet is sent, while media-source could be created as soon as pc.addTrack is called (before the first packet is sent).