webrtc-stats icon indicating copy to clipboard operation
webrtc-stats copied to clipboard

WPT tests are wrong about when "outbound-rtp" and "inbound-rtp" stats appear

Open jan-ivar opened this issue 3 years ago • 24 comments

The spec says: "The lifetime of all RTP monitored objects starts when the RTP stream is first used: When the first RTP packet is sent or received on the SSRC it represents, or when the first RTCP packet is sent or received that refers to the SSRC of the RTP stream."

TL;DR: no packets no stats.

Yet these WPT tests assume stats appear immediately after negotiation (ahead of "connected" even):

jan-ivar avatar Jan 26 '22 00:01 jan-ivar

I've filed vendor bugs crbug 1291019 and webkit 235619.

Firefox has it right for outbound-rtp but not inbound-rtp, which is bugzilla 1751532.

jan-ivar avatar Jan 26 '22 00:01 jan-ivar

I do not think the spec definition is very useful. It would have prevented detection of the infamous "macos audio bug" which showed up as "no bytes sent". Easy fix: ... is first negotiated: somethingsomething sdp (but it might not be as easy considering the wild dream of not putting ssrcs into the sdp)

I do agree that adding stats before negotiation is probably not useful though as in the (rare) case of a m-line getting rejected you would still retain a stats entry for it. At implementation level this can be determined by whether a payload type was negotiated or not.

(and yes, this is testable e.g. by your fiddles :-))

fippo avatar Jan 27 '22 13:01 fippo

I'm glad it helped track down a local mic bug, but that's hardly the job of peer connection stats.

jan-ivar avatar Jan 28 '22 17:01 jan-ivar

You are proposing to intentionally curtail an implementation behavior that has demonstrably proven to be more useful than harmful for the sake of not updating a text that is not cast in stone?

fippo avatar Jan 28 '22 20:01 fippo

I'm not proposing anything. I've merely filed this issue to fix the WPT tests to match the spec. We ran into this discrepancy while implementing more stats in Firefox.

jan-ivar avatar Jan 29 '22 00:01 jan-ivar

which is a great catch and I agree the current behavior of some implementations is potentially harmful in some rare cases. But the spec behavior is more harmful so the action here is to fix the spec (and then fix some implementations a bit and then write moar tests).

Lets assume I tell you that I am going to send you video on ssrc 1234. You can then either

  • loop over all stats and then determine you have not seen ssrc 1234 yet
  • find a stats entry with that ssrc and then you know you have received 0 bytes

I assume the second variant is more common and will argue it is more useful because you can use the same logic to get the number of bytes received over the lifetime of the connection.

fippo avatar Jan 29 '22 08:01 fippo

Both ways seem fine to me and equally deterministic in the case you're talking about which sounds like an established connection, but right now we have vendors exposing stats ahead of negotiation which I hope we can agree is unnecessary.

In an established connection, we're just talking about whether absence equals zero bytes, which seems easy enough to infer for apps that don't care about the difference:

  const stats = await pc.getStats();
  const inbound = findMine([...stats.values()].filter(({type}) => type == "inbound-rtp"));
  console.log(`Received ${inbound?.bytesReceived || 0} bytes.`);

But this seems wrong to bake into the API, especially when bytesReceived is only one of 67 "inbound-rtp" members, none of which have prose today for what value they should hold ahead of having seen a relevant rtp (or rtcp) packet:

timestamp type id ssrc kind transportId codecId packetsReceived packetsLost jitter packetsDiscarded packetsRepaired burstPacketsLost burstPacketsDiscarded burstLossCount burstDiscardCount burstLossRate burstDiscardRate gapLossRate gapDiscardRate framesDropped partialFramesLost fullFramesLost receiverId remoteId framesDecoded keyFramesDecoded frameWidth frameHeight frameBitDepth framesPerSecond qpSum totalDecodeTime totalInterFrameDelay totalSquaredInterFrameDelay voiceActivityFlag lastPacketReceivedTimestamp averageRtcpInterval headerBytesReceived fecPacketsReceived fecPacketsDiscarded bytesReceived packetsFailedDecryption packetsDuplicated perDscpPacketsReceived nackCount firCount pliCount sliCount totalProcessingDelay estimatedPlayoutTimestamp jitterBufferDelay jitterBufferEmittedCount totalSamplesReceived totalSamplesDecoded samplesDecodedWithSilk samplesDecodedWithCelt concealedSamples silentConcealedSamples concealmentEvents insertedSamplesForDeceleration removedSamplesForAcceleration audioLevel totalAudioEnergy totalSamplesDuration framesReceived decoderImplementation

Likewise, bytesSent is only one of 49 "outbound-rtp" members, none of which have prose today for what value they should hold ahead of having seen a relevant rtp (or rtcp) packet:

timestamp type id ssrc kind transportId codecId packetsSent bytesSent rtxSsrc mediaSourceId senderId remoteId rid lastPacketSentTimestamp headerBytesSent packetsDiscardedOnSend bytesDiscardedOnSend fecPacketsSent retransmittedPacketsSent retransmittedBytesSent targetBitrate totalEncodedBytesTarget frameWidth frameHeight frameBitDepth framesPerSecond framesSent hugeFramesSent framesEncoded keyFramesEncoded framesDiscardedOnSend qpSum totalSamplesSent samplesEncodedWithSilk samplesEncodedWithCelt voiceActivityFlag totalEncodeTime totalPacketSendDelay averageRtcpInterval qualityLimitationReason qualityLimitationDurations qualityLimitationResolutionChanges perDscpPacketsSent nackCount firCount pliCount sliCount encoderImplementation

So you'd be asking for a significant spec change here, if I even understand correctly what change you're proposing.

Also, SSRCs can have collisions, so using them to correlate data isn't sound to begin with. This is why mid and rid were added to the spec, as the preferred ways to correlate what someone will or won't send (sadly not implemented anywhere yet 😕).

jan-ivar avatar Feb 02 '22 17:02 jan-ivar

vendors exposing stats ahead of negotiation which I hope we can agree is unnecessary.

Correct, that should go away. I think such a change won't cause too much breakage.

type
id
ssrc
kind

are determined at creation (i.e. after SLD) - which is why chrome sets them right now.

transportId

should be set at least after (bundle) negotiation I think.

codecId

will also be set after negotiation.

0 is an appropriate value for the numeric values, see what Chrome currently does. ...

If you think libwebrtc should be changed I think this is going to break quite some code. Feel free to drive this change, this will be an interesting experience for you.

Also, SSRCs can have collisions

SSRC collisions are a myth. Chrome only logs but takes no action on libsrtp detecting one, Firefox crashes?

fippo avatar Feb 02 '22 19:02 fippo

Correct, that should go away. I think such a change won't cause too much breakage.

So then, in your mind, when would stats appear? After negotiation? After connection (and negotiation)?

SSRC collisions are a myth.

If you restart ICE, are you guaranteed the same SSRCs?

jan-ivar avatar Feb 02 '22 22:02 jan-ivar

So then, in your mind, when would stats appear? After negotiation? After connection (and negotiation)?

After negotiation so a test that drops all ice candidates can check they exist. A test that rejects a m-line would show these stats don't exist.

If you restart ICE, are you guaranteed the same SSRCs?

ICE and RTP are different layers of the stack so I do not expect SSRCs to change.

fippo avatar Feb 03 '22 07:02 fippo

when would stats appear?

Sorry I should have limited this question to: when would rtp stats appear? (ice stats have different lifetimes)

@alvestrand you wrote the spec language here in https://github.com/w3c/webrtc-stats/pull/349. What's your view of Chrome's discrepancy from the spec here?

jan-ivar avatar Feb 03 '22 16:02 jan-ivar

#302 looks like the intent was "after negotiation". But then JSEP tells us there shall be no ssrc lines.

I think we need the following test assertions:

  1. createOffer->SLD does not create objects
  2. createOffer->SLD-SRD-with-ssrcs does create objects even when no connection is established
  3. createOffer->SLD->SRD-without-ssrc does not create objects since the ssrc is not known
  4. createOffer->SLD->SRD-without-ssrc + connection does create objects since the ssrc is known

The assumption seems to be that the ssrc is constant and always set (looking at the object ids in chrome :scream_cat:) which leaves ssrc collisions woefully under-specified but I doubt that is a big problem.

fippo avatar Feb 04 '22 08:02 fippo

The reason for delaying creation was that the mandatory and identifying attribute for the stream is the SSRC, and we can't be certain what the SSRC will be until we see the first packet (unless we have nonstandard ssrc signalling).

alvestrand avatar Feb 07 '22 16:02 alvestrand

but if we have "standard signaling" shouldn't the stats have a mid attribute (the transceiver has)?

fippo avatar Feb 08 '22 07:02 fippo

The stats spec has mid, but that doesn't seem like a good reason to expose the wrong SSRC.

jan-ivar avatar Feb 08 '22 18:02 jan-ivar

I might be digressing to the :unicorn: of a ssrc collision a bit ("if there is a collision what should be the impact") but this has impact on the question at hand... if we had mid would this still mean we hide in the

  • createOffer->SLD->SRD-without-ssrc does not create objects since the ssrc is not known

case even though there is something identifying (which may have the ssrc not set)?

fippo avatar Feb 08 '22 18:02 fippo

Having stats appear based on whether SSRCs are known seems worse than both the spec and Chrome today.

FYI, we removed SSRCs from getParameters in https://github.com/w3c/webrtc-pc/pull/1531. The reasons given there might resonate here.

jan-ivar avatar Feb 08 '22 20:02 jan-ivar

Fyi the official way to read (non-zero) RTP stats is to wait for RTP, like this:

  pc1.addTrack((await gUM({video:true})).getTracks()[0]);
  const {track} = await new Promise(r => pc2.ontrack = r);
  await new Promise(r => track.onunmute = r);
  const stats = await pc1.getStats();
  console.log(`SSRC = ${[...stats.values()].find(({type}) => type == "outbound-rtp").ssrc}`);

But note that due to crbug 1295295 this would still return stats early in Chrome. 🤷🏼‍♂️

jan-ivar avatar Feb 08 '22 20:02 jan-ivar

Having stats appear based on whether SSRCs are known seems worse than both the spec and Chrome today.

But this is exactly what Chrome does. Which is slightly harmful in the case of (1) above (edited and added numbers) so I have a one-line fix that I don't expect to break much and it seems to match the intent of #302. This would match Firefox for outbound and would have allowed you to not "fix" inbound?

if you think chrome/libwebrtc should change you should do this (oneliner) but I think this will break a lot of apps

fippo avatar Feb 09 '22 08:02 fippo

in a discussion with @henbos about cache invalidation I noticed that "has received a packet" (or the first) is not something we would like to cause a cache invalidation because that is going to be pretty heavy. Having this defined on events which already do invalidate still seems better to me.

fippo avatar Jun 17 '22 09:06 fippo

To clarify, I said that we shouldn't invalidate the cache every time a packet counter increments since that would make caching rarely applicable. But if receiving the first packet causes a new stats object to appear (something that only happens a few times, not continuously) then we should definitely invalidate the cache.

Regarding this spec issue, the core of the issue appears to be the assumption that we need the SSRC in order to create the RTP stats objects. And when this spec issue was filed I think that was true.

As of recently though we've added the mid and rid properties though, so it would be possible to create these objects at signalling time even if the SSRC isn't known (and you'd still be able to tell which stream is which), which may be more consistent with what Chromium implementations currently do.

There would still be the issue about "well what if the ssrc changes?". The spec seems to think that RTP objects are permanent, so a new SSRC should imply a new RTP object and the old one remains. But at least in the Chromium implementation, we don't do this... the RTP stats objects exposed are just what the streams are currently, and if the SSRC goes away then so does the stats object.

henbos avatar Jun 17 '22 09:06 henbos

We got rid of "sender", "receiver", "transceiver" and "track" stats in favor of just moving everything to "outbound-rtp" and "inbound-rtp" which we've already done in both spec and implementation so I think delaying the creation of the RTP objects is harmful.

On top of that, I don't like the idea of permanent objects since a) it's a memory leak, b) it adds implementation complexity, and c) it rarely seems useful; in practise people poll getStats periodically during a call, not a single time after the call has already ended.

So to me it seems easier to simplify the spec than to insist that the implementation delays creation of stats objects that you would legitimately want to be there.

henbos avatar Jun 17 '22 09:06 henbos

We got rid of "sender", "receiver", "transceiver" and "track" stats

8 days ago? I wasn't aware. Please see https://github.com/w3c/webrtc-stats/pull/628#issuecomment-1159230378.

I think delaying the creation of the RTP objects is harmful.

This issue isn't proposing a spec change, and there's evidence in #643 this harm may have been self inflicted by us moving lots of metrics to RTP objects with half the room not realizing their lifetimes were different.

As of recently though we've added the mid and rid properties though, so it would be possible to create these objects at signalling time even if the SSRC isn't known (and you'd still be able to tell which stream is which), which may be more consistent with what Chromium implementations currently do.

There may be some good ideas here, but I would ask that you open a new issue for this detailing what you propose exactly, so members can try to poke holes in it. The current spec is quite clear about RTP lifetimes, so this would be a major change.

This issue is about WPT tests.

jan-ivar avatar Jun 17 '22 21:06 jan-ivar

I filed https://crbug.com/1368987 regarding beefier WPTs.

henbos avatar Sep 28 '22 08:09 henbos