
Enabling opus stereo audio without SDP munging (stereo=1)

Open perahgren opened this issue 4 years ago • 42 comments

I wonder whether it is possible to achieve an SDP setup that supports stereo audio sessions (via opus) properly without using SDP munging?

Examples of scenarios to achieve include:

Symmetric stereo setup:

  • The receiver has not flagged that it wants to receive mono
  • The sender wants to send stereo
  • The sender achieves a connection sending stereo to the receiver.
  • The sender achieves a connection receiving stereo from the receiver.

Asymmetric stereo/mono setup:

  • The receiver has not flagged that it wants to receive mono
  • The sender wants to send stereo
  • The sender achieves a connection sending stereo to the receiver.
  • The sender achieves a connection receiving mono from the receiver.
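
For concreteness, the munging needed today looks roughly like the sketch below. Payload type 111 is assumed purely for illustration; real code would look it up from the a=rtpmap line for opus/48000/2.

// Rough sketch of the current munging workaround: append stereo=1 to the
// Opus fmtp line of the generated offer before setLocalDescription.
async function offerWithStereoMunging(pc, stereoTrack) {
  pc.addTrack(stereoTrack); // a two-channel MediaStreamTrack
  const offer = await pc.createOffer();
  // Payload type 111 is assumed here for illustration only.
  const mungedSdp = offer.sdp.replace(/(a=fmtp:111 [^\r\n]*)/, '$1;stereo=1');
  await pc.setLocalDescription({ type: 'offer', sdp: mungedSdp });
  return mungedSdp; // sent to the remote side over the signaling channel
}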

perahgren avatar Jan 14 '21 14:01 perahgren

Not really sure we need to specify anything new.

The opus stereo flag is used for signaling that the endpoint supports receiving stereo:

   stereo:  specifies whether the decoder prefers receiving stereo or
      mono signals.  Possible values are 1 and 0, where 1 specifies that
      stereo signals are preferred, and 0 specifies that only mono
      signals are preferred.  Independent of the stereo parameter, every
      receiver MUST be able to receive and decode stereo signals, but
      sending stereo signals to a receiver that signaled a preference
      for mono signals may result in higher than necessary network
      utilization and encoding complexity.  If no value is specified,
      the default is 0 (mono).

So the SDP munging we currently need is a hack, and all endpoints should signal stereo=1 in the SDP.

The question is whether we need an API to make the sender encode a stereo signal. This should be done automatically by the endpoint by checking the number of channels on the audio track to decide whether stereo or mono should be used.

In case the app wanted to use a mono source and upscale it to stereo, or a stereo source and downscale it to mono, I think the best way would be to do it via WebAudio before passing it to WebRTC.
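
Roughly something like this, as a sketch only; it assumes the channelCount/channelCountMode attributes on MediaStreamAudioDestinationNode drive the downmix.

// Force a capture down to one channel via WebAudio before handing the
// resulting track to the PeerConnection.
async function monoTrackFromMicrophone() {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const ctx = new AudioContext();
  const source = ctx.createMediaStreamSource(stream);
  const dest = ctx.createMediaStreamDestination();
  dest.channelCount = 1;             // downmix whatever the mic delivers
  dest.channelCountMode = 'explicit';
  source.connect(dest);
  return dest.stream.getAudioTracks()[0]; // pass this to pc.addTrack()
}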

murillo128 avatar Jan 14 '21 15:01 murillo128

I wonder if this should be solved with...

  • opus listed as two codecs, one with stereo and one with mono, negotiated as separate payload types.
  • Or same list of codecs, but a modifier to the codec preferences to opt-in for more channels

henbos avatar Jan 14 '21 15:01 henbos

Proposal:

  • RTCRtpTransceiver.setCodecPreferences(codecs, options)
dictionary RTCCodecOptions {
  unsigned short channels;
};
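
Hypothetical usage, just to illustrate the shape of the proposal; neither the options argument nor RTCCodecOptions exists in the shipped API.

// Hypothetical: setCodecPreferences does not take a second argument today.
const pc = new RTCPeerConnection();
const transceiver = pc.addTransceiver('audio');
const caps = RTCRtpReceiver.getCapabilities('audio');
transceiver.setCodecPreferences(caps.codecs, { channels: 2 });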

henbos avatar Jan 14 '21 15:01 henbos

channels is a term with meaning when it comes to codecs. But I do not think we need an API, just ship stereo=1

fippo avatar Jan 14 '21 17:01 fippo

Also, it can't be at transceiver level, as you may want to send mono and receive stereo.

I agree with @fippo that shipping stereo=1 is enough, and the decision on whether to send stereo or mono should be taken based on the track's channels.

murillo128 avatar Jan 14 '21 18:01 murillo128

Sounds reasonable. In that case it can change on the fly depending on the track. Would it be OK to negotiate stereo if only mono is sent? Where is stereo=1 defined?

henbos avatar Jan 14 '21 18:01 henbos

stereo=1 on the offer just means that the endpoint is able to receive stereo.

murillo128 avatar Jan 14 '21 18:01 murillo128

https://tools.ietf.org/html/rfc7587#section-7.1 now ofc this isn't just about being able to play stereo. Otherwise @perahgren + folks wouldn't have taken such a long time.

opus says specifically

   o  The media subtype ("opus") goes in SDP "a=rtpmap" as the encoding
      name.  The RTP clock rate in "a=rtpmap" MUST be 48000, and the
      number of channels MUST be 2.

mostly because nobody wants to bother with negotiating the number of channels in SDP. Any opus decoder must be able to decode stereo, see https://tools.ietf.org/html/rfc7587#section-3.4. But I think the intention was that stereo=1 is the default.

:shipit:

fippo avatar Jan 14 '21 21:01 fippo

Also, it can't be at transceiver level, as you may want to send mono and receive stereo.

In general, the handling of declarative attributes (tagging @alvestrand) seems odd.

fippo avatar Jan 14 '21 22:01 fippo

I'm just jumping into this conversation, so please forgive me if this isn't the right thread... but please do consider other channel counts and channel layouts. Currently, sending something like 6 channels of audio doesn't seem to be possible with WebRTC due to browser implementations (which may be fixed in Chromium as of yesterday?). Additionally, for my use case I need to specify an arbitrary channel layout/map, so that the encoder doesn't assume I'm sending 5.1 surround. What I need is 6 discrete channels.

As changes to the implementations and specifications are made, please consider use cases such as this. Thank you!

bradisbell avatar Jan 14 '21 22:01 bradisbell

Standard Opus RTP doesn't support multichannel (channels > 2).

Chrome implements it with a non-standard multiopus codec which, when enabled via SDP munging, is automatically chosen based on the number of channels of the input track (6 for 5.1 or 8 for 7.1).

murillo128 avatar Jan 14 '21 22:01 murillo128

Note: part of this problem applies to cbr as well as things like dtx. One approach here would be to make the sdpFmtpLine writable. OFC that doesn't solve the problem of SDP munging at all :trollface: but at least it is a much more focused kind of munging.
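
As a sketch of the idea only: whether setCodecPreferences would accept a modified sdpFmtpLine is exactly what would need to change; today it may well be rejected.

// Illustration only: copy the receive capabilities, append stereo=1 to the
// Opus fmtp string, and feed the result back. Implementations may reject
// codecs whose sdpFmtpLine no longer matches the capabilities.
const pc = new RTCPeerConnection();
const transceiver = pc.addTransceiver('audio');
const codecs = RTCRtpReceiver.getCapabilities('audio').codecs.map(c =>
  c.mimeType === 'audio/opus'
    ? { ...c, sdpFmtpLine: `${c.sdpFmtpLine};stereo=1` }
    : c);
transceiver.setCodecPreferences(codecs);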

fippo avatar Jan 15 '21 06:01 fippo

Note that cbr, useinbandfec and usedtx in the SDP only express that the decoder on the receiver prefers cbr, FEC or DTX, but in no way control what the encoder on either side must use:

   cbr:  specifies if the decoder prefers the use of a constant bitrate
      versus a variable bitrate.  Possible values are 1 and 0, where 1
      specifies constant bitrate, and 0 specifies variable bitrate.  If
      no value is specified, the default is 0 (vbr).  When cbr is 1, the
      maximum average bitrate can still change, e.g., to adapt to
      changing network conditions.

   useinbandfec:  specifies that the decoder has the capability to take
      advantage of the Opus in-band FEC.  Possible values are 1 and 0.
      Providing 0 when FEC cannot be used on the receiving side is
      RECOMMENDED.  If no value is specified, useinbandfec is assumed to
      be 0.  This parameter is only a preference, and the receiver MUST
      be able to process packets that include FEC information, even if
      it means the FEC part is discarded.

   usedtx:  specifies if the decoder prefers the use of DTX.  Possible
      values are 1 and 0.  If no value is specified, the default is 0.

So the SDP munging that is allowed by current implementations is just a hack due to the lack of APIs and is in no way backed by any standard.

murillo128 avatar Jan 15 '21 06:01 murillo128

This is my understanding:

  • All opus endpoints must be prepared to receive stereo. Even if we announced "stereo=0" we are gently asking to get mono, but the other endpoint is free to send us stereo anyway. (Similar to how order of codecs is just a preference.)
  • "stereo=1" only talks about receiving preferences, not sending preferences.
  • However, if "stereo" is missing we default to "stereo=0", so in practice everybody asks for mono at the moment.

To me it sounds like no spec change is needed in order to achieve receiving stereo. @perahgren can you verify that WebRTC is able to receive stereo even if "stereo=1" is missing?

However, it doesn't make sense to be sending stereo at the present moment since everybody is implicitly asking for mono by having "stereo=1" missing. To me it would make sense if "stereo=1" were the default in offers and the decision of what to send were based on the MediaStreamTrack's number of channels.

Questions:

  • Is the answerer allowed to modify the stereo line, or would that be an illegal modification of the SDP by the answering side?
  • If we change the default, would we start sending stereo in a lot of cases where we don't today? How easy and performant is it to change the number of channels of a MediaStreamTrack? In other words, is there still a reason to be able to control this?

henbos avatar Jan 15 '21 16:01 henbos

FWIW I don't think it makes sense for the receiver to be the one who controls whether the sender sends stereo, other than in a restrictive sense. But assuming the receiver is OK with receiving stereo, it seems like it should be the sender's choice whether or not to do so. Otherwise we end up in a scenario where if the sender doesn't want to send stereo it has to ask the receiver to ask the sender to do mono... I don't want to encourage additional offer-answer dances.

henbos avatar Jan 15 '21 16:01 henbos

For offer/answer, the relevant text is in section 7.1 of RFC 7587:

o The "stereo" parameter is a unidirectional receive-only parameter. When sending to a single destination, a sender MUST NOT use stereo when "stereo" is 0. Gateways or senders that are sending the same encoded audio to multiple destinations SHOULD NOT use stereo when "stereo" is 0, as this would lead to inefficient use of network resources. The "stereo" parameter does not affect interoperability.

As for what a missing "stereo=1" attribute means: Section 6.1:

stereo: specifies whether the decoder prefers receiving stereo or mono signals. Possible values are 1 and 0, where 1 specifies that stereo signals are preferred, and 0 specifies that only mono signals are preferred. Independent of the stereo parameter, every receiver MUST be able to receive and decode stereo signals, but sending stereo signals to a receiver that signaled a preference for mono signals may result in higher than necessary network utilization and encoding complexity. If no value is specified, the default is 0 (mono).

So receiving stereo is mandatory to support, but sending stereo when stereo=1 is missing is "allowed, but you really should have a strong reason to do so".

I think that once stereo is fully supported in the pipeline, we should put "stereo=1" into the default FMTP line for Opus.

alvestrand avatar Jan 18 '21 07:01 alvestrand

From what I know, stereo is fully supported in the pipeline. However, I'm not convinced that setting stereo=1 by default really solves this. As I see it, there are two problems

  1. To be able to produce stereo, the SDP needs to be munged (by adding stereo=1) before setLocalDescription.
  2. If the receiver does not want to receive stereo (sets stereo=0 or omits a value for stereo), the sender should heed that by re-doing the createOffer.

If I've understood this correctly, setting stereo=1 by default won't really solve any of the above, apart from not requiring any action when stereo should be sent. If mono should be sent, however, the problems still persist.

Also, if we set stereo=1 by default, won't then suddenly all stereo-capable clients that are not explicitly setting stereo=0 start sending stereo audio instead of mono?

perahgren avatar Jan 18 '21 22:01 perahgren

While 1) would be solved if stereo=1 becomes the default (stereo without munging), the problem of wanting mono is seemingly the same as the original problem, only with the default reversed, so now one might munge the SDP to get mono instead.

I think the reason why stereo=1 is more attractive than stereo=0 is that even if you negotiate stereo=1 this only sends stereo if the MediaStreamTrack is a stereo source. So with stereo=1 you can choose whether you want stereo or mono, whereas with stereo=0 you have no choice, it's mono-only (assuming we want to respect the receiver's stereo=0).

So what I wonder is: how common are stereo sources, and is it easy to get mono from a stereo source? Can you simply do getUserMedia() with channelCount:1, or do you have to use WebAudio, and if so how cumbersome and performance-impactful is it to start using WebAudio? I do think the default should reflect what makes sense. But it is not just the default in SDP that comes into play here, but also the default of MediaStreamTracks?
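
For reference, the constraint-based route would look roughly like this; channelCount is only an ideal constraint, so the browser is free to deliver something else.

// Ask for a mono capture directly, then check what was actually delivered.
// getSettings().channelCount may be undefined in some browsers.
async function logCaptureChannelCount() {
  const stream = await navigator.mediaDevices.getUserMedia({
    audio: { channelCount: 1 }
  });
  const [track] = stream.getAudioTracks();
  console.log(track.getSettings().channelCount);
}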

henbos avatar Jan 19 '21 08:01 henbos

Also, if we set stereo=1 by default, won't then suddenly all stereo-capable clients that are not explicitly setting stereo=0 start sending stereo audio instead of mono?

Yes. Do we have any idea how common this is? (I don't...)

henbos avatar Jan 19 '21 08:01 henbos

only having the default reversed, so now one might SDP munge to get mono instead

That is up to the signalling layer which can legitimately strip out stereo=1 and/or replace it with stereo=0

fippo avatar Jan 19 '21 12:01 fippo

If the receiver does not want to receive stereo (sets stereo=0 or omits a value for stereo), the sender should heed that by re-doing the createOffer.

No? The receiver can declare stereo=0 independently. That is the "declarative" syntax stuff which is very confusing

fippo avatar Jan 19 '21 12:01 fippo

@henbos What was the resolution here? @fippo did you have some concerns?

jan-ivar avatar Feb 22 '21 19:02 jan-ivar

@fippo noted that Firefox already sets stereo=1, which I've confirmed.

jan-ivar avatar Feb 22 '21 19:02 jan-ivar

I do not think (as said quite a few comments up) there is any action for the WG here when it comes to Chrome shipping stereo=1. That Firefox already does so without the world coming to an end should be a good indicator.

fippo avatar Feb 22 '21 19:02 fippo

@fippo noted that Firefox already sets stereo=1, which I've confirmed.

Haha, well there we go.

And here's a modified version of @jan-ivar's fiddle that logs the channelCount: https://jsfiddle.net/wase5g6b/1/ When I run it on my MacBook Pro I get channelCount:1 in both Chrome and Firefox. I checked Safari too, but channelCount is undefined there. In any case, the SDP is the same as Chromium's.

@guidou Do you know what Chromium's default for channelCount is? I would assume that a normal microphone is 1 but I don't know what happens if you have some multichannel sound recording equipment.

I think if we write a PR to make stereo=1 the default we just need to make sure that Chromium does up and downsampling correctly, i.e. we don't want stereo=1 to cause us to encode stereo if our MediaStreamTrack has a mono source. @perahgren What would happen today?

henbos avatar Feb 23 '21 11:02 henbos

@henbos What was the resolution here?

I think we can just merge the PR (to be written) unless @perahgren or @guidou knows about any backwards compatibility issues that this would cause for cases where we do have multiple channels.

henbos avatar Feb 23 '21 11:02 henbos

Assuming this is ready for a PR in the meantime; please correct me if I'm wrong.

henbos avatar Feb 23 '21 11:02 henbos

I think this may cause problems for Chrome when doing setRD for Opus with stereo=1 for two reasons:

  1. the Opus encoder will be initialized to transmit stereo regardless of the # of channels, which probably results in worse efficiency and sound quality for the selected bitrate
  2. if not set, the default bitrate will be increased from 32kbps to 64kbps to try to account for the need for stereo encoding. This will result in additional bandwidth consumption even for mono sources.

See https://chromium.googlesource.com/external/webrtc/stable/talk/+/d3ecbb30dc2684653d61e8ec88a5382aecf62773/media/webrtc/webrtcvoiceengine.cc#1892 for the relevant code.

Note also that stereo=1 has no effect on setLD.

juberti avatar Feb 24 '21 02:02 juberti

When I run it on my MacBook Pro I get channelCount:1 in both Chrome and Firefox.

@henbos MBP mics are mono (see Audio Midi Setup). FWIW my Logitech BRIO reports channelCount: 2 in Firefox above.

To test Chrome, I used an in-content device picker to pick my BRIO: I get channelCount: 1 with {audio: true} in M88, but channelCount: 2 in M90. Did the default change recently?

jan-ivar avatar Feb 24 '21 03:02 jan-ivar

I think this may cause problems for Chrome when doing setRD for Opus with stereo=1

@juberti Wouldn't that show up in a Chrome←→Firefox p2p call today then?

jan-ivar avatar Feb 24 '21 03:02 jan-ivar