mediasoup icon indicating copy to clipboard operation
mediasoup copied to clipboard

Backport changes to congestion control from libwebrtc

Open sarumjanuch opened this issue 2 years ago • 34 comments

This MR will track progress of sync libwebrtc congestion control in mediasoup to catchup with all the changes for the last 2 years, and most importantly- loss based estimator v2

sarumjanuch avatar Oct 10 '22 09:10 sarumjanuch

Amazing. Just to clarify, this PR:

  • Updates libwebrtc code related to TCC and so on.
  • Adds new loss based estimator v2 but doesn't use it yet.

Is it correct?

ibc avatar Oct 10 '22 09:10 ibc

That is 100% correct. This is very WIP, but first commits that compiles. I will continue working on this.

sarumjanuch avatar Oct 10 '22 09:10 sarumjanuch

Wondering if we should mark the PR as draft to avoid wrong expectations for other readers. Not critical at all.

ibc avatar Oct 10 '22 09:10 ibc

Feel free to modify as needed, my laptop already died. Dunno when i ll return:(

On Mon, Oct 10, 2022, 12:51 Iñaki Baz Castillo @.***> wrote:

Wondering if we should mark the PR as draft to avoid wrong expectations for other readers. Not critical at all.

— Reply to this email directly, view it on GitHub https://github.com/versatica/mediasoup/pull/922#issuecomment-1273060364, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABD342YXRD2F3UVBGU5HN3DWCPRJXANCNFSM6AAAAAARBFD2XA . You are receiving this because you authored the thread.Message ID: @.***>

sarumjanuch avatar Oct 10 '22 10:10 sarumjanuch

Converted to draft

jmillan avatar Oct 10 '22 14:10 jmillan

@sarumjanuch I'll commit above cosmetic suggestions 😀

ibc avatar Dec 01 '22 11:12 ibc

@sarumjanuch libwebrtcFieldTrials PR is merged in v3. Please, when appropriate follow steps at https://github.com/versatica/mediasoup/pull/960#issuecomment-1332517890

ibc avatar Dec 01 '22 11:12 ibc

Apart from upgrading the libwebrtc version maybe you also want to include this bugfix in this PR: https://github.com/versatica/mediasoup/pull/962

ggarber avatar Dec 02 '22 09:12 ggarber

Hello @ggarber! The goal of back-porting was not just to simply upgrade libwebrtc version, but use Loss V2 estimator, multiple bugfixes was a bonus. When loss v2 estimator is enabled RR packets loss is ignored, loss v2 operates purely on twcc feedback. But, we also did tests with feedback_only flag month ago, and it was producing worse results. This was my graphs: image

I will try it with new loss estimator. Because right now we have beautiful graphs when loss limited, and still bad when delay limited.

sarumjanuch avatar Dec 02 '22 12:12 sarumjanuch

@ggarber You are more than welcome to checkout this branch and do testing and playing with params, if you want, you can contact me on telegram, viber, whatsapp: +380630593550. I would explain you the road we went so far, and some params of new loss estimator. The more eyes we will have here, the better will be the result.

sarumjanuch avatar Dec 02 '22 12:12 sarumjanuch

@sarumjanuch libwebrtcFieldTrials PR is merged in v3. Please, when appropriate follow steps at #960 (comment)

@ibc Done.

sarumjanuch avatar Dec 02 '22 12:12 sarumjanuch

Because right now we have beautiful graphs when loss limited, and still bad when delay limited

Hi @sarumjanuch, can you briefly describe the scenario you are using for testing? Thanks

vpalmisano avatar Dec 02 '22 22:12 vpalmisano

Hello @vpalmisano! So initially we started testing with artificial congestion created by traffic shapers, but they are mostly useless, because they do not reflect how the real network queues behaves. Google CC algorithm relies on two main characteristics: packet loss and delay gradient values linear regression slope. It is very hard to simulate those with traffic shapers as main assumption is that either delay or packet loss start increasing near the top of our link capacity, and therefore turned into a signal to hold or decrease BW estimate. Traffic shapers create sudden bursts on top of the link capacity, in form of loss, because mostly they are simple leaky bucket implementation. GCC will just produce sawtooth signal for such circumstances, because it would be limited only by burst loss. The only useful scenario in such cases is to observer how GCC behaves in random loss environment. That said, our main goal to test GCC is to create real congestion and see how it behaves under this circumstances. To easily reproduce this scenario Jose created special branch https://github.com/versatica/mediasoup-demo/tree/jmillan/consumerReplicas. Which allow you to run a single page with producer and then run consumer page, where you would be able to specify enough consumer replicas to create congestion in your downlink. I personally have a separate server on the wild internet where i run mentioned mediasoup demo branch, and I am connecting to it from a different networks all over the place. So for example in my home network BW is limited by loss, on top of link capacity small loss noise start emerging, and los v2 estimator catches it very beautifully. On the other network that I am testing BW is Delay limited, and it is where i am having issues right now that i am trying to fix. So in short in my case I run two browser tabs on a separate machines, where one is pure isolated producer, and second is pure isolated consumer, that are connected to external server, and are sending traffic over the real network, then i pick consumerReplicas value big enough for GCC being continuously congested, and start sampling values for my charts. So two tabs url params might look like this: Consumer: https://YOUR_SERVER/?roomId=test&produce=false&consumerReplicas=19 Producer: https://YOUR_SERVER/?roomId=test&consumer=false&consumerReplicas=19

sarumjanuch avatar Dec 03 '22 11:12 sarumjanuch

Thanks for the explanation. It I've correctly understood, you are running:

  • 1 single producer at HD resolution on machine A;
  • 19 consumers running on machine B, consuming the same producer sent from A;
  • both machines A and B run under your home network. Is that correct?

vpalmisano avatar Dec 03 '22 11:12 vpalmisano

Now few words about importance of Loss v2 estimator. Loss v2 is a giant leap forward in comparison what GCC was doing before. Prior it loss adjustments to BW estimate were calculated here: https://github.com/sarumjanuch/mediasoup/blob/bwe_backport/worker/deps/libwebrtc/libwebrtc/modules/bitrate_controller/send_side_bandwidth_estimation.cc#L522. Very simple and rudimentary formulas. This is how chart looked like prior Loss v2 when link capacity is loss limited: image This is because loss is not differentiated. Main assumption for Loss v2 is that loss can be separated in two categories: Instant Loss and Inherent Loss. Instant loss is basically bursts and Inherent loss is the one that is Inherent for a given link at certain period of time, both has limits, both by default 5% so over 5% of those limits means we should decrease, and when near top of it we should hold. Loss v2 uses maximum likelihood estimation approach inside. Also it has a meaning candidates and their factors to pick BW estimate from. It is integrated with Delay estimator, Trendline estimator, and Probe estimator (not yet in this branch). And it produces the minimum of three values. Loss v2 estimator is currently active experiment in chrome, you can check it by taking a look at your filed trials. And is still evolving. Unfortunately it is not described at any papers, nor has documentation, so we are kinda in a wild west here. With loss v2 estimator, loss limited downlink estimation looks like: image And here Is the mix of loss and delay estimate, you can clearly see straight lines where there is loss present, still delay based estimation is far from being perfect: image

sarumjanuch avatar Dec 03 '22 12:12 sarumjanuch

Thanks for the explanation. It I've correctly understood, you are running:

  • 1 single producer at HD resolution on machine A;
  • 19 consumers running on machine B, consuming the same producer sent from A;
  • both machines A and B runs under your home network. Is that correct?

Generally yes. But they are connected to the outside server (I am in Ukraine, server in Frankfurt). But it is very different range of networks all over the place, and they all behave slightly different. As you may know we are having issues with electricity in Ukraine, because of war with Russia. So am testing this all over the place where I land with my laptop (different flavours of Wifi, 4G, 3G, Starlink, fiber, copper).

sarumjanuch avatar Dec 03 '22 12:12 sarumjanuch

What are the minimum, maximum, average and total loss rates for both cases? It seems to me that the loss rates using the new version are higher. Did you made a comparison running a concurrent TCP download?

vpalmisano avatar Dec 03 '22 16:12 vpalmisano

You can see loss percent (blue line) as a second column on y axis on a first chart, at every screenshot. Avarage loss rates might be higher, in case of loss limited BW estimate as loss v2 estimator allow certian amoun of random loss (Inherent loss). You could tune that param by modifying fieldtrial string, InherentLossUpperBoundOffset - default 0.05, meaning that estiamtor allows up to 5% random loss. There is an API now to pass and override default fieldTrial string right now. No I havent tested this yet in compeeting scenario.

sarumjanuch avatar Dec 03 '22 16:12 sarumjanuch

@sarumjanuch awesome work, I think it will improve the overall quality a lot! We will try to deploy it to a relatively separate grid next week and test it with various scenarios. Can you give an example of how to manipulate the fieldTrial string so we can have multiple scenarios?

balazskreith avatar Dec 04 '22 08:12 balazskreith

@balazskreith please leave multiple comments as review instead of individual comments next time, it helps with email notifications a lot.

nazar-pc avatar Dec 04 '22 09:12 nazar-pc

Can you give an example of how to manipulate the fieldTrial string so we can have multiple scenarios?

createWorker has a new param now, which allow you to override: https://github.com/versatica/mediasoup/blob/b9a11610c7703f8ddf94058c8d1dbc07962f9487/node/src/Worker.ts#L75

The format is libwebrtc dictated ${FLAG}${DELIMITER}${FLAG_PARAMS}${DELIMITER}.

sarumjanuch avatar Dec 04 '22 13:12 sarumjanuch

@sarumjanuch hi, thanks for the summary of Loss-Based BWE findings. Current WebRTC's estimator will collapse to 100 kbps if there's a 10% loss for a long period of time, so it's great to see that with Loss-Based BWE estimator can maintain 40 mbps value and not drop. Looking forward to Google making this experiment a default behavior.

50ed9689-2091-40c7-b1b0-345cb9d599bd

vsviatetskyi avatar Dec 13 '22 11:12 vsviatetskyi

Hello @vsviatetskyi, I believe i have seen your questions on google webrtc groups. New loss estimator can be adjusted to tolerate this by changing this flags over filed trial setting when creating new worker worker. Same you can do by adjusting field trial string when running chrome. This one controls inherent loss: https://github.com/sarumjanuch/mediasoup/blob/bwe_backport/worker/deps/libwebrtc/libwebrtc/modules/bitrate_controller/loss_based_bwe_v2.cc#L310 This one control instant bursts (average loss over observation period window): https://github.com/sarumjanuch/mediasoup/blob/bwe_backport/worker/deps/libwebrtc/libwebrtc/modules/bitrate_controller/loss_based_bwe_v2.cc#L329 Both by default is 5% You can play with this params to see how the suits your use case, please share you experience.

sarumjanuch avatar Dec 13 '22 12:12 sarumjanuch

@ggarber After additional testing and this branch includes fix from #962.

sarumjanuch avatar Dec 19 '22 17:12 sarumjanuch

For the curious: Loss v2 Estimator successfully passed and will be enabled by default in some next chrome releases: https://webrtc.googlesource.com/src/+/e04726281c3110718f5df9d4e8780229a23234e6%5E%21/

sarumjanuch avatar Dec 23 '22 14:12 sarumjanuch

Is there a timeline for when this PR can be merged into the mainline code? For the past 3 weeks, not had much activity, Is anything that I can be of any help with?

umeshc avatar Jan 11 '23 09:01 umeshc

@umeshc That is github showing wrong info, i was committing code yesterday, and will commit today. We are very actively testing everything to pick right params, don't worry, it will be merged as soon as we will be confident in stability and satisfied with overall results.

sarumjanuch avatar Jan 11 '23 10:01 sarumjanuch

As of right now, this is blocked by #989

sarumjanuch avatar Jan 26 '23 12:01 sarumjanuch

May want to see this https://github.com/versatica/mediasoup/pull/1031

ibc avatar Mar 24 '23 14:03 ibc

Do you have any update/plan for this PR? Thanks!

vpalmisano avatar Apr 24 '23 09:04 vpalmisano