Memory optimizations
This heavy PR introduces a bunch of optimizations as mentioned in https://mediasoup.discourse.group/t/need-help-testing-significant-memory-optimizations/3302
It would be best to review this commit by commit; otherwise all of the changes combined will be hard to follow.
TL;DR:
All of the changes together should result in roughly a 6x reduction in memory usage and a 3x reduction in calls to the system allocator in a room with 5 participants, each sending 720p@30 video to every other participant; the savings grow with the number of consumers.
Briefly about changes:
- many data structures from std replaced with more efficient ones from abseil-cpp that we already depend on
- some data structures are replaced with an alternative implementation (like one-byte header extensions that use a static array with 14 entries instead of std::map, resulting in huge savings on system allocator calls; see the sketch after this list)
- some data structures that call the system allocator frequently are now reusing pre-allocated memory from a pool
- RTP video packets are no longer cloned for every consumer and are stored for a shorter period of time (for RTX purposes) derived from the RTT (following similar logic in Janus)
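To make the one-byte header extension item concrete, here is a minimal sketch of the idea, with hypothetical names rather than mediasoup's actual classes: one-byte extension ids can only be 1..14, so a fixed array can stand in for a per-packet std::map and avoid allocator calls entirely.

```cpp
// Hypothetical sketch (not mediasoup's actual classes): one-byte RTP header
// extension ids are limited to 1..14 (RFC 5285), so a fixed-size array can
// replace a per-packet std::map and avoid system allocator calls entirely.
#include <array>
#include <cstdint>

struct OneByteExtension
{
  const uint8_t* value; // points into the packet buffer, not owned
  uint8_t len;
};

class OneByteExtensions
{
public:
  void Set(uint8_t id, const uint8_t* value, uint8_t len)
  {
    if (id >= 1 && id <= 14)
      this->entries[id - 1] = { value, len };
  }

  const OneByteExtension* Get(uint8_t id) const
  {
    if (id >= 1 && id <= 14 && this->entries[id - 1].value != nullptr)
      return &this->entries[id - 1];

    return nullptr;
  }

private:
  // 14 inline slots, zero-initialized; no heap allocation at any point.
  std::array<OneByteExtension, 14> entries{};
};
```

Lookup becomes a bounds check plus an index, and the whole structure lives inline in the packet object instead of in per-packet map nodes.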
I'm not great at C++, so I'm sure APIs can be made better, but that is above my head at the moment.
Also, this needs really careful review and preferably some extra testing, even though I have tested it extensively in a local environment.
Please be patient 😅 I'll review this week, and tonight I'll publish a new release with Meson.
Fixed memory leak accidentally introduced in https://github.com/nazar-pc/mediasoup/pull/5 and merged latest v3 back into this branch
Definitely will review this great work this week.
We should merge this. Is everyone happy with this PR?
I'm still trying to figure out with Paulo Lanzarin an interesting memory leak that only happens for worker that lives for a very long time (days), please wait until we figure it out.
Commenting this here because it may not be solely related to this PR: we are seeing a potential memory leak in v3 too.
Need to investigate further, but these are the valgrind logs.
I don't think those are related, as the comparison was done against the v3 branch (release), but it's still potentially possible.
@nazar-pc
I started mediasoup with valgrind --tool=massif <executable> and found a leak of RTP packets.
Massif data
You can visualize it with ms_print <file> or massif-visualizer <file>
Thanks @Hartigan, I'll take a look closer to the end of this week
@Hartigan is there a way for me to reproduce it myself?
@nazar-pc It's a custom service written in Rust. I created 4 echo participants ((1 router + 1 WebRTC transport + 2 producers + 2 consumers) * 4) and collected a valgrind dump.
@Hartigan are those routers in the same worker? Are they running over the network or locally? Do they do anything special, or do you just start them and wait for memory usage to grow?
@nazar-pc I run it locally in Docker. The number of routers doesn't matter; I knew there was a leak and started several routers to speed it up.
How long does it take for things to leak? I was running for tens of minutes and had constant memory usage. Also, which browsers did you use? As soon as I can reproduce it, the fix should be fairly quick.
- Chromium Version 98.0.4758.80 (Official Build) Arch Linux (64-bit)
- Single transport on send and receive in echo room
- We don't use mediasoup's client libs.
- Media: VP8 with simulcast (3 layers) + Opus
- I ran the test overnight => estimated time < 8h.
Today I'm going to try to reproduce it with this example: https://github.com/versatica/mediasoup/blob/v3/rust/examples/echo.rs
Probably related, maybe even the same: I've found a leak in the RtpStreamSend retransmission buffer. The problem is within storage and buffer. I'll be checking it tomorrow, hopefully.
Update: the leak is in v3.
@jmillan yes, it's same problem
Retransmission buffer logic is rewritten in this PR, but I'm still curious what it is 🙂
The problem is that another leak has also been detected in this PR, right? So we don't know yet whether it's the same one, even if the logic changed.
@nazar-pc I prepared a draft version of StorageItemBuffer based on std::array with reusable buckets. The bucket size is based on the sqrt-decomposition technique. My estimation: 256 packets ~ 1 sec of 720p VP8 video.
https://gist.github.com/Hartigan/c29f20ab261fb7983103fa9d28dfafd6
@Hartigan why, though? This PR already optimizes it and doesn't require fixed buffers at all (and saves a lot of RAM because of that).
@nazar-pc
- the std::deque implementation is based on fixed-size buffer allocations :D
- I want to provide a solution with reusable (pooled) buffers and simple navigation (without this->startSeq calculations); a rough sketch of the idea is below
- In my case Insert, Remove and Get have guaranteed O(1) complexity. In the current implementation Insert and Remove do many pop_back/pop_front and push_back/push_front calls, which can cause allocations of fixed-size buffers in std::deque.
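For readers following along, here is a rough illustration of the bucket idea (names and sizes are hypothetical, and this is not the code in the linked gist): the retransmission window is split into fixed-size buckets that are recycled through a free list, so slot access stays an O(1) array lookup and allocator calls only happen when the working set grows.

```cpp
// Illustrative sketch only; removal logic (returning fully empty buckets to
// the free list) is omitted for brevity.
#include <array>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

struct StoredPacket; // whatever the retransmission buffer keeps per packet

class BucketedBuffer
{
  static constexpr size_t BucketSize{ 256 }; // packets per bucket (~1s of 720p VP8)
  static constexpr size_t NumBuckets{ 12 };  // total window of 3072 slots
  using Bucket = std::array<StoredPacket*, BucketSize>;

public:
  // Direct-mapped slot for a sequence number; a real implementation would also
  // track which window the slot belongs to and clear stale entries.
  StoredPacket*& Slot(uint16_t seq)
  {
    const size_t index  = seq % (BucketSize * NumBuckets);
    const size_t bucket = index / BucketSize;

    if (!this->buckets[bucket])
      this->buckets[bucket] = AcquireBucket();

    return (*this->buckets[bucket])[index % BucketSize];
  }

private:
  std::unique_ptr<Bucket> AcquireBucket()
  {
    if (!this->freeBuckets.empty())
    {
      auto bucket = std::move(this->freeBuckets.back());
      this->freeBuckets.pop_back();
      bucket->fill(nullptr);

      return bucket;
    }

    // Value-initialized, so every slot starts as nullptr.
    return std::make_unique<Bucket>();
  }

  std::array<std::unique_ptr<Bucket>, NumBuckets> buckets{};
  std::vector<std::unique_ptr<Bucket>> freeBuckets; // recycled, never shrunk
};
```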
Hihi,
Crossposting here just in case we are dealing with the same base issue. The leak I found is quite crazy: https://github.com/versatica/mediasoup/issues/769#issuecomment-1041474158
@nazar-pc I created pull request https://github.com/nazar-pc/mediasoup/pull/6 with changes for the retransmission buffer. I collected a new massif dump: 4 echo rooms with 720p VP8 simulcast + Opus over 8 hours. The leak of cloned RTP packets is gone and memory usage is <8 MB (total usage of the application with the mediasoup library).
std::deque was used specifically to avoid reallocations in most cases: push/pop is then basically a numeric increment/decrement, unless I'm misunderstanding it. The deque will grow as needed, but only if needed, saving memory. Of course allocating fixed buffers will save allocations, but it needs to be measured whether it actually makes a difference.
@nazar-pc,
About the ObjectPool: we are using it mainly as thread static, which means that its lifetime is the same as the thread's lifetime. ObjectPool will deallocate memory only on destruction, so a pool will hold, for the whole lifetime of the thread, as much allocated memory as was required at the moment of maximum demand.
This is part of the pattern; I just bring it up because honestly I have only just realised it.
@jmillan yes, those pools are for the whole worker, making it more efficient in terms of memory usage compared to having them per consumer, for instance. I think it wouldn't be a bad idea to have a call so I can explain what kind of changes I made and why, so we are all on the same page, since this is a really big PR.
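For reference, here is a minimal sketch of the thread-static pool pattern being discussed, with hypothetical names (this is not the PR's actual ObjectPool API):

```cpp
// Assumed illustration of a thread-static object pool: the pool lives as long
// as the worker thread and never shrinks, so it retains the high-water mark of
// allocated objects until the thread exits.
#include <memory>
#include <vector>

template<typename T>
class PoolSketch
{
public:
  std::unique_ptr<T> Acquire()
  {
    if (this->free.empty())
      return std::make_unique<T>(); // grow on demand

    auto obj = std::move(this->free.back());
    this->free.pop_back();

    return obj;
  }

  void Release(std::unique_ptr<T> obj)
  {
    // Kept for reuse, not deallocated.
    this->free.push_back(std::move(obj));
  }

private:
  // Only freed when the pool itself is destroyed (thread exit).
  std::vector<std::unique_ptr<T>> free;
};

template<typename T>
PoolSketch<T>& GetThreadLocalPool()
{
  thread_local PoolSketch<T> pool;

  return pool;
}
```

The trade-off jmillan describes follows directly from the last member: the free list only shrinks when the thread-local pool itself is destroyed.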
@nazar-pc @jmillan I think I figured out the problems:
- https://github.com/nazar-pc/mediasoup/blob/memory-optimizations/worker/src/RTC/RtpStreamSend.cpp#L459 In a long test this causes a wrong comparison of timestamps => the buffer is not cleaned when it needs to be.
- https://github.com/nazar-pc/mediasoup/blob/memory-optimizations/worker/src/RTC/RtpStreamSend.cpp#L41 https://github.com/nazar-pc/mediasoup/blob/memory-optimizations/worker/src/RTC/RtpStreamSend.cpp#L59 https://github.com/nazar-pc/mediasoup/blob/memory-optimizations/worker/src/RTC/RtpStreamSend.cpp#L70 https://github.com/nazar-pc/mediasoup/blob/memory-optimizations/worker/src/RTC/RtpStreamSend.cpp#L74 https://github.com/nazar-pc/mediasoup/blob/memory-optimizations/worker/src/RTC/RtpStreamSend.cpp#L103 MaxSeq is used, but it should be SeqRange. That problem breaks the index in long tests.

To reproduce the problems you need to run long tests in which sequence and timestamp wrap past their upper limit (see the comparison sketch below).
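For context, a small illustrative sketch of the wrap-safe comparisons this kind of bug calls for (assumed helper names, not the actual mediasoup code):

```cpp
// RTP sequence numbers (16 bit) and timestamps (32 bit) wrap around, so a
// plain `a > b` breaks in long-running tests. Unsigned subtraction compared
// against half the range (serial arithmetic, as in RFC 1982) stays correct
// across the wrap.
#include <cstdint>

inline bool IsSeqNewer(uint16_t a, uint16_t b)
{
  const uint16_t diff = static_cast<uint16_t>(a - b);

  return diff != 0 && diff < 0x8000u;
}

inline bool IsTimestampNewer(uint32_t a, uint32_t b)
{
  const uint32_t diff = a - b;

  return diff != 0 && diff < 0x80000000u;
}

// Example: seq 5 is newer than seq 65530 even though 5 < 65530 numerically,
// because the counter wrapped: (5 - 65530) mod 65536 == 11, which is < 0x8000.
```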
Makes sense @Hartigan 👍 👏
Thanks for the hints, @Hartigan, there was incorrect handling of timestamps when they wrap around. The latest commit should resolve this. I was able to reproduce the issue with the videoroom example app, and after the fixes memory doesn't seem to leak anymore, as far as I can tell.
As a side note, I was also surprised that timestamps on consumers start at 1 instead of a random number, which makes testing slower than it would have been otherwise.
cc @prlanzarin