Memory optimizations
This heavy PR introduces a bunch of optimizations as mentioned in https://mediasoup.discourse.group/t/need-help-testing-significant-memory-optimizations/3302
It would be best to review this commit by commit; otherwise all of the changes combined will be hard to follow.
TL;DR:
All of the changes together should result in roughly a 6x reduction in memory usage and a 3x reduction in calls to the system allocator in a room with 5 participants, each sending 720p@30 video to every other participant; the savings grow with the number of consumers.
Briefly about changes:
- many data structures from std replaced with more efficient ones from abseil-cpp that we already depend on
- some data structures are replaced with an alternative implementation (like one-byte header extensions that use a static array with 14 entries instead of std::map, resulting in huge savings on system allocator calls; see the sketch after this list)
- some data structures that call the system allocator frequently are now reusing pre-allocated memory from a pool
- RTP video packets are no longer cloned for every consumer and are stored for a shorter period of time (for RTX purposes) derived from the RTT (following similar logic in Janus)
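To make the one-byte header extension item concrete, here is a minimal sketch of the idea, with hypothetical names rather than mediasoup's actual classes: one-byte extension ids can only be 1..14, so a fixed array can stand in for a per-packet std::map and avoid allocator calls entirely.

```cpp
// Hypothetical sketch (not mediasoup's actual classes): one-byte RTP header
// extension ids are limited to 1..14 (RFC 5285), so a fixed-size array can
// replace a per-packet std::map and avoid system allocator calls entirely.
#include <array>
#include <cstdint>

struct OneByteExtension
{
  const uint8_t* value; // points into the packet buffer, not owned
  uint8_t len;
};

class OneByteExtensions
{
public:
  void Set(uint8_t id, const uint8_t* value, uint8_t len)
  {
    if (id >= 1 && id <= 14)
      this->entries[id - 1] = { value, len };
  }

  const OneByteExtension* Get(uint8_t id) const
  {
    if (id >= 1 && id <= 14 && this->entries[id - 1].value != nullptr)
      return &this->entries[id - 1];

    return nullptr;
  }

private:
  // 14 inline slots, zero-initialized; no heap allocation at any point.
  std::array<OneByteExtension, 14> entries{};
};
```

Lookup becomes a bounds check plus an index, and the whole structure lives inline in the packet object instead of in per-packet map nodes.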
I'm not great at C++, so I'm sure APIs can be made better, but that is above my head at the moment.
Also, this needs really careful review and preferably some extra testing, even though I have tested it extensively in a local environment.
Please be patient 😅 I'll review this week, and tonight I'll publish a new release with Meson.
Fixed memory leak accidentally introduced in https://github.com/nazar-pc/mediasoup/pull/5 and merged latest v3 back into this branch
Definitely will review this great work this week.
We should merge this. Is everyone happy with this PR?
I'm still trying to figure out with Paulo Lanzarin an interesting memory leak that only happens for worker that lives for a very long time (days), please wait until we figure it out.
Commenting this here because it may not be solely related to this PR: we are seeing a potential memory leak in v3 too.
Need to investigate further, but these are the valgrind logs.
I don't think those are related, as the comparison was done against the v3 branch (release), but it's still potentially possible.
@nazar-pc
I started mediasoup with valgrind --tool=massif <executable> and found a leak of RTP packets.
Massif data
You can visualize it with ms_print <file> or massif-visualizer <file>
Thanks @Hartigan, I'll take a look closer to the end of this week
@Hartigan is there a way for me to reproduce it myself?
@nazar-pc It's a custom service written in Rust. I created 4 echo participants ((1 router + 1 WebRTC transport + 2 producers + 2 consumers) * 4) and collected a valgrind dump.
@Hartigan are those routers in the same worker? Are they running over the network or locally? Do they do anything special, or do you just start them and wait for memory usage to grow?
@nazar-pc I run it locally in Docker. The number of routers doesn't matter; I knew there was a leak and started several routers to speed it up.
How long does it take for things to leak? I was running for tens of minutes and had constant memory usage. Also, which browsers did you use? As soon as I can reproduce it, the fix should be fairly quick.
- Chromium Version 98.0.4758.80 (Official Build) Arch Linux (64-bit)
- Single transport on send and receive in echo room
- We don't use mediasoup's client libs.
- Media: VP8 with simulcast (3 layers) + Opus
- I ran the test overnight => estimated time < 8h.
Today I'm going to try to reproduce it with this example: https://github.com/versatica/mediasoup/blob/v3/rust/examples/echo.rs
Probably related, maybe even the same: I've found a leak in the RtpStreamSend retransmission buffer. The problem is within storage and buffer. I'll be checking it tomorrow, hopefully.
Update: the leak is in v3.
@jmillan yes, it's same problem
Retransmission buffer logic is rewritten in this PR, but I'm still curious what it is 🙂
The problem is that another leak has also been detected in this PR, right? So we don't know yet whether it's the same one, even if the logic changed.
@nazar-pc I prepared a draft version of StorageItemBuffer based on std::array with reusable buckets. The bucket size is based on the sqrt-decomposition technique. My estimation: 256 packets ~ 1 sec of 720p VP8 video.
https://gist.github.com/Hartigan/c29f20ab261fb7983103fa9d28dfafd6
@Hartigan why, though? This PR already optimizes it and doesn't require fixed buffers at all (and saves a lot of RAM because of that).
@nazar-pc
- the std::deque implementation is based on fixed-size buffer allocations :D
- I want to provide a solution with reusable (pooled) buffers and simple navigation (without this->startSeq calculations); a rough sketch of the idea is below
- In my case Insert, Remove and Get have guaranteed O(1) complexity. In the current implementation Insert and Remove do many pop_back/pop_front and push_back/push_front calls, which can cause allocations of fixed-size buffers in std::deque.
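For readers following along, here is a rough illustration of the bucket idea (names and sizes are hypothetical, and this is not the code in the linked gist): the retransmission window is split into fixed-size buckets that are recycled through a free list, so slot access stays an O(1) array lookup and allocator calls only happen when the working set grows.

```cpp
// Illustrative sketch only; removal logic (returning fully empty buckets to
// the free list) is omitted for brevity.
#include <array>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

struct StoredPacket; // whatever the retransmission buffer keeps per packet

class BucketedBuffer
{
  static constexpr size_t BucketSize{ 256 }; // packets per bucket (~1s of 720p VP8)
  static constexpr size_t NumBuckets{ 12 };  // total window of 3072 slots
  using Bucket = std::array<StoredPacket*, BucketSize>;

public:
  // Direct-mapped slot for a sequence number; a real implementation would also
  // track which window the slot belongs to and clear stale entries.
  StoredPacket*& Slot(uint16_t seq)
  {
    const size_t index  = seq % (BucketSize * NumBuckets);
    const size_t bucket = index / BucketSize;

    if (!this->buckets[bucket])
      this->buckets[bucket] = AcquireBucket();

    return (*this->buckets[bucket])[index % BucketSize];
  }

private:
  std::unique_ptr<Bucket> AcquireBucket()
  {
    if (!this->freeBuckets.empty())
    {
      auto bucket = std::move(this->freeBuckets.back());
      this->freeBuckets.pop_back();
      bucket->fill(nullptr);

      return bucket;
    }

    // Value-initialized, so every slot starts as nullptr.
    return std::make_unique<Bucket>();
  }

  std::array<std::unique_ptr<Bucket>, NumBuckets> buckets{};
  std::vector<std::unique_ptr<Bucket>> freeBuckets; // recycled, never shrunk
};
```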
Hihi,
Crossposting here just in case we are dealing with the same base issue. The leak I found is quite crazy: https://github.com/versatica/mediasoup/issues/769#issuecomment-1041474158
@nazar-pc I created pull request https://github.com/nazar-pc/mediasoup/pull/6 with changes for the retransmission buffer. I collected a new massif dump: 4 echo rooms with 720p VP8 simulcast + Opus over 8 hours. The leak of cloned RTP packets is gone and memory usage is <8 MB (total usage of the application with the mediasoup library).
std::deque was used specifically to avoid reallocations in most cases: push/pop is then basically a numeric increment/decrement, unless I'm misunderstanding it. The deque will grow as needed, but only if needed, saving memory. Of course allocating fixed buffers will save allocations, but it needs to be measured whether it actually makes a difference.
@nazar-pc,
About the ObjectPool: we are using it mainly as thread static, which means that its lifetime is the same as the thread's lifetime. ObjectPool will deallocate memory only on destruction, so a pool will hold, for the whole lifetime of the thread, as much allocated memory as was required at the moment of maximum demand.
This is part of the pattern; I just bring it up because honestly I have only just realised it.
@jmillan yes, those pools are for the whole worker, making it more efficient in terms of memory usage compared to having them per consumer, for instance. I think it wouldn't be a bad idea to have a call so I can explain what kind of changes I made and why, so we are all on the same page, since this is a really big PR.
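For reference, here is a minimal sketch of the thread-static pool pattern being discussed, with hypothetical names (this is not the PR's actual ObjectPool API):

```cpp
// Assumed illustration of a thread-static object pool: the pool lives as long
// as the worker thread and never shrinks, so it retains the high-water mark of
// allocated objects until the thread exits.
#include <memory>
#include <vector>

template<typename T>
class PoolSketch
{
public:
  std::unique_ptr<T> Acquire()
  {
    if (this->free.empty())
      return std::make_unique<T>(); // grow on demand

    auto obj = std::move(this->free.back());
    this->free.pop_back();

    return obj;
  }

  void Release(std::unique_ptr<T> obj)
  {
    // Kept for reuse, not deallocated.
    this->free.push_back(std::move(obj));
  }

private:
  // Only freed when the pool itself is destroyed (thread exit).
  std::vector<std::unique_ptr<T>> free;
};

template<typename T>
PoolSketch<T>& GetThreadLocalPool()
{
  thread_local PoolSketch<T> pool;

  return pool;
}
```

The trade-off jmillan describes follows directly from the last member: the free list only shrinks when the thread-local pool itself is destroyed.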
@nazar-pc @jmillan I think I figured out the problems:
- https://github.com/nazar-pc/mediasoup/blob/memory-optimizations/worker/src/RTC/RtpStreamSend.cpp#L459 In a long test this causes a wrong comparison of timestamps => the buffer is not cleaned when it needs to be.
- https://github.com/nazar-pc/mediasoup/blob/memory-optimizations/worker/src/RTC/RtpStreamSend.cpp#L41 https://github.com/nazar-pc/mediasoup/blob/memory-optimizations/worker/src/RTC/RtpStreamSend.cpp#L59 https://github.com/nazar-pc/mediasoup/blob/memory-optimizations/worker/src/RTC/RtpStreamSend.cpp#L70 https://github.com/nazar-pc/mediasoup/blob/memory-optimizations/worker/src/RTC/RtpStreamSend.cpp#L74 https://github.com/nazar-pc/mediasoup/blob/memory-optimizations/worker/src/RTC/RtpStreamSend.cpp#L103 MaxSeq is used, but it should be SeqRange. That problem breaks the index in long tests.

To reproduce the problems you need to run long tests in which sequence and timestamp wrap past their upper limit (see the comparison sketch below).
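For context, a small illustrative sketch of the wrap-safe comparisons this kind of bug calls for (assumed helper names, not the actual mediasoup code):

```cpp
// RTP sequence numbers (16 bit) and timestamps (32 bit) wrap around, so a
// plain `a > b` breaks in long-running tests. Unsigned subtraction compared
// against half the range (serial arithmetic, as in RFC 1982) stays correct
// across the wrap.
#include <cstdint>

inline bool IsSeqNewer(uint16_t a, uint16_t b)
{
  const uint16_t diff = static_cast<uint16_t>(a - b);

  return diff != 0 && diff < 0x8000u;
}

inline bool IsTimestampNewer(uint32_t a, uint32_t b)
{
  const uint32_t diff = a - b;

  return diff != 0 && diff < 0x80000000u;
}

// Example: seq 5 is newer than seq 65530 even though 5 < 65530 numerically,
// because the counter wrapped: (5 - 65530) mod 65536 == 11, which is < 0x8000.
```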
Makes sense @Hartigan 👍 👏
Thanks for the hints, @Hartigan, there was incorrect handling of timestamps when they wrap around. The latest commit should resolve this. I was able to reproduce the issue with the videoroom example app, and after the fixes memory doesn't seem to leak anymore, as far as I can tell.
As a side note, I was also surprised that timestamps on consumers start at 1 instead of a random number, which makes testing slower than it would have been otherwise.
cc @prlanzarin