[BUG] Memory leak when using srt_transmit over a poor connection
Describe the bug
The srt_transmit application appears to leak memory when the network connection between the two SRT endpoints is experiencing loss, jitter and reordering. This report is specific to srt_transmit, but my assumption is that the issue is in the SRT library, not the application.
Originally we saw this in a production environment using our custom SRT client, which is basically a wrapper around the srt_transmit code with some additional features we required for configuration and logging. I have reproduced the problem here using the srt_transmit application built directly from GitHub. The netem settings used are not intended to be realistic; they simply reproduce the errors we saw during a network incident in our production environment.
To Reproduce
Steps to reproduce the behavior:
- Configure the network with tc to introduce loss, reordering and jitter on port 1235:
dev=lo
send_port=1235
delay=200    # base delay in ms
jitter=200   # +/- jitter in ms
loss=5       # packet loss in percent
# Default all traffic to band 1:1; the filter below reclassifies.
tc qdisc add dev ${dev} root handle 1: prio priomap 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# Apply netem (delay/jitter, reordering, loss) to band 1:2.
tc qdisc add dev ${dev} parent 1:2 handle 20: netem delay ${delay}ms ${jitter}ms reorder 25% 50% loss ${loss}%
# Route packets with source port ${send_port} into the netem band.
tc filter add dev ${dev} parent 1:0 protocol all u32 match ip sport ${send_port} 0xffff flowid 1:2
- Run a send/receive srt_transmit pair over port 1235. In our case the source on port 1234 is a custom application streaming a TS file, and the sink on port 1236 is another custom application which consumes the file. I confirmed that the scenario works end-to-end as expected when the tc configuration is removed (see the note after this list).
srt_transmit -chunk:1328 udp://:1234 'srt://:1235?mode=listener&latency=200'
srt_transmit -v -chunk:1328 'srt://localhost:1235?mode=caller&rcvlatency=200' 'udp://:1236'
- Error: The srt_transmit process's RSS memory grows linearly over time, and the log is full of errors and disconnect/connect notifications (see the attached srt_transmit.log).
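For completeness, the impairment can be removed again (for the clean end-to-end check mentioned in step 2) by deleting the root qdisc:
# Remove the netem/prio setup added above
tc qdisc del dev ${dev} root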
The production environment showed memory growth of 1.35 GB/hr, whereas the repro here shows ~0.12 GB/hr. However, the source bitrate is not the same in the two cases.
Expected behavior
While the application may not be able to produce a reasonable output given the level of loss, jitter and reordering, it should not leak memory.
Screenshots
See the attached graph of the RSS metric from ps, generated from the output of
ps -p ${pid} -o etimes:1=,vsz=,drs=,rss=,trs=
sampled at one-second intervals.
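For reference, a loop along these lines reproduces that sampling (a sketch only; the pidof lookup and the log file name are illustrative, not from the original setup):
pid=$(pidof srt_transmit)              # process under test
while kill -0 ${pid} 2>/dev/null; do   # loop until the process exits
    ps -p ${pid} -o etimes:1=,vsz=,drs=,rss=,trs= >> srt_transmit_rss.log
    sleep 1
done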
Desktop (please provide the following information):
- OS: Linux Mint/Ubuntu 4.15.0-39-generic
- SRT Version / commit ID: Reproduced on: v1.4.2, v1.4.3 and 87746453edcd37f2ae1987cfd052ca8f447c8738
Additional context
I can provide further detail, logs, etc. Please just ask.
Hi @amiller-isp. Does only the caller-receiver have the memory leak, or the sender-listener as well?
Hi @maxsharabayko. I reproduced the setup on MX4E (aarch64-linux-gnu) using my mxphub test app and do observe a very slow leak on the listener/sender side, which points more at the library than the app (as @amiller-isp suggested). If you find a way I can help with this, let me know.
There are sanitizers that could detect memory leaks, and I think you have already tried some of them during the work on the threading problems.
@ethouris Everything I've done during that period has been forgotten.
@maxsharabayko - It looks like the sender-listener memory usage grows but stops after a short while and then remains constant, which is sort of the behavior I would expect. I'm working on verifying my data on this and will post it later.
@ethouris - I considered running srt_transmit under Valgrind, but I'm assuming that Valgrind will make things too slow to be useful.
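(For the record, should the slowdown turn out to be tolerable, a standard Valgrind leak check would look something like the following; the log file name is illustrative and the command mirrors the caller-receiver side of the repro above:)
valgrind --leak-check=full --log-file=srt_valgrind.log \
    ./srt_transmit -v -chunk:1328 'srt://localhost:1235?mode=caller&rcvlatency=200' 'udp://:1236'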
Memory graph for listener-sender. This seems to behave somewhat as I would expect.
I also added the rates of memory growth we are seeing in the production environment to the main description:
The production environment showed memory growth of 1.35 GB/hr, whereas the repro here shows ~0.12 GB/hr. However, the source bitrate is not the same in the two cases.
A small update. There seems to be an increase in memory consumption after each reconnection of the "caller-receiver": 3.3 MB -> 3.7 MB -> 3.9 MB. During streaming itself, memory consumption looks stable.
@amiller-isp According to your logs, you also have connection losses and reconnections. So that might be the reason and something to check for leaks.
I've been seeing a memory leak of the same order of magnitude as the one @amiller-isp reported (1.25 MB/s in my case). I'm using the GStreamer SRT sink plug-in rather than srt_transmit. In my scenario, the send worker thread is unable to keep up because of the volume of receivers subscribed to the same flow, which causes the packet loss to quickly increase until packets start getting dropped. The VTune memory-consumption trace I collected gives the following stack trace for the leaking memory:
libstdc++.so.6 ! operator new
libsrt.so.1.4 ! CSndBuffer::increase + 0x90
libsrt.so.1.4 ! CSndBuffer::addBuffer + 0x3f
libsrt.so.1.4 ! srt::CUDT::sendmsg2 + 0x83b
libsrt.so.1.4 ! srt::CUDT::sendmsg2 + 0x39
libsrt.so.1.4 ! srt_sendmsg2 + 0x4e
libgstsrt.so ! gst_srt_object_write_to_callers + 0x443 - gstsrtobject.c:1699
...
I'm guessing there's a path where the packets to be sent are getting forgotten about but not freed or recycled. If anyone has any suggestions on where and what I might add in order to fix the leak, I'm willing to take a stab at it. My attempts to figure it out on my own have been, well, not particularly fruitful.
Edit to add: This is with the 1.4.4 release of SRT.
The sender buffer is associated with the socket; once you close the socket, all this memory should be freed. If it's not, it is a leak. Note of course that the best way to make sure the socket is fully closed is to keep the application running for about 2 more seconds after the call to srt_close, and to make sure that nothing was sent over this socket for about 2 seconds before it was closed (the socket isn't physically closed as long as there's something pending to be sent in the sender buffer).
The process gets killed for using too much memory well before the socket gets closed. In the situation I describe (it's running in a container in the cloud) that only takes a few minutes, and even if I were to increase the allowed memory, I need the connection to stay up for weeks. Some sort of limit on the number of sender buffers allocated to a socket is needed.
If we are talking about any "memory swelling" happening exactly here in the sender buffer: yes, this buffer grows dynamically, but not without limit. The call to sndBuffersLeft() checks a limit that can be modified with the SRTO_SNDBUF socket option. If this limit is reached, CSndBuffer::addBuffer, and the CSndBuffer::increase that follows it, will not be called until at least one unit in the buffer gets released. The SRTO_SNDBUF option is expressed in bytes, so you can decide yourself how much memory you want to allow the application to use.
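For example, assuming the srt_transmit URI parser maps query-string keys to socket options (as srt-live-transmit does for options such as latency), the sender buffer could be capped in the repro command like this; the 4 MB figure is purely illustrative:
# Cap the sender buffer at 4 MB (4194304 bytes)
srt_transmit -chunk:1328 udp://:1234 'srt://:1235?mode=listener&latency=200&sndbuf=4194304'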
(Sorry I haven't looked at this in a while)
It seems like the default value of SRTO_SNDBUF is (8192 * SRT_PKT_SIZE), which at ~1500 bytes per packet is roughly 12 MB. We are not changing this value in our build, so it doesn't account for the growth we saw: over one hour, memory grew from 1.06 GB to 2.41 GB, i.e. 1.35 GB, or ~375 KB/s. It seemed to max out at ~4 GB.
However we've not seen this again in quite a while. I'm hoping to upgrade to 1.5 and then do some more production testing with it.
If you suspect a leak, the best way to check it would be by using the LeakSanitizer:
https://github.com/google/sanitizers/wiki/AddressSanitizerLeakSanitizer
For adding any flags to the compiler command, there's the SRT_EXTRA_CFLAGS variable used in the cmake build.
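A minimal sketch of such a build, assuming a GCC or Clang toolchain (LeakSanitizer ships as part of AddressSanitizer; the exact cmake invocation may need adjusting, and if linking fails, the sanitizer flag may need to reach the linker as well):
# Configure and build SRT with sanitizer instrumentation
cmake . -DSRT_EXTRA_CFLAGS="-fsanitize=address -fno-omit-frame-pointer -g"
make
# Run the repro; the leak report is printed when the process exits
ASAN_OPTIONS=detect_leaks=1 ./srt_transmit -chunk:1328 udp://:1234 'srt://:1235?mode=listener&latency=200'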