
[BUG] Memory leak when using srt_transmit over a poor connection

Open amiller-isp opened this issue 3 years ago • 13 comments

Describe the bug

The srt_transmit application appears to leak memory when the network connection between the two SRT endpoints experiences loss, jitter and reordering. This report is specific to srt_transmit, but my assumption is that the issue is in the SRT library, not the application. We originally saw this in a production environment using our custom SRT client, which is basically a wrapper around the srt_transmit code with some additional features we required for configuration and logging. I have reproduced the problem here using the srt_transmit application built directly from GitHub. The netem settings used are not intended to be realistic; they simply reproduce the errors we saw during a network incident in our production environment.

To Reproduce

Steps to reproduce the behavior:

  1. Configure network with tc to introduce loss, reordering and jitter on port 1235
dev=lo
send_port=1235
delay=200
jitter=200
loss=5
tc qdisc add dev ${dev} root handle 1: prio priomap 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
tc qdisc add dev ${dev} parent 1:2 handle 20: netem delay ${delay}ms ${jitter}ms reorder 25% 50% loss ${loss}%
tc filter add dev ${dev} parent 1:0 protocol all u32 match ip sport ${send_port} 0xffff flowid 1:2
  2. Run a send/receive srt_transmit pair over port 1235. In our case, the source on port 1234 is a custom application streaming a TS file, and the sink on port 1236 is another custom application that consumes the file. I confirmed that the scenario works end-to-end as expected when the tc configuration is removed.
srt_transmit -chunk:1328 udp://:1234 'srt://:1235?mode=listener&latency=200'
srt_transmit -v -chunk:1328 'srt://localhost:1235?mode=caller&rcvlatency=200' 'udp://:1236'
  3. Error: The srt_transmit process's RSS memory grows linearly over time, and the log is full of errors and disconnect/connect notifications (see the attached srt_transmit.log).

The production environment showed memory growth of 1.35 GB/hr, whereas the repro here shows ~0.12 GB/hr. However, the source is not the same bitrate.

Expected behavior

While the application may not be able to produce a reasonable output given the level of loss, jitter and reordering, it should not leak memory.

Screenshots

See the attached graph of the RSS metric, generated from the output of ps -p ${pid} -o etimes:1=,vsz=,drs=,rss=,trs running at one-second intervals.

[Image: mem_srt_transmit_head]
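A growth rate like the ones quoted in this issue can be estimated from such samples with a least-squares fit. The following is a minimal sketch, assuming each sample line has been reduced to two columns: elapsed seconds and RSS in KiB (the unit ps reports for rss). The function names are illustrative and not part of SRT, and the demo data is synthetic.

```python
def parse_samples(text):
    """Parse lines of '<elapsed_seconds> <rss_kib>' into (seconds, KiB) pairs."""
    samples = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or truncated lines
        samples.append((float(fields[0]), float(fields[1])))
    return samples


def growth_kib_per_sec(samples):
    """Least-squares slope of RSS over time, in KiB per second."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_r = sum(r for _, r in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("all samples share the same timestamp")
    cov = sum((t - mean_t) * (r - mean_r) for t, r in samples)
    return cov / var_t


if __name__ == "__main__":
    # Synthetic data (not measurements from this issue): 10 KiB/s of growth.
    demo = "\n".join(f"{t} {50000 + 10 * t}" for t in range(60))
    rate = growth_kib_per_sec(parse_samples(demo))
    print(f"{rate:.2f} KiB/s, {rate * 3600 / 1024 ** 2:.3f} GiB/hr")
```

A fitted slope is less sensitive to short-lived spikes than comparing only the first and last sample, which matters when reconnects cause step changes in RSS.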

Desktop (please provide the following information):

  • OS: Linux Mint/Ubuntu 4.15.0-39-generic
  • SRT Version / commit ID: Reproduced on: v1.4.2, v1.4.3 and 87746453edcd37f2ae1987cfd052ca8f447c8738

Additional context

I can provide further detail, logs etc. Please just ask.

amiller-isp avatar May 14 '21 01:05 amiller-isp

Hi @amiller-isp Does only the caller-receiver have the memory leak or sender-listener as well?

maxsharabayko avatar May 14 '21 09:05 maxsharabayko

Hi @maxsharabayko I reproduced the setup on MX4E (aarch64-linux-gnu) using my mxphub test app and do observe a very slow leak on the listener/sender, which points more to the lib than the app (as @amiller-isp suggested). If you find a way I can help with this, let me know.

jeandube avatar May 14 '21 11:05 jeandube

There are sanitizers that could detect memory leaks and I think you have already tried some sanitizers during the work on thread problems.

ethouris avatar May 14 '21 11:05 ethouris

@ethouris Everything I've done during that period has been forgotten.

jeandube avatar May 14 '21 11:05 jeandube

@maxsharabayko - It looks like the sender-listener memory usage grows but stops after a short while and then remains constant, which is sort of the behavior I would expect. I'm working on verifying my data on this and will post it later. @ethouris - I considered running srt_transmit under Valgrind but I'm assuming that Valgrind will make things too slow to be useful.

amiller-isp avatar May 14 '21 19:05 amiller-isp

Memory graph for listener-sender. This seems to behave somewhat as I would expect.

[Image: mem_srt_transmit_sender]

I also added the rates of memory growth we are seeing in the production environment to the main description:

The production environment showed memory growth of 1.35 GB/hr, whereas the repro here shows ~0.12 GB/hr. However, the source is not the same bitrate.

amiller-isp avatar May 15 '21 15:05 amiller-isp

A small update. There seems to be an increase in memory consumption after reconnection of the "caller-receiver". 3.3 MB -> 3.7 MB -> 3.9 MB. During streaming itself memory consumption looks stable.

@amiller-isp According to your logs, you also have connection loss and reconnections. So that might be the reason and something to check for leaks.

maxsharabayko avatar Jun 07 '21 16:06 maxsharabayko

I've been seeing a memory leak of the same order of magnitude (1.25 MB/s in my case) that @amiller-isp reported. I'm using the GStreamer SRT sink plug-in rather than srt_transmit. In my scenario, the send worker thread is unable to keep up because of the volume of receivers subscribed to the same flow, which causes the packet loss to quickly increase until packets start getting dropped. The VTune memory-consumption trace I collected gives the following stack trace for the leaking memory:

libstdc++.so.6 ! operator new
libsrt.so.1.4 ! CSndBuffer::increase + 0x90
libsrt.so.1.4 ! CSndBuffer::addBuffer + 0x3f
libsrt.so.1.4 ! srt::CUDT::sendmsg2 + 0x83b
libsrt.so.1.4 ! srt::CUDT::sendmsg2 + 0x39
libsrt.so.1.4 ! srt_sendmsg2 + 0x4e
libgstsrt.so ! gst_srt_object_write_to_callers + 0x443 - gstsrtobject.c:1699
...

I'm guessing there's a path where the packets to be sent are getting forgotten about but not freed or recycled. If anyone has any suggestions on where and what I might add in order to fix the leak, I'm willing to take a stab at it. My attempts to figure it out on my own have been, well, not particularly fruitful.

Edit to add: This is with the 1.4.4 release of SRT.

billt-hlit avatar Mar 23 '22 00:03 billt-hlit

The sender buffer is associated with the socket. Once you close the socket, all this memory should be freed. If it's not, it is a leak. Note of course that the best way to make sure the socket is fully closed is to keep the application running for 2 more seconds after the call to srt_close and making sure that nothing was sent over this socket for about 2 seconds before it has been closed (the socket isn't physically closed as long as there's something pending to be sent in the sender buffer).

ethouris avatar Mar 23 '22 12:03 ethouris

The process gets killed for using up too much memory well before the socket gets closed. It only takes a few minutes before the process gets killed for using too much memory in the situation I describe (it's running in a container in the cloud), and even if I were to increase the allowed memory, I need the connection to stay up for weeks. Some sort of limit on the number of sender buffers that get allocated to a socket is needed.

billt-hlit avatar Mar 23 '22 19:03 billt-hlit

If we are talking about "memory swelling" happening specifically in the sender buffer: yes, this buffer grows dynamically, but not without limit. The call to sndBuffersLeft() checks a limit that can be modified with the SRTO_SNDBUF socket option. Once this limit is reached, CSndBuffer::addBuffer and the subsequent CSndBuffer::increase will not be called until at least one unit in the buffer is released. The SRTO_SNDBUF option is specified in bytes, so you can decide for yourself how much memory you want to allow the application to use.
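To make the described behavior concrete, here is a toy model in Python. It is not libsrt code: the class, its method names, the chunk size and the unit size are all hypothetical. The only idea taken from the comment above is that the buffer grows on demand up to a byte limit, rejects new data at the limit, and accepts data again once a unit is released.

```python
class BoundedSendBuffer:
    """Toy model: grows on demand in fixed chunks, capped by a byte limit."""

    def __init__(self, limit_bytes, unit_bytes=1500, chunk_units=32):
        self.unit_bytes = unit_bytes              # assumed size of one unit
        self.max_units = limit_bytes // unit_bytes  # cap derived from byte limit
        self.chunk_units = chunk_units            # assumed growth step
        self.allocated_units = 0
        self.used_units = 0

    def units_left(self):
        """Units that may still be filled before the limit is hit."""
        return self.max_units - self.used_units

    def add(self):
        """Try to store one unit; return False if the limit is reached."""
        if self.units_left() <= 0:
            return False  # at the limit: no allocation happens
        if self.used_units == self.allocated_units:
            # Grow by one chunk, but never past the cap.
            grow = min(self.chunk_units, self.max_units - self.allocated_units)
            self.allocated_units += grow
        self.used_units += 1
        return True

    def release(self):
        """Free one unit, e.g. after it has been acknowledged."""
        if self.used_units > 0:
            self.used_units -= 1

    def allocated_bytes(self):
        return self.allocated_units * self.unit_bytes


if __name__ == "__main__":
    buf = BoundedSendBuffer(limit_bytes=150_000)
    accepted = sum(buf.add() for _ in range(150))
    print(accepted, buf.allocated_bytes())
```

Under this model, allocated memory never exceeds the configured limit, so sustained growth well beyond that limit would have to come from somewhere other than a single socket's buffer.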

ethouris avatar Mar 24 '22 08:03 ethouris

(Sorry, I haven't looked at this in a while.) It seems that the default value of SRTO_SNDBUF is (8192 * SRT_PKT_SIZE), which is ~12 MB. We are not changing this value in our build, so it doesn't account for the growth we saw: in one hour, memory grew from 1.06 GB to 2.41 GB, an increase of 1.35 GB or ~375 kB/s. It seemed to max out at ~4 GB. However, we've not seen this again in quite a while. I'm hoping to upgrade to 1.5 and then do some more production testing with it.
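These figures can be checked with a few lines of arithmetic. The sketch below assumes decimal units (1 GB = 10^9 bytes) and a value of 1500 bytes for SRT_PKT_SIZE. Both are assumptions made for illustration, not values confirmed in this thread.

```python
PKT_SIZE_BYTES = 1500        # assumed value of SRT_PKT_SIZE
DEFAULT_SNDBUF_PKTS = 8192   # packet count quoted in the comment above


def default_sndbuf_bytes(pkts=DEFAULT_SNDBUF_PKTS, pkt_size=PKT_SIZE_BYTES):
    """Default send-buffer size in bytes under the assumed packet size."""
    return pkts * pkt_size


def growth_bytes_per_sec(start_gb, end_gb, seconds):
    """Average growth rate in bytes per second, using 1 GB = 1e9 bytes."""
    return (end_gb - start_gb) * 1e9 / seconds


if __name__ == "__main__":
    buf = default_sndbuf_bytes()
    rate = growth_bytes_per_sec(1.06, 2.41, 3600)
    print(f"default send buffer: {buf / 1e6:.1f} MB")
    print(f"observed growth: {rate / 1e3:.0f} kB/s")
    # How quickly the observed growth exceeds one default buffer's worth.
    print(f"time to grow by one buffer size: {buf / rate:.0f} s")
```

At the observed rate, growth passes one default buffer's worth of memory in well under a minute, which supports the point that a single send buffer at its default limit cannot explain the production numbers.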

amiller-isp avatar Jun 24 '22 01:06 amiller-isp

If you suspect a leak, the best way to check it would be by using the LeakSanitizer:

https://github.com/google/sanitizers/wiki/AddressSanitizerLeakSanitizer

To add flags to the compiler command, use the SRT_EXTRA_CFLAGS variable in the CMake build.

ethouris avatar Jun 24 '22 06:06 ethouris