[BUG] rare crash in CRcvBuffer::~CRcvBuffer()
Describe the bug: A crash was reported to me in the CRcvBuffer::~CRcvBuffer() function.
/usr/bin/nimble(_Z14signal_handleri+0x71) [0x5684b1]
/lib64/libpthread.so.0(+0xf630) [0x7fa2e47d5630]
/lib64/libc.so.6(gsignal+0x37) [0x7fa2e3074387]
/lib64/libc.so.6(abort+0x148) [0x7fa2e3075a78]
/lib64/libc.so.6(+0x78ed7) [0x7fa2e30b6ed7]
/lib64/libc.so.6(+0x81299) [0x7fa2e30bf299]
/lib64/libsrt-nimble.so.1(_ZN10CRcvBufferD1Ev+0x46) [0x7fa2da3d0106]
/lib64/libsrt-nimble.so.1(_ZN4CUDTD1Ev+0xa0) [0x7fa2da3e06a0]
/lib64/libsrt-nimble.so.1(_ZN10CUDTSocketD1Ev+0x1c) [0x7fa2da3c623c]
/lib64/libsrt-nimble.so.1(_ZN10CUDTUnited12removeSocketEi+0x2d9) [0x7fa2da3c8739]
/lib64/libsrt-nimble.so.1(_ZN10CUDTUnited18checkBrokenSocketsEv+0x4fa) [0x7fa2da3c924a]
/lib64/libsrt-nimble.so.1(_ZN10CUDTUnited14garbageCollectEPv+0x50) [0x7fa2da3c9370]
/lib64/libpthread.so.0(+0x7ea5) [0x7fa2e47cdea5]
/lib64/libc.so.6(clone+0x6d) [0x7fa2e313c96d]
According to this call stack, the failure is in the delete[] m_pUnit; call in the CRcvBuffer::~CRcvBuffer() function.
I wish I had more information right now; I will add it once it is available.
To Reproduce: The issue is very rare and hard to reproduce. I'll add new information when I get it. The problem occurred when the SRT library was used in a single process that hosts both senders and receivers.
- SRT Version / commit ID: 1.4.2
Hello, we are having the same issue. The crash happens randomly, hard to reproduce.
#0 0x00007f494892d438 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#1 0x00007f494892f03a in __GI_abort () at abort.c:89
#2 0x00007f494896f7fa in __libc_message (do_abort=2, fmt=fmt@entry=0x7f4948a88f98 "*** Error in `%s': %s: 0x%s ***\n") at ../sysdeps/posix/libc_fatal.c:175
#3 0x00007f49489767d3 in malloc_printerr (ar_ptr=0x7f4878000020, ptr=0x7f487800eb10, str=0x7f4948a85d52 "corrupted double-linked list", action=
@congchenutd do you have a call stack we can work with?
What looks odd to me in the call stack above is that the addresses of both the CUDT and CUDTSocket objects seem to come from the same memory domain as the CUDTUnited object (which is global), while the CRcvBuffer address seems to come from a completely different domain and is also very close to the addresses that appear around the system functions in the call stack. The different domain might be because CUDTBuffer is quite probably allocated in a receiver worker thread, not in the main thread (the case of the listener side and non-blocking mode). But the coincidence with the addresses of the system functions is odd.
If you have a scenario in which the crash was confirmed at least some of the time, you might try running it under e.g. a memory sanitizer.
Hi @alexpokotilo, @congchenutd. The old receiver buffer implementation has been replaced by the new one in v1.5.1. I am not completely sure if this issue depends on the implementation though, but I wonder if you still observe it with the latest SRT versions?
Hi @maxsharabayko, please correct me if I'm wrong, but the new CRcvBuffer was introduced in https://github.com/Haivision/srt/releases/tag/v1.5.0. See "2. New Implementation of the Receiver Buffer" under "New Features and Improvements". If so, we have limited experience with the new CRcvBuffer, as our clients can use 1.5.0 but the default is still 1.4.4. I'm sure some of them use 1.5.0, though, and we have not received any crash reports about the aforementioned crash. This doesn't mean the crash is fixed in 1.5.0, but the opposite is not proven either. Please correct me in case something changed between 1.5.0 and 1.5.1, as we don't use 1.5.1 in production as of now.
Actually, CRcvBuffer is now the new name of the previous CRcvBufferNew, which coexisted for some time with the original CRcvBuffer. It is now the name of the new buffer, and the old CRcvBuffer implementation has been deleted completely. It is hard to say whether the crash can still occur (theoretically it could, if it was caused by code external to the buffer), although with the old buffer I was getting lots of various crashes whenever I tried to modify or improve it even slightly. It might be that the old implementation contained some code that overwrote memory.
The question is when the new implementation replaced the old one: in 1.5.0 or in 1.5.1?
In v1.5.0. The old one was still available using the -DENABLE_NEW_RCVBUFFER=OFF build option.
Then we can close this ticket, as we don't observe this issue with the new implementation. If we find a similar problem, I will file a new ticket.