
[BUG] rare crash in CRcvBuffer::~CRcvBuffer()

Open alexpokotilo opened this issue 4 years ago • 3 comments

Describe the bug: A crash in the CRcvBuffer::~CRcvBuffer() function was reported to me.

    /usr/bin/nimble(_Z14signal_handleri+0x71) [0x5684b1]
    /lib64/libpthread.so.0(+0xf630) [0x7fa2e47d5630]
    /lib64/libc.so.6(gsignal+0x37) [0x7fa2e3074387]
    /lib64/libc.so.6(abort+0x148) [0x7fa2e3075a78]
    /lib64/libc.so.6(+0x78ed7) [0x7fa2e30b6ed7]
    /lib64/libc.so.6(+0x81299) [0x7fa2e30bf299]
    /lib64/libsrt-nimble.so.1(_ZN10CRcvBufferD1Ev+0x46) [0x7fa2da3d0106]
    /lib64/libsrt-nimble.so.1(_ZN4CUDTD1Ev+0xa0) [0x7fa2da3e06a0]
    /lib64/libsrt-nimble.so.1(_ZN10CUDTSocketD1Ev+0x1c) [0x7fa2da3c623c]
    /lib64/libsrt-nimble.so.1(_ZN10CUDTUnited12removeSocketEi+0x2d9) [0x7fa2da3c8739]
    /lib64/libsrt-nimble.so.1(_ZN10CUDTUnited18checkBrokenSocketsEv+0x4fa) [0x7fa2da3c924a]
    /lib64/libsrt-nimble.so.1(_ZN10CUDTUnited14garbageCollectEPv+0x50) [0x7fa2da3c9370]
    /lib64/libpthread.so.0(+0x7ea5) [0x7fa2e47cdea5]
    /lib64/libc.so.6(clone+0x6d) [0x7fa2e313c96d]

According to this call stack, the failure is in the delete[] m_pUnit; call in the CRcvBuffer::~CRcvBuffer() function. I wish I had more information right now; I will add it once it is available.

To Reproduce: The issue is very rare and hard to reproduce. I'll add new information when I get it. The problem occurred when the SRT library was used in a single process that hosts both senders and receivers.

  • SRT Version / commit ID: 1.4.2

alexpokotilo Feb 01 '21 06:02

Hello, we are having the same issue. The crash happens randomly, hard to reproduce.

    #0 0x00007f494892d438 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
    #1 0x00007f494892f03a in __GI_abort () at abort.c:89
    #2 0x00007f494896f7fa in __libc_message (do_abort=2, fmt=fmt@entry=0x7f4948a88f98 "*** Error in `%s': %s: 0x%s ***\n") at ../sysdeps/posix/libc_fatal.c:175
    #3 0x00007f49489767d3 in malloc_printerr (ar_ptr=0x7f4878000020, ptr=0x7f487800eb10, str=0x7f4948a85d52 "corrupted double-linked list", action=) at malloc.c:5020
    #4 malloc_consolidate (av=av@entry=0x7f4878000020) at malloc.c:4182
    #5 0x00007f4948978688 in _int_free (av=0x7f4878000020, p=, have_lock=0) at malloc.c:4082
    #6 0x00007f494897c58c in __GI___libc_free (mem=) at malloc.c:2975
    #7 0x000000000107cccd in CRcvBuffer::~CRcvBuffer (this=0x7f487800d330, __in_chrg=) at /worker/build/863ccfbcbd79e620/root/external/srt/srtcore/buffer.cpp:729
    #8 0x000000000108f7af in CUDT::~CUDT (this=0x2441af0, __in_chrg=) at /worker/build/863ccfbcbd79e620/root/external/srt/srtcore/core.cpp:363
    #9 0x00000000010775e1 in CUDTSocket::~CUDTSocket (this=0x25ed4c0, __in_chrg=) at /worker/build/863ccfbcbd79e620/root/external/srt/srtcore/api.cpp:115
    #10 0x0000000001078c3b in CUDTUnited::removeSocket (this=0x163ee60 CUDT::s_UDTUnited, u=) at /worker/build/863ccfbcbd79e620/root/external/srt/srtcore/api.cpp:1645
    #11 0x00000000010795e2 in CUDTUnited::checkBrokenSockets (this=0x163ee60 CUDT::s_UDTUnited) at /worker/build/863ccfbcbd79e620/root/external/srt/srtcore/api.cpp:1590
    #12 0x0000000001079710 in CUDTUnited::garbageCollect (p=0x163ee60 CUDT::s_UDTUnited) at /worker/build/863ccfbcbd79e620/root/external/srt/srtcore/api.cpp:1889
    #13 0x00007f4956f7f6ba in start_thread (arg=0x7f494481b700) at pthread_create.c:333
    #14 0x00007f49489ff4dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

congchenutd Apr 19 '21 18:04

@congchenutd do you have a call stack we can work with?

alexpokotilo Apr 20 '21 02:04

What looks odd to me in the call stack above is that the addresses of the CUDT and CUDTSocket objects seem to come from the same memory region as the CUDTUnited object (which is global), while the CRcvBuffer address comes from a completely different region and is also very close to the addresses that appear in the system function frames of the call stack. The different region might be explained by the CRcvBuffer most likely being allocated in a receiver worker thread rather than in the main thread (the listener-side, non-blocking case). But the proximity to the addresses in the system functions is weird.

If you have at least one scenario in which the crash has been confirmed, even if only occasionally, you might try running it under e.g. a memory sanitizer.
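As a rough illustration of why a "corrupted double-linked list" abort tends to surface in a destructor rather than at the faulty write, here is a minimal sketch of a guard-byte check. Everything in it (`GuardedBuffer`, `write_bytes`, `kGuard`, `kPattern`) is hypothetical and not part of SRT; it only assumes that some code writes past the end of a buffer. A build with AddressSanitizer (`-fsanitize=address` in GCC and Clang) reports such a write when it happens, instead of later when the memory is freed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical illustration only; none of these names exist in SRT.
constexpr std::size_t kGuard = 16;        // guard bytes on each side
constexpr unsigned char kPattern = 0xA5;  // fill value for the guards

class GuardedBuffer {
public:
    explicit GuardedBuffer(std::size_t size)
        : m_size(size), m_storage(size + 2 * kGuard, kPattern)
    {
        assert(size > 0);
        std::memset(data(), 0, m_size);  // zero only the usable region
    }

    // Pointer to the usable region, located between the two guards.
    unsigned char* data() { return m_storage.data() + kGuard; }
    std::size_t size() const { return m_size; }

    // True while both guard regions still hold the fill pattern.
    bool intact() const
    {
        for (std::size_t i = 0; i < kGuard; ++i) {
            if (m_storage[i] != kPattern) return false;
            if (m_storage[kGuard + m_size + i] != kPattern) return false;
        }
        return true;
    }

private:
    std::size_t m_size;
    std::vector<unsigned char> m_storage;
};

// Simulates a writer that trusts `len` without checking it against size().
// Keep len <= size() + kGuard so the write stays inside the allocation.
inline void write_bytes(GuardedBuffer& buf, std::size_t len)
{
    std::memset(buf.data(), 0xFF, len);
}
```

Writing exactly `size()` bytes leaves `intact()` true; writing one byte more flips it to false, even though nothing crashes at the moment of the write. Without guards, that extra byte would land in allocator metadata and only be noticed on a later `free` or `delete[]`.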

ethouris Apr 20 '21 07:04

Hi @alexpokotilo, @congchenutd. The old receiver buffer implementation has been replaced by the new one in v1.5.1. I am not completely sure if this issue depends on the implementation though, but I wonder if you still observe it with the latest SRT versions?

maxsharabayko Dec 01 '22 14:12

Hi @maxsharabayko, please correct me if I'm wrong, but the new CRcvBuffer was introduced in v1.5.0: https://github.com/Haivision/srt/releases/tag/v1.5.0 (see "2. New Implementation of the Receiver Buffer" under "New Features and Improvements"). If so, we have limited experience with the new CRcvBuffer, as our clients can use 1.5.0 but the default is still 1.4.4. I'm sure some of them use 1.5.0, though, and we haven't received any reports of the aforementioned crash. This doesn't mean the crash is fixed in 1.5.0, but the opposite hasn't been proven either. Please correct me if something changed between 1.5.0 and 1.5.1, as we don't use 1.5.1 in production as of now.

alexpokotilo Dec 01 '22 15:12

Actually, CRcvBuffer is now the name of what used to be CRcvBufferNew, which coexisted for some time with the original CRcvBuffer. The old CRcvBuffer implementation has been completely removed. It's hard to say whether this crash can still happen (theoretically it could, if the corruption came from code outside the buffer), although with the old buffer I was getting lots of various crashes whenever I tried to modify or improve it even slightly. It might be that the old implementation contained some code that overwrote memory.

ethouris Dec 01 '22 15:12

The question is when the new implementation replaced the old one: in 1.5.0 or in 1.5.1?

alexpokotilo Dec 01 '22 15:12

In v1.5.0. The old one was still available via the -DENABLE_NEW_RCVBUFFER=OFF build option.

maxsharabayko Dec 01 '22 18:12

Then we can close this ticket, as we don't observe this issue with the new implementation. If we find a similar problem, I will file a new ticket.

alexpokotilo Dec 01 '22 18:12