srt icon indicating copy to clipboard operation
srt copied to clipboard

GStreamer CI reporting memory leak while using srt

Open jonasdn-spiideo opened this issue 1 year ago • 7 comments

Describe the bug When running the gstreamer test-suite under valgrind a leak is reported, see here: https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/5440

I wonder if the GStreamer code (or the srt c api) is missing to clean something up or if this is expected?

The trace is here:

==17493== Memcheck, a memory error detector
==17493== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==17493== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==17493== Command: /builds/bilboed/gstreamer/build/subprojects/gst-plugins-bad/tests/check/elements_srt
==17493== Parent PID: 14579
==17493==
==17590==
==17590== HEAP SUMMARY:
==17590==     in use at exit: 799,848 bytes in 2,182 blocks
==17590==   total heap usage: 64,090 allocs, 61,908 frees, 37,891,032 bytes allocated
==17590==
==17590== 120 bytes in 1 blocks are definitely lost in loss record 1,563 of 1,681
==17590==    at 0x4842FF5: operator new(unsigned long) (vg_replace_malloc.c:422)
==17590==    by 0x6031E29: srt::sync::SetThreadLocalError(CUDTException const&) (sync_posix.cpp:461)
==17590==    by 0x5FCD77E: CUDT::epoll_wait(int, std::set<int, std::less<int>, std::allocator<int> >*, std::set<int, std::less<int>, std::allocator<int> >*, long, std::set<int, std::less<int>, std::allocator<int> >*, std::set<int, std::less<int>, std::allocator<int> >*) [clone .cold] (api.cpp:3796)
==17590==    by 0x5FE2F79: UDT::epoll_wait2(int, int*, int*, int*, int*, long, int*, int*, int*, int*) (api.cpp:4277)
==17590==    by 0x5F0C626: gst_srt_object_read (gstsrtobject.c:1569)
==17590==    by 0x5F0F978: gst_srt_src_fill (gstsrtsrc.c:180)
==17590==    by 0x5F5A2A1: gst_base_src_default_create (gstbasesrc.c:1620)
==17590==    by 0x5F5C9AE: gst_base_src_get_range (gstbasesrc.c:2630)
==17590==    by 0x5F5EF5A: gst_base_src_loop (gstbasesrc.c:2959)
==17590==    by 0x4918B13: gst_task_func (gsttask.c:399)
==17590==    by 0x4A60B33: g_thread_pool_thread_proxy.lto_priv.0 (gthreadpool.c:354)
==17590==    by 0x4A5DC41: g_thread_proxy (gthread.c:826)
==17590==    by 0x4F532A4: start_thread (pthread_create.c:481)
==17590==    by 0x4C71322: clone (clone.S:95)

Code path in GStreamer:

 GST_TRACE_OBJECT (srtobject->element, "Waiting for read");
    ret =
        srt_epoll_wait (poll_id, &rsock, &rsocklen, &wsock, &wsocklen,
        poll_timeout, &rsys, &rsyslen, &wsys, &wsyslen);

    if (g_cancellable_is_cancelled (srtobject->cancellable))
      break;

    if (ret < 0) {
      gint srt_errno = srt_getlasterror (NULL);
      if (srt_errno == SRT_ETIMEOUT)
        continue;

      g_set_error (&internal_error, GST_RESOURCE_ERROR, GST_RESOURCE_ERROR_READ,
          "Failed to poll socket: %s", srt_getlasterror_str ());
      goto err;
    }

Desktop (please provide the following information): Linux using the C api

jonasdn-spiideo avatar Feb 09 '24 06:02 jonasdn-spiideo

A thread local object should be freed once the thread finishes its execution. Was SRT built with -DENABLE_STDCXX_SYNC=ON (C++11 thread objects) or OFF (POSIX)?

maxsharabayko avatar Feb 09 '24 09:02 maxsharabayko

A thread local object should be freed once the thread finishes its execution. Was SRT built with -DENABLE_STDCXX_SYNC=ON (C++11 thread objects) or OFF (POSIX)?

Thanks for response! I am not sure ... I am trying to find it, I cannot reproduce locally, we see it in the CI of the GStreamer project, I think it is the fedora default build (libsrt version 1.4.2 in this run).

Link to CI run here: https://gitlab.freedesktop.org/bilboed/gstreamer/-/jobs/54795504

jonasdn-spiideo avatar Feb 09 '24 10:02 jonasdn-spiideo

The fedora build spec here:

https://src.fedoraproject.org/rpms/srt/blob/f37/f/srt.spec https://src.fedoraproject.org/rpms/srt/blob/f34/f/srt.spec

I do not see -DENABLE_STDCXX_SYNC=ON what is default?

jonasdn-spiideo avatar Feb 09 '24 10:02 jonasdn-spiideo

Better to use the latest version. The default on Linux machines is OFF (POSIX).

maxsharabayko avatar Feb 09 '24 10:02 maxsharabayko

Better to use the latest version. The default on Linux machines is OFF (POSIX).

I understand that but this is a CI for the gstreamer project using the upstream version. So, in your mind is there something in the GStreamer code that could be triggering this? Something we could be omitting? Or does it feel more like some race in termination?

jonasdn-spiideo avatar Feb 09 '24 11:02 jonasdn-spiideo

Better to use the latest version. The default on Linux machines is OFF (POSIX).

I understand that but this is a CI for the gstreamer project using the upstream version. So, in your mind is there something in the GStreamer code that could be triggering this? Something we could be omitting? Or does it feel more like some race in termination?

trying to figure out if I should add a suppression for the valgrind detection or not

jonasdn-spiideo avatar Feb 09 '24 11:02 jonasdn-spiideo

I have not managed to reproduce this myself, but others reported seeing it on Debian and v1.5.2.

jonasdn-spiideo avatar Feb 13 '24 05:02 jonasdn-spiideo

Valgrind likely does not like this new statement in the POSIX version of the thread-local error: https://github.com/Haivision/srt/blob/v1.5.2/srtcore/sync_posix.cpp#L484 However, the object is to be deleted by the ThreadSpecificKeyDestroy function passed into the pthread_key_create. I would treat it as a false-positive alert.

maxsharabayko avatar May 27 '24 08:05 maxsharabayko