
FLV: Crash when switching between HTTP-FLV streams.

Open freeman1974 opened this issue 5 years ago • 10 comments

Description: Using a single player, I switched repeatedly and quickly between two HTTP-FLV streams. After about 5 switches, the SRS process exited. See the screenshot linked below for the Linux core dump. The corresponding code appears to be:

void srs_close_stfd(srs_netfd_t& stfd)
{
    if (stfd) {
        // we must ensure the close is ok.
        int err = st_netfd_close((st_netfd_t)stfd);
        srs_assert(err != -1);		// The assertion triggered causing the process to exit.
        stfd = NULL;
    }
}

The caller of this function is:

void SrsTcpClient::close()
{
    // Ignore when already closed.
    if (!io) {
        return;
    }
    
    srs_close_stfd(stfd);
}

The crash seems to be caused by frequent calls to SrsTcpClient::close(), that is, by continuously closing and reopening the socket.

    if ((*_st_eventsys->fd_close)(fd->osfd) < 0)
        return -1;

This check is what returns the error. Is it because the global variable _st_eventsys is used without a lock?
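
One way to picture how this check could fail without any data race: assume (this is an assumption, not verified against the ST source here) that the event system keeps a per-descriptor count of coroutines still waiting on I/O, and that its fd_close hook refuses to proceed while that count is non-zero. The sketch below is a simplified C++ model of that idea only. `EventSystemModel`, `FdWaiters`, and all method names are hypothetical and are not ST's API.

```cpp
#include <cassert>
#include <cerrno>
#include <map>

// Hypothetical model: how many coroutines are parked on each descriptor.
struct FdWaiters {
    int readers = 0;
    int writers = 0;
};

// Simplified stand-in for an event system; NOT the real ST implementation.
class EventSystemModel {
public:
    void begin_read(int fd)  { ++waiters_[fd].readers; }
    void begin_write(int fd) { ++waiters_[fd].writers; }

    void end_read(int fd) {
        assert(waiters_[fd].readers > 0);  // must match a begin_read
        --waiters_[fd].readers;
    }
    void end_write(int fd) {
        assert(waiters_[fd].writers > 0);  // must match a begin_write
        --waiters_[fd].writers;
    }

    // Returns 0 when nobody is waiting on fd. Returns -1 and sets errno to
    // EBUSY when some coroutine is still waiting on it.
    int fd_close(int fd) {
        auto it = waiters_.find(fd);
        if (it != waiters_.end() &&
            (it->second.readers > 0 || it->second.writers > 0)) {
            errno = EBUSY;
            return -1;
        }
        waiters_.erase(fd);
        return 0;
    }

private:
    std::map<int, FdWaiters> waiters_;
};
```

Under this model, a close issued while a reader is still parked on the descriptor fails, and the same close succeeds once the reader has finished. That would match the observation that only very fast switching triggers the assertion.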

  1. SRS version: srs 4.0.39 #define SRS_VERSION4_REVISION 39
  2. The log of SRS is as follows: Please refer to the screenshot in the attachment. http://demo.fili58.com/media/bug/photo_2020-09-07_18-16-24.jpg

TRANS_BY_GPT3

freeman1974 avatar Sep 07 '20 10:09 freeman1974

One more note: if I switch between the two streams a little more slowly, the issue does not occur.

TRANS_BY_GPT3

freeman1974 avatar Sep 09 '20 12:09 freeman1974

May I ask, when you play http-flv or dash, does the server have high CPU usage? It seems that it doesn't happen with SRS3.

TRANS_BY_GPT3

RossWang avatar Sep 10 '20 05:09 RossWang

I didn't pay attention to this issue. Do you have any quantitative data? Specifically, for srs3 vs srs4.

TRANS_BY_GPT3

freeman1974 avatar Sep 10 '20 09:09 freeman1974

It seems you don't have this problem. So I checked and found that it was caused by the low mr_latency setting. Thank you for your help.

TRANS_BY_GPT3

RossWang avatar Sep 11 '20 02:09 RossWang

I made some modifications myself, and by limiting the streaming speed, this problem can be solved.

TRANS_BY_GPT3

freeman1974 avatar Sep 13 '20 09:09 freeman1974

How fast do you switch before encountering problems?

TRANS_BY_GPT3

winlinvip avatar Dec 01 '20 12:12 winlinvip

Within 1 second. Millisecond level.


TRANS_BY_GPT3

freeman1974 avatar Dec 02 '20 00:12 freeman1974

This is a long-standing problem. It has persisted for many years and we still don't know the cause. It would be great if we could find the reason.

TRANS_BY_GPT3

winlinvip avatar Aug 26 '21 00:08 winlinvip

st_netfd_close is definitely closing the fd while it is being read or written by another coroutine.

So the key point is how to print out the coroutines that are accessing this fd, so that we can identify where the problem is.

Using assert here is not a problem: if we don't exit at the point of failure, other issues will still appear later, and they will be even harder to diagnose.

The relationship between ST threads (coroutines) and file descriptors (fds) is many-to-many. A coroutine can read and write multiple fds, and an fd can be read and written by multiple coroutines (e.g., one coroutine reading and another writing). This makes the underlying logic more complex. Before closing an fd, we must ensure that no coroutine is still reading from or writing to it.
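
The bookkeeping described above could be sketched roughly as follows. This is a minimal illustration, not code from SRS or ST: `FdUsageTracker`, its methods, and the string coroutine ids are all hypothetical. It records which coroutines are currently using each fd, reports them when a close is requested, and defers the close until the last user has left.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper, not part of SRS or ST.
class FdUsageTracker {
public:
    // Record that coroutine `cid` starts an I/O operation on `fd`.
    void enter(int fd, const std::string& cid) { users_[fd].insert(cid); }

    // Record that `cid` finished its I/O on `fd`. Returns true when a close
    // was requested earlier and this was the last user, so closing is now safe.
    bool leave(int fd, const std::string& cid) {
        auto it = users_.find(fd);
        assert(it != users_.end());  // leave() must match an enter()
        std::size_t erased = it->second.erase(cid);
        assert(erased == 1);
        (void)erased;
        if (!it->second.empty()) return false;
        users_.erase(it);
        return closing_.erase(fd) > 0;
    }

    // Ask to close `fd`. An empty result means the fd is idle and can be
    // closed right away. Otherwise the fd is marked as closing and the ids
    // of the coroutines still using it are returned, e.g. for logging.
    std::vector<std::string> request_close(int fd) {
        auto it = users_.find(fd);
        if (it == users_.end() || it->second.empty()) return {};
        closing_.insert(fd);
        return std::vector<std::string>(it->second.begin(), it->second.end());
    }

private:
    std::map<int, std::set<std::string>> users_;  // fd -> coroutines using it
    std::set<int> closing_;                       // fds with a deferred close
};
```

In a real server, the caller would log the ids returned by `request_close` to see which coroutines still hold the fd, and only call the actual close once `leave` returns true. How coroutine ids are obtained, and how a parked coroutine is woken up so that it can leave, is deliberately left out of this sketch.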

TRANS_BY_GPT3

winlinvip avatar Nov 03 '21 00:11 winlinvip

Similar one, see https://github.com/ossrs/srs/issues/3784#issuecomment-2028500280

See also #511 #1784 #1829 #2419 #3784

winlinvip avatar Apr 22 '24 00:04 winlinvip