
When there is no network, srt_connect() never returns

Open BoleLiu opened this issue 3 years ago • 21 comments

On the Android platform, in blocking mode, if I turn off the network, srt_connect and srt_send never return. Is this expected, or is there some configuration I forgot to set?

BoleLiu · Apr 16 '21 04:04

What do you mean by "there is no return"? That the blocking functions don't exit? The expected behavior is that if the connecting function doesn't establish a connection within the expected time, it exits with failure. This time is defined by SRTO_CONNTIMEO.
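
To illustrate that expected behavior, here is a minimal, self-contained C++ model (not SRT's actual code) of a blocking connect whose handshake loop is bounded by a timeout, which is the role SRTO_CONNTIMEO plays. The callbacks `send_handshake` and `response_arrived` and the `ConnectTimeout` type are hypothetical; with the real API the timeout is set via `srt_setsockopt()` with `SRTO_CONNTIMEO` before calling `srt_connect()`.

```cpp
#include <chrono>
#include <functional>
#include <stdexcept>
#include <thread>

// Thrown when no handshake response arrives before the deadline.
struct ConnectTimeout : std::runtime_error {
    ConnectTimeout() : std::runtime_error("connection setup timed out") {}
};

// Hypothetical model of a blocking connect bounded by a connect timeout.
// `send_handshake` and `response_arrived` stand in for the UDP send/receive
// done during the handshake; they are not SRT functions.
void blocking_connect(const std::function<void()>& send_handshake,
                      const std::function<bool()>& response_arrived,
                      std::chrono::milliseconds conn_timeout,
                      std::chrono::milliseconds retry_interval)
{
    using Clock = std::chrono::steady_clock;
    const auto deadline = Clock::now() + conn_timeout;

    while (Clock::now() < deadline)
    {
        send_handshake();        // (re)send the handshake request
        if (response_arrived())  // peer answered: connection established
            return;
        std::this_thread::sleep_for(retry_interval);
    }
    // No answer within the timeout (e.g. the network is down):
    // give up and report failure instead of blocking forever.
    throw ConnectTimeout();
}
```

With the network down, `response_arrived` never returns true, so the call should fail after roughly `conn_timeout` rather than hang; a call that never returns therefore points to something blocking outside this loop.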

ethouris · Apr 16 '21 07:04

By "there is no return" I mean the call never returns and stays blocked there.

There are two problems:

  1. When I turn off the network during transmission, srt_send never returns and blocks. I worked around this by checking the socket state before sending and returning immediately if the socket is broken.
  2. When I turn off the network before calling srt_connect, srt_connect also never returns and blocks.

Any advice for the second problem? Thanks a lot.
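
The workaround in point 1 can be sketched as a self-contained model (not SRT code). In a real application the check would call `srt_getsockstate()` and compare against `SRTS_BROKEN`; `FakeSocket`, `SockState`, and `guarded_send` here are hypothetical stand-ins.

```cpp
#include <cstddef>
#include <string>

// Illustrative socket states, loosely mirroring SRT's SRT_SOCKSTATUS values
// such as SRTS_CONNECTED and SRTS_BROKEN (this enum is hypothetical).
enum class SockState { Connecting, Connected, Broken, Closed };

// Hypothetical socket stand-in; a real application would query SRT instead.
struct FakeSocket {
    SockState state = SockState::Connected;
    std::string sent;  // bytes "transmitted" so far
};

// Returns the number of bytes sent, or -1 if the socket is not usable.
// Checking first avoids entering a blocking send on a socket that is
// already known to be broken.
long guarded_send(FakeSocket& s, const char* data, std::size_t len)
{
    if (s.state != SockState::Connected)
        return -1;  // fail fast instead of blocking
    s.sent.append(data, len);
    return static_cast<long>(len);
}
```

Note that the state can change right after the check, so this narrows the window in which a send can hang rather than eliminating it.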

BoleLiu · Apr 16 '21 08:04

I got the following log, but srt_connect still doesn't return in blocking mode:

    D:SRT.cn: startConnect: TTL time 18733D 07:21:08.774405 [STDY] exceeded, TIMEOUT.

BoleLiu · Apr 16 '21 08:04

What source version do you have (version in git)? I'm suspecting this might be one of the old deadlock problems around registerConnector.

If you enable heavy logs at compile time (ENABLE_HEAVY_LOGGING in cmake) and enable them in the application (-loglevel debug), you should see this log:

    HLOGC(cnlog.Debug, log << "removeConnector: removing @" << id);

If the application hangs after displaying this, it could be this deadlock.

If you could help me by running this under a debugger and checking where particular threads are hanging, that would be even more helpful.

ethouris · Apr 16 '21 08:04

(screenshot) It does reach the log above, and then it gets stuck there.

BoleLiu · Apr 16 '21 08:04

Ah, so that's what I suspected. First, however, I need to know your version.

ethouris · Apr 16 '21 08:04

How can I get the real version? The version in CMakeLists is 1.4.3, but in README.md it's 1.4.2. Also, I compiled the library from the latest master code in the repo.

BoleLiu · Apr 16 '21 08:04

Ok, there's a PR that is intended to fix things around there. Would you be able to take the code from the branch mentioned there and see if this fixes the problem? If you confirm it, we should be able to increase the priority for it.

https://github.com/Haivision/srt/pull/1844

ethouris · Apr 16 '21 08:04

OK, I'll try it later. Which version is more stable for live stream transmission?

BoleLiu · Apr 16 '21 08:04

For all I know, the latest master should be stable enough. Maybe @maxsharabayko can be more precise.

ethouris · Apr 16 '21 08:04

OK. And for the first problem, do you have any advice? Have you encountered this problem before?

BoleLiu · Apr 16 '21 09:04

I haven't seen that problem, but we've found a suspected potential deadlock around this place with ThreadSanitizer; that's why I believe the PR I gave you may fix the problem.

ethouris · Apr 16 '21 09:04

I pulled the PR into my local branch and recompiled the library, but it doesn't seem to work; srt_connect still doesn't return.

BoleLiu · Apr 16 '21 09:04

Would you be able to run it under a debugger? Unfortunately I don't have an Android platform at hand to test it...

Also, do you use your own application or one of those in SRT repo?

ethouris · Apr 16 '21 09:04

I can't run it under a debugger, but I can get the debug log. I'm using my own application to test it.

BoleLiu · Apr 16 '21 09:04

From your discussion, the only thing it can be hanging on is CRendezvousQueue::m_RIDVectorLock in CRcvQueue::removeConnector(..), with the lock probably taken by CRendezvousQueue::updateConnStatus(..). The latest screenshot seems to confirm this.

Update: However, the last message from there is "updateConnStatus: 0/1 sockets updated...", so the lock must have been released.

maxsharabayko · Apr 16 '21 10:04

It would be very surprising if it were hanging here, but just to check, @BoleLiu, could you please add some logs around THREAD_PAUSED() and THREAD_RESUMED()?

    int CRcvQueue::recvfrom(int32_t id, CPacket& w_packet)
    {
        UniqueLock bufferlock (m_BufferLock);
        CSync buffercond    (m_BufferCond, bufferlock);

        map<int32_t, std::queue<CPacket *> >::iterator i = m_mBuffer.find(id);

        if (i == m_mBuffer.end())
        {
            THREAD_PAUSED();
            buffercond.wait_for(seconds_from(1));
            THREAD_RESUMED();
            // ...
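
The same kind of instrumentation can be shown in plain standard C++. This is a hypothetical stand-alone model, not SRT code: `wait_for_packet`, `log_line`, and `g_log` are illustrative names, and `std::condition_variable::wait_for` stands in for the `CSync::wait_for` call in the excerpt above.

```cpp
#include <chrono>
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <string>
#include <vector>

std::vector<std::string> g_log;  // hypothetical in-memory log sink
std::mutex g_log_mutex;

void log_line(const std::string& msg)
{
    std::lock_guard<std::mutex> lk(g_log_mutex);
    g_log.push_back(msg);
    std::cerr << msg << '\n';
}

// Waits (bounded) for `ready` to become true, logging immediately before and
// after the wait, like THREAD_PAUSED()/THREAD_RESUMED() bracket it.
// "resumed" is logged only after the mutex is reacquired, so a "paused" with
// no following "resumed" means the thread is stuck in the wait or in
// reacquiring the lock.
bool wait_for_packet(std::mutex& m, std::condition_variable& cv,
                     const bool& ready, std::chrono::milliseconds timeout)
{
    std::unique_lock<std::mutex> lock(m);
    if (!ready)
    {
        log_line("recvfrom: paused");
        cv.wait_for(lock, timeout);  // may also wake early (spurious wakeup)
        log_line("recvfrom: resumed");
    }
    return ready;
}
```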

maxsharabayko · Apr 16 '21 10:04

(screenshot) @maxsharabayko It isn't hanging here; it looks like it resumed successfully.

BoleLiu · Apr 16 '21 10:04

I see... 🤔 Could you please add more logs around CRcvQueue::m_BufferLock then, to track where it is being held, in case it is the cause of the deadlock?
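
One way to do that kind of tracing, sketched as a self-contained model rather than a patch to SRT: wrap the mutex so that every call site logs when it starts waiting, acquires, and releases, together with the thread ID. `TracedMutex` and the site strings are hypothetical; in SRT itself the equivalent would be log lines added next to the existing locking of m_BufferLock.

```cpp
#include <iostream>
#include <mutex>
#include <sstream>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical lock tracer. In a hang, the last "acquired" without a
// matching "released" names the holder, and a trailing "waiting" names
// the thread that is blocked on it.
class TracedMutex {
public:
    explicit TracedMutex(std::string name) : name_(std::move(name)) {}

    void lock(const char* site) {
        record("waiting", site);
        m_.lock();
        record("acquired", site);
    }
    void unlock(const char* site) {
        record("released", site);
        m_.unlock();
    }
    std::vector<std::string> events() {
        std::lock_guard<std::mutex> lk(log_m_);
        return events_;
    }

private:
    void record(const char* what, const char* site) {
        std::ostringstream os;
        os << name_ << ' ' << what << " @" << site
           << " tid=" << std::this_thread::get_id();
        std::lock_guard<std::mutex> lk(log_m_);
        events_.push_back(os.str());
        std::cerr << events_.back() << '\n';
    }

    std::string name_;
    std::mutex m_;        // the lock being traced
    std::mutex log_m_;    // protects events_
    std::vector<std::string> events_;
};
```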

maxsharabayko · Apr 16 '21 11:04

@maxsharabayko I added more logs and found that it didn't block in removeConnector. It seems that the broken socket cannot be removed in checkBrokenSockets, so it can't get out of the loop in garbageCollect. Also, I'd like to know: what is the expected behavior when I call srt_connect on a broken network in blocking mode?

BoleLiu · Apr 17 '21 00:04

In blocking mode, the connecting function (CUDT::startConnect) runs a loop of sending and receiving the packets needed for the handshake. If the network is cut off, it simply won't receive any response, so it should give up and exit with failure (throw an exception) after a timeout. This "registered connector" is required so that the facility knows the socket is connection-pending and therefore where to dispatch handshakes. When a removal happens, it means the function has given up and is about to return an error.
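
The "registered connector" idea can be illustrated with a hypothetical, self-contained model. It borrows the names registerConnector/removeConnector from this discussion, but it is not SRT's implementation and does not match its signatures.

```cpp
#include <cstdint>
#include <map>
#include <mutex>

// Hypothetical model of a "registered connector" table: sockets that are
// pending connection register their ID so the receiving side knows where
// to dispatch incoming handshake packets.
class ConnectorRegistry {
public:
    void registerConnector(std::int32_t id) {
        std::lock_guard<std::mutex> lk(m_);
        pending_[id] = 0;
    }

    // Called when the connecting side gives up (e.g. on timeout). After this,
    // handshakes for `id` are no longer dispatched and the connect call can
    // return its error.
    void removeConnector(std::int32_t id) {
        std::lock_guard<std::mutex> lk(m_);
        pending_.erase(id);
    }

    // Receiver side: deliver a handshake to its pending socket, if any.
    bool dispatchHandshake(std::int32_t id) {
        std::lock_guard<std::mutex> lk(m_);
        auto it = pending_.find(id);
        if (it == pending_.end())
            return false;  // no connection-pending socket with this ID
        ++it->second;      // count handshakes delivered to this socket
        return true;
    }

    int handshakesFor(std::int32_t id) {
        std::lock_guard<std::mutex> lk(m_);
        auto it = pending_.find(id);
        return it == pending_.end() ? -1 : it->second;
    }

private:
    std::mutex m_;
    std::map<std::int32_t, int> pending_;  // socket ID -> handshakes received
};
```

Because removeConnector takes the same lock as the dispatching side, a thread that held that lock indefinitely would keep the connecting thread from ever reaching its error return, which is the kind of hang suspected earlier in this thread.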

ethouris · Apr 19 '21 07:04