When there is no network, srt_connect() does not return
On Android, in blocking mode, if I turn off the network, srt_connect and srt_send never return. Is this expected, or is there a configuration option I forgot to set?
What do you mean by "there is no return"? The blocking functions don't exit? The expected behavior is that if the connection function doesn't establish a connection within a predefined time, it exits with failure. This time is defined by SRTO_CONNTIMEO.
"There is no return" means the call never returns and stays blocked there.
There are 2 problems:
- If the network is turned off during transmission, srt_send blocks and never returns. I worked around this by checking the socket state before sending and returning immediately if the socket is broken.
- If the network is turned off before srt_connect is called, srt_connect also blocks and never returns.
Any advice for the second problem? Thanks a lot.
I got the following log, but srt_connect still does not return in blocking mode:
D:SRT.cn: startConnect: TTL time 18733D 07:21:08.774405 [STDY] exceeded, TIMEOUT.
What source version do you have (version in git)? I suspect this might be one of the old deadlock problems around registerConnector.
If you enable heavy logging at compile time (ENABLE_HEAVY_LOGGING in cmake) and enable it in the application (-loglevel debug), you should see this log:
HLOGC(cnlog.Debug, log << "removeConnector: removing @" << id);
If the application hangs after displaying this, it could be this deadlock.
If you could help me by running this under a debugger and see where particular threads are hanging, it would be even more helpful.
It does reach the log above, and then it gets stuck there.
Ah, so that's what I suspected. First, however, I need to know your version.
How can I get the real version? The version in CMakeLists is 1.4.3, but in README.md it's 1.4.2. Besides, I compiled the library from the newest master code in the repo.
Ok, there's a PR that is intended to fix things around there. Would you be able to take the code from the branch mentioned there and see if this fixes the problem? If you confirm it, we should be able to increase the priority for it.
https://github.com/Haivision/srt/pull/1844
OK, I'll try it later. Which version is more stable for live stream transmission?
For all I know, the latest master should be stable enough. Maybe @maxsharabayko can be more precise.
ok, and for the first problem, do you have any advice? Have you encountered this problem before?
I haven't encountered that problem, but thread sanitizer has flagged a suspected potential deadlock around this place, which is why I believe the PR I gave you may fix the problem.
I pulled the PR to my local branch and recompiled the library, but it doesn't seem to work; the call still does not return.
Would you be able to run it under a debugger? Unfortunately I don't have an Android platform at hand to test it...
Also, do you use your own application or one of those in SRT repo?
I cannot run it under a debugger, but I can get the debug log. I use my own application to test it.
From your discussion, the only thing it can be hanging on is the CRendezvousQueue::m_RIDVectorLock in CRcvQueue::removeConnector(..), with the lock probably taken by CRendezvousQueue::updateConnStatus(..).
The latest screenshot seems to confirm this.
UPD: However, the last message from there is "updateConnStatus: 0/1 sockets updated...", so the lock must have been released.
It would be very surprising if it is hanging here, but just to check, @BoleLiu could you please add some logs around THREAD_PAUSED() and THREAD_RESUMED()?
int CRcvQueue::recvfrom(int32_t id, CPacket& w_packet)
{
    UniqueLock bufferlock (m_BufferLock);
    CSync buffercond (m_BufferCond, bufferlock);
    map<int32_t, std::queue<CPacket *> >::iterator i = m_mBuffer.find(id);
    if (i == m_mBuffer.end())
    {
        THREAD_PAUSED();
        buffercond.wait_for(seconds_from(1));
        THREAD_RESUMED();
@maxsharabayko it isn't hanging here; it appears to resume successfully.
I see... 🤔
Could you please add more logs around CRcvQueue::m_BufferLock then? That would help track where it is held locked, if it is the cause of the deadlock.
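One possible way to add such logs is a small RAII wrapper that records when a thread starts waiting for a mutex, when it acquires it, and when it is about to release it. This is only a sketch under stated assumptions: LoggedLock and EventLog are hypothetical helpers, not part of SRT, and a plain std::mutex stands in for m_BufferLock.

```cpp
#include <mutex>
#include <string>
#include <vector>

// Hypothetical thread-safe log sink (not part of SRT).
struct EventLog
{
    std::mutex guard;
    std::vector<std::string> lines;

    void add(const std::string& s)
    {
        std::lock_guard<std::mutex> g(guard);
        lines.push_back(s);
    }
};

// Hypothetical RAII lock (not part of SRT). A "waiting" entry with no
// matching "acquired" entry points at the place where a thread is stuck.
class LoggedLock
{
public:
    LoggedLock(std::mutex& m, const std::string& where, EventLog& log)
        : m_where(where), m_log(log)
    {
        m_log.add("waiting: " + m_where);
        m_lock = std::unique_lock<std::mutex>(m); // blocks until the mutex is free
        m_log.add("acquired: " + m_where);
    }

    ~LoggedLock()
    {
        // Logged just before m_lock is destroyed, which releases the mutex.
        m_log.add("released: " + m_where);
    }

private:
    std::string m_where;
    EventLog& m_log;
    std::unique_lock<std::mutex> m_lock;
};
```

Used in place of a plain lock at each site that takes the mutex, with a distinct label per site, the last "acquired" entry without a "released" shows which site is holding the lock.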
@maxsharabayko I added more logs and found that it doesn't block in removeConnector. It seems that the broken socket cannot be removed in checkBrokenSockets, so the loop in garbageCollect never exits. Besides, I'd like to know: what is the expected behavior when I call srt_connect on a broken network in blocking mode?
In blocking mode, the connecting function (CUDT::startConnect) runs a loop of sending and receiving the packets necessary for the handshake. If the network is cut off, it simply won't receive anything in response and should give up and exit with failure (throw an exception) after a timeout. This "registered connector" is required so that the facility knows a socket is connection-pending and therefore where to dispatch handshakes. When a removal is happening, it means it has given up and is about to return an error.
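The expected shape of that loop can be sketched roughly as follows. This is not SRT's actual code: connectWithTimeout and both callbacks are hypothetical stand-ins, and the sketch returns false on timeout where the real function is described above as throwing an exception.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical stand-in for a blocking connect loop (not SRT's code).
// sendRequest sends one handshake request; tryReceiveResponse returns true
// once a handshake response has arrived.
bool connectWithTimeout(const std::function<void()>& sendRequest,
                        const std::function<bool()>& tryReceiveResponse,
                        std::chrono::milliseconds timeout,
                        std::chrono::milliseconds retryInterval)
{
    using clock = std::chrono::steady_clock;
    const clock::time_point deadline = clock::now() + timeout;

    while (clock::now() < deadline)
    {
        sendRequest();
        if (tryReceiveResponse())
            return true; // handshake completed

        // Nothing received yet: wait a little before retrying.
        std::this_thread::sleep_for(retryInterval);
    }

    // Deadline passed without a response: give up and report failure.
    return false;
}
```

With the network cut off, tryReceiveResponse never succeeds, so the loop ends once the deadline passes and the caller gets a failure instead of blocking forever.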