[core] Fixed: closing socket should mark and signal so that srt_connect call can exit immediately
Fixes #2029
Note that some parts of this fix were exported earlier to other PRs, so this one fixes only the one remaining problem:
The `srt_connect` call runs a loop that sends packets and attempts to read packets from the socket (through `CRcvQueue::recvfrom`) in the hope that it will eventually connect. The problem is that for the whole time this loop runs, it holds `CSocket::m_ControlLock` and `CUDT::m_ConnectionLock` (locked in this order), which blocks the call to `srt_close()` until the loop exits, and the loop exits only on timeout. Reading from the queue causes a read from the socket with a 1s maximum waiting time, after which the conditions are rechecked.
Fix:
- Added a possibility to interrupt the call to `CRcvQueue::recvfrom` before the 1s timeout.
- The `m_bClosing` and `m_bBroken` flags are checked after reading by `CRcvQueue::recvfrom` so that the loop can be interrupted immediately.
- The `srt_close` call sets the `m_bClosing` flag and "kicks" the condition for `CRcvQueue::recvfrom`.

This finally leads to an immediate exit of the loop, so that `srt_close()` can continue.
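The three points above can be sketched in isolation as follows. This is only a minimal illustration of the "flag plus kick" idea, not the code in this PR: `InterruptibleQueue`, `connectLoop`, `ConnectResult` and all member names are hypothetical stand-ins, and the real `CRcvQueue::recvfrom` has a different interface.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for the receiver queue; NOT the actual CRcvQueue API.
class InterruptibleQueue
{
public:
    // Wait up to `timeout` for a packet. A call to close() wakes the wait early.
    // Returns true only if a packet is available.
    bool waitForPacket(std::chrono::milliseconds timeout)
    {
        std::unique_lock<std::mutex> lk(m_lock);
        m_cond.wait_for(lk, timeout, [this] { return m_hasPacket || m_closing; });
        return m_hasPacket;
    }

    // Simulates the arrival of a handshake response.
    void pushPacket()
    {
        {
            std::lock_guard<std::mutex> lk(m_lock);
            m_hasPacket = true;
        }
        m_cond.notify_all();
    }

    // Sets the closing flag and "kicks" the condition so a waiter returns at once.
    void close()
    {
        {
            std::lock_guard<std::mutex> lk(m_lock);
            m_closing = true;
        }
        m_cond.notify_all();
    }

    bool closing() const
    {
        std::lock_guard<std::mutex> lk(m_lock);
        return m_closing;
    }

private:
    mutable std::mutex m_lock;
    std::condition_variable m_cond;
    bool m_hasPacket = false;
    bool m_closing = false;
};

enum class ConnectResult { Connected, TimedOut, Interrupted };

// Hypothetical connect loop: retries until a packet arrives, the overall
// timeout expires, or the closing flag is observed after a wait.
ConnectResult connectLoop(InterruptibleQueue& q,
                          std::chrono::milliseconds connectTimeout,
                          std::chrono::milliseconds pollInterval)
{
    const auto deadline = std::chrono::steady_clock::now() + connectTimeout;
    while (std::chrono::steady_clock::now() < deadline)
    {
        // A real implementation would (re)send a handshake packet here.
        const bool gotPacket = q.waitForPacket(pollInterval);
        if (q.closing())
            return ConnectResult::Interrupted; // checked right after the wait
        if (gotPacket)
            return ConnectResult::Connected;
    }
    return ConnectResult::TimedOut;
}
```

In this sketch the closing flag is checked before the packet result, so a close request wins even if a packet arrived at the same moment. Whether the real loop should prioritize it the same way is a design decision this example does not settle.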
Weird...
The following tests FAILED:
57 - TestEnforcedEncryption.PasswordLength (SEGFAULT)
58 - TestEnforcedEncryption.SetGetDefault (SEGFAULT)
87 - Transmission.FileUpload (Failed)
143 - TestSocketOptions.MinInputBWWrongLen (SEGFAULT)
147 - TestSocketOptions.StreamIDWrongLen (SEGFAULT)
Not weird. The problem is bigger than I thought. I found a solution, but it's a dirty workaround, you might not like it.
Indeed, I'm not sure about this fix.
It looks like if `srt_close(..)` is called between the start of `CUDT::startConnect(..)` and the line with `m_bConnecting = true;`, then `m_bClosing` is not set to `true`, and the hang-up for the duration of `SRTO_CONNECTTIMEO` would still happen, although the chances are reduced.
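The window described above can be modelled with two plain flags. This is a deliberately simplified, single-threaded sketch: `SocketModel` and its methods are hypothetical, and the assumption that closing is flagged only while the socket is marked as connecting is inferred from this discussion, not taken from the code.

```cpp
// Hypothetical model of the two flags; names mirror the discussion, logic is assumed.
struct SocketModel
{
    bool connecting = false; // stands in for m_bConnecting
    bool closing    = false; // stands in for m_bClosing

    // Assumed close behaviour: only flag a socket that is known to be connecting.
    void close()
    {
        if (connecting)
            closing = true;
    }

    // Assumed point inside startConnect where the connecting flag is raised.
    void markConnecting() { connecting = true; }

    // The connect loop can exit early only if the closing flag was raised.
    bool loopWouldExitEarly() const { return closing; }
};

// close() arrives before the connecting flag is set: the request is lost.
bool closeBeforeFlag()
{
    SocketModel s;
    s.close();
    s.markConnecting();
    return s.loopWouldExitEarly();
}

// close() arrives after the connecting flag is set: the loop is interrupted.
bool closeAfterFlag()
{
    SocketModel s;
    s.markConnecting();
    s.close();
    return s.loopWouldExitEarly();
}
```

Under these assumptions, `closeBeforeFlag()` returns `false` and `closeAfterFlag()` returns `true`, which is the ordering dependence the comment points out.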
Furthermore, the fix might make things even more fragile:
The problem is that this flag shall NOT be set when you have a CONNECTED socket, because not only is there no such problem in this case, but setting it also puts the socket into a "confused" state in which it skips a vital part of closing itself and therefore runs an infinite loop when trying to purge the sender buffer of the closing socket.
At the same time, I don't see an easy way to fix issue #2029 without intense refactoring. Unlocking `m_ControlLock` before `s->core().startConnect(target_addr, forced_isn)` seems to violate the locking order. 🤔