[core] Fixed: closing socket should mark and signal so that srt_connect call can exit immediately

Open ethouris opened this issue 3 years ago • 6 comments

Fixes #2029

Note that some parts of it have been earlier exported to other PRs, so this one fixes only one remaining problem:

The srt_connect call runs a loop that sends packets and attempts to read packets from the socket (through CRcvQueue::recvfrom) in the hope that the connection will eventually succeed. The problem is that for the whole duration of this loop it holds CSocket::m_ControlLock and CUDT::m_ConnectionLock (locked in this order), which blocks a concurrent call to srt_close() until the loop exits, and the loop exits only on timeout. Reading from the queue reads from the socket with a maximum waiting time of 1s, after which the conditions are rechecked.
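The blocking described above can be reproduced in a small model. This is only a sketch under stated assumptions: `FakeSocket`, `blocking_connect`, `try_close` and `close_succeeds_during_connect` are hypothetical names, a single `std::timed_mutex` stands in for the two SRT locks, and a sleep stands in for the retry loop.

```cpp
#include <chrono>
#include <future>
#include <mutex>
#include <thread>

// Hypothetical model, not SRT code: one lock stands in for the
// m_ControlLock / m_ConnectionLock pair held by the connect loop.
struct FakeSocket {
    std::timed_mutex control_lock;
};

// Simulates a blocking connect that keeps the lock for the whole retry loop.
void blocking_connect(FakeSocket& s, std::chrono::milliseconds loop_time,
                      std::promise<void>& locked) {
    std::lock_guard<std::timed_mutex> guard(s.control_lock);
    locked.set_value();                       // tell the caller the lock is held
    std::this_thread::sleep_for(loop_time);   // stands in for send/recv retries
}

// Simulates close(): it needs the same lock and gives up after `patience`.
bool try_close(FakeSocket& s, std::chrono::milliseconds patience) {
    if (!s.control_lock.try_lock_for(patience))
        return false;
    s.control_lock.unlock();
    return true;
}

// Returns true if close() managed to run while the connect loop was active.
bool close_succeeds_during_connect() {
    FakeSocket s;
    std::promise<void> locked;
    std::future<void> is_locked = locked.get_future();
    std::thread connector([&s, &locked] {
        blocking_connect(s, std::chrono::milliseconds(300), locked);
    });
    is_locked.wait();  // make sure the connect loop already owns the lock
    const bool closed = try_close(s, std::chrono::milliseconds(50));
    connector.join();
    return closed;
}
```

In this model the close attempt cannot proceed until the simulated loop ends, which mirrors why srt_close() only returns after the connect timeout.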

Fix:

Added the ability to interrupt the call to CRcvQueue::recvfrom before the 1s timeout.
The m_bClosing and m_bBroken flags are checked after reading by CRcvQueue::recvfrom so that the loop can be interrupted immediately.
The srt_close call sets the m_bClosing flag and "kicks" the condition for CRcvQueue::recvfrom. This leads to an immediate exit of the loop, so that srt_close() can continue.
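The general pattern behind these three points (a timed wait that a closing flag plus a condition signal can cut short) might look roughly like the following. This is a minimal sketch, not the code from this PR: `PacketWaiter`, `wait_packet`, `deliver_packet`, `close` and `interrupted_wait_duration` are hypothetical names, and standard C++ primitives are used in place of SRT's own sync wrappers.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the wait inside the receive queue; not SRT code.
class PacketWaiter {
public:
    enum class Result { Packet, Timeout, Closing };

    // Waits up to `timeout` for a packet, but wakes up early if close() is called.
    Result wait_packet(std::chrono::milliseconds timeout) {
        assert(timeout.count() >= 0);
        std::unique_lock<std::mutex> lk(m_lock);
        m_cond.wait_for(lk, timeout, [this] { return m_has_packet || m_closing; });
        if (m_closing)                 // checked right after the wait returns
            return Result::Closing;
        if (m_has_packet) {
            m_has_packet = false;
            return Result::Packet;
        }
        return Result::Timeout;
    }

    void deliver_packet() {
        { std::lock_guard<std::mutex> lk(m_lock); m_has_packet = true; }
        m_cond.notify_one();
    }

    // Marks the waiter as closing and "kicks" the condition.
    void close() {
        { std::lock_guard<std::mutex> lk(m_lock); m_closing = true; }
        m_cond.notify_all();
    }

private:
    std::mutex m_lock;
    std::condition_variable m_cond;
    bool m_has_packet = false;
    bool m_closing = false;
};

// Closes the waiter from another thread after 50 ms and reports how long
// the nominal 1 s wait actually took.
std::chrono::milliseconds interrupted_wait_duration() {
    PacketWaiter w;
    const auto start = std::chrono::steady_clock::now();
    std::thread closer([&w] {
        std::this_thread::sleep_for(std::chrono::milliseconds(50));
        w.close();
    });
    const PacketWaiter::Result r = w.wait_packet(std::chrono::milliseconds(1000));
    closer.join();
    (void)r;
    assert(r == PacketWaiter::Result::Closing);
    return std::chrono::duration_cast<std::chrono::milliseconds>(
        std::chrono::steady_clock::now() - start);
}
```

Setting the flag under the same mutex the waiter uses avoids a lost wakeup: the waiter either sees the flag before sleeping or is woken by the notification.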

Jun 01 '21 15:06 ethouris

Weird...

The following tests FAILED:
	 57 - TestEnforcedEncryption.PasswordLength (SEGFAULT)
	 58 - TestEnforcedEncryption.SetGetDefault (SEGFAULT)
	 87 - Transmission.FileUpload (Failed)
	143 - TestSocketOptions.MinInputBWWrongLen (SEGFAULT)
	147 - TestSocketOptions.StreamIDWrongLen (SEGFAULT)

Jun 02 '21 08:06 maxsharabayko

Not weird. The problem is bigger than I thought. I found a solution, but it's a dirty workaround, you might not like it.

Jun 02 '21 09:06 ethouris

Indeed, I'm not sure about this fix.

It looks like if srt_close(..) is called between the start of CUDT::startConnect(..) and the line with m_bConnecting = true;, then m_bClosing is not set to true, and the call would still hang for the duration of SRTO_CONNECTTIMEO, although the chances are reduced.
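To make the window concrete, here is a rough single-threaded model of the ordering problem. It is a sketch built on an assumption: that the patched close path marks the socket as closing only when it is already flagged as connecting. `SocketFlags`, `request_close` and `loop_sees_close` are hypothetical names, not SRT identifiers.

```cpp
// Hypothetical model of the two flags involved; not SRT code.
struct SocketFlags {
    bool connecting = false;
    bool closing = false;
};

// Assumed behaviour of the patched close path: only a socket that is
// currently connecting gets marked as closing.
void request_close(SocketFlags& f) {
    if (f.connecting)
        f.closing = true;
}

// Replays the two possible orderings and reports whether the connect loop
// would later observe the closing flag.
bool loop_sees_close(bool close_arrives_before_connecting_flag) {
    SocketFlags f;
    if (close_arrives_before_connecting_flag)
        request_close(f);   // too early: connecting is still false
    f.connecting = true;    // the point where the connect call sets its flag
    if (!close_arrives_before_connecting_flag)
        request_close(f);   // late enough: the flag is set
    return f.closing;
}
```

With the early ordering the flag is never set, so the loop in this model would run until the connect timeout.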

Furthermore, the fix might make the code even more fragile:

The problem is that this flag shall NOT be set when you have a CONNECTED socket, because not only is the hang not a problem in this case, but setting the flag also puts the socket in a "confused" state in which it skips a vital part of closing itself and therefore runs an infinite loop when trying to purge the sender buffer of the closing socket.

At the same time, I don't see an easy way to fix issue #2029 without intense refactoring. Unlocking m_ControlLock before s->core().startConnect(target_addr, forced_isn) seems to violate the locking order. 🤔

Aug 16 '21 15:08 maxsharabayko
