kcp-go icon indicating copy to clipboard operation
kcp-go copied to clipboard

UDPSession no longer flushes output queue on Close

Open cohosh opened this issue 1 year ago • 3 comments

Ever since https://github.com/xtaci/kcp-go/commit/651dc4a6a4ff3226775d4df48632498f2125d31f, a UDPSession will not flush queued outgoing packets when the connection is closed. Instead, there is a race condition in the select call of postProcess when the s.die channel is closed in Close:

		select {
		case buf := <-s.chPostProcessing: // dequeue from post processing

[snip]
			// notify chCork only when chPostProcessing is empty
			if len(s.chPostProcessing) == 0 {
				select {
				case chCork <- struct{}{}:
				default:
				}
			}
		case <-chCork: // emulate a corked socket
			if len(txqueue) > 0 {
				s.tx(txqueue)
				// recycle
				for k := range txqueue {
					xmitBuf.Put(txqueue[k].Buffers[0])
					txqueue[k].Buffers = nil
				}
				txqueue = txqueue[:0]
			}

		case <-s.die:
			return
		}

if Close is called immediately after a Write, the queued packets will not be sent. This wasn't an issue before because the uncork function was called from within Close.

I noticed this because one of our unit tests started failing when we tried to update the version of this library.

cohosh avatar Aug 27 '24 21:08 cohosh

https://github.com/xtaci/kcp-go/compare/master...hotfix273 HI @cohosh, can you try this commit?

xtaci avatar Aug 28 '24 03:08 xtaci

Thanks, that got rid of some of the race conditions but it looks like there are some remaining and our tests are still failing.

There's one in the output callback:

// delivery to post processing
select {
case sess.chPostProcessing <- bts:
case <-sess.die:
return
}

We could probably just remove the select statement here and always send the bytes to sess.chPostProcessing.

I just tried this locally but even with this fix there's still another problem where by the time the output callback sends to sess.chPostProcessing, the select statement in postProcess has already died because at the time, there wasn't anything in that channel.

cohosh avatar Aug 28 '24 13:08 cohosh

Thanks, that got rid of some of the race conditions but it looks like there are some remaining and our tests are still failing.

There's one in the output callback:

// delivery to post processing
select {
case sess.chPostProcessing <- bts:
case <-sess.die:
return
}

We could probably just remove the select statement here and always send the bytes to sess.chPostProcessing.

No,we can't do that, if chPostProcessing is full(because postProcess exits). It blocks forever on waiting for sending.

I just tried this locally but even with this fix there's still another problem where by the time the output callback sends to sess.chPostProcessing, the select statement in postProcess has already died because at the time, there wasn't anything in that channel.

Yup, The thing is a session cannot wait forever. So If 'Close()' is called, the session makes the 'best-effort delivery'.

xtaci avatar Aug 28 '24 13:08 xtaci