smux icon indicating copy to clipboard operation
smux copied to clipboard

Serious keepalive issue

Open freinholz opened this issue 5 years ago • 6 comments

I've migrated my project from Yamux to Smux a couple of weeks ago. Most reasionly due to the memory alloc behavior of Yamux After facing some issue, I thought everything is smooth and stable. But the included keepalive functions seems to be far from that.

A debugging revealed that the notifyBucket/check dataReady approch is not reliable under heavy load and with higher RTT. The async notification works, but dataReady is sometimes not correctly set for the timeout checks. It is also very important to mention that this only happens when testing under real life conditions. My local test-cases are all fine, but testing between remote devices reveals the mentioned issue.

One mitigation would be removing the s.Close() right after the dataReady checks. Another would be replacing the check by a more reliable approach. There would also be the option to disable keepalives and implement a real Ping() function.

I've chosen the last option.

Could you please check if a patch integration makes sense or alternatively provide a fix for the keepalive.

Thanks!

Here is my ping patch: ping_patch.txt

freinholz avatar Feb 20 '20 08:02 freinholz

394 if !atomic.CompareAndSwapInt32(&s.dataReady, 1, 0) {

so if this line fails, there is no incoming data, OR(for extreme case), the goroutine for recvLoop() hasn't been scheduled to ?

xtaci avatar Feb 20 '20 09:02 xtaci

for one special case:

the caller of smux don't read the incoming data, then , it blocks on : 309 for atomic.LoadInt32(&s.bucket) <= 0 && !s.IsClosed() {

xtaci avatar Feb 20 '20 09:02 xtaci

https://github.com/xtaci/smux/commit/80072d9b5c44fe531e0e5d7d36b029b87f20065c

xtaci avatar Feb 20 '20 09:02 xtaci

Your patch loooks like a good option for approach 2!

Any comments on the Ping()?

freinholz avatar Feb 20 '20 10:02 freinholz

Pong is not required, IMHO, 'cause we don't need RTT.

xtaci avatar Feb 20 '20 11:02 xtaci

You may don't need it, but I'm using it and there have been plenty of other requests. In addition, it's nice to have a connection control apart from the non-reliable IsClosed() to managed adjacent processes. But okay. So I'll have to managed my own branch.

freinholz avatar Feb 20 '20 12:02 freinholz