CLOSE_WAIT connection status after clients disconnect

Open cdgraff opened this issue 9 years ago • 9 comments

Many connections remain open in CLOSE_WAIT status after clients disconnect:

[root@SUPV6743 ~]# netstat -nap | grep :3000 | awk '{print $6}' | sort -n | uniq -c
  14698 CLOSE_WAIT
     73 ESTABLISHED
      1 LISTEN

Thanks!

cdgraff avatar Jun 18 '15 00:06 cdgraff

Had a dig around in this one today. Found that primus ping messages were not being returned correctly, and that websockets were not being closed correctly by the server. Together, these two problems would have quickly left many connections open.

This may also improve performance somewhat (see #4).
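
For context on the symptom: a socket sits in CLOSE_WAIT when the remote peer has closed its side and the local process has not yet closed its own. The sketch below is not Signalbox's code and does not use the Gorilla API; it uses a plain net.Conn from the standard library to show the general pattern of closing the server side as soon as the peer goes away. The names handle and serveOnce, and the done channel, are made up for illustration.

```go
package main

import (
	"fmt"
	"io"
	"net"
)

// handle drains a connection until the peer disconnects, then closes it.
// Without the deferred Close, the socket would linger in CLOSE_WAIT after
// the peer's FIN arrives. done is a hypothetical channel used only to
// report how the read loop ended.
func handle(conn net.Conn, done chan<- error) {
	defer conn.Close()
	_, err := io.Copy(io.Discard, conn) // returns a nil error on a clean peer close
	done <- err
}

// serveOnce accepts one connection on a loopback port, has a client connect
// and hang up immediately, and returns the error seen by the handler.
func serveOnce() error {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return err
	}
	defer ln.Close()

	done := make(chan error, 1)
	go func() {
		conn, err := ln.Accept()
		if err != nil {
			done <- err
			return
		}
		handle(conn, done)
	}()

	client, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		return err
	}
	client.Close() // simulate the client disconnecting

	return <-done
}

func main() {
	fmt.Println("handler exit error:", serveOnce())
}
```

The same idea applies to a websocket read loop: whatever error ends the loop, the connection should be closed on the way out.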

cfreeman avatar Jul 02 '15 11:07 cfreeman

Hi!

I still saw the issue:

   2567 CLOSE_WAIT
    150 ESTABLISHED
      1 LISTEN

BTW, I have stats on Go performance, the number of goroutines, and more. I captured them using https://github.com/yvasiyarov/gorelic

I see that GC takes longer with every minute the application runs, and the number of goroutines and other metrics grow in the same way.

I can share them with you.

Let me know... thanks in advance!
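
For reference, counters like the ones mentioned above (goroutine count, GC runs, GC pause time) can also be read directly from the Go standard library, without any agent. This is a minimal sketch and not how gorelic collects its numbers; the snapshot type and takeSnapshot function are hypothetical names.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// snapshot holds a few runtime counters worth logging periodically.
type snapshot struct {
	Goroutines int           // goroutines currently alive
	NumGC      uint32        // completed GC cycles
	PauseTotal time.Duration // cumulative GC stop-the-world pause time
	HeapAlloc  uint64        // bytes of allocated heap objects
}

// takeSnapshot reads the counters from the runtime package.
func takeSnapshot() snapshot {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return snapshot{
		Goroutines: runtime.NumGoroutine(),
		NumGC:      m.NumGC,
		PauseTotal: time.Duration(m.PauseTotalNs),
		HeapAlloc:  m.HeapAlloc,
	}
}

func main() {
	s := takeSnapshot()
	fmt.Printf("goroutines=%d gc_runs=%d gc_pause_total=%s heap_alloc=%d\n",
		s.Goroutines, s.NumGC, s.PauseTotal, s.HeapAlloc)
}
```

Logging a snapshot every minute and comparing it with the connection counts from netstat would show whether goroutines grow in step with sockets that are never closed.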

cdgraff avatar Jul 07 '15 00:07 cdgraff

Hello, I tested and I see that the websocket reconnects every 1.2 min. I only see one announce and one ping message, and then the websocket closes.

gonzafirewall avatar Jul 07 '15 02:07 gonzafirewall

Yes, I see the same issue... and if you check Chrome's WebRTC internals at chrome://webrtc-internals/ you see many connections.

I now also see this message, which is new: 2015/07/07 04:17:19 ERROR - http.HandleFunc: websocket: version != 13

If I now open a connection to http://xx.xx.37.22:3000/rtc.io/primus.js

The response is perfect, but if I try to open a websocket it fails. Judging by the error from the Gorilla lib, there is some issue with the websocket upgrade:

https://github.com/gorilla/websocket/blob/master/server.go#L97

Hope this helps.

cdgraff avatar Jul 07 '15 03:07 cdgraff

Some stats from the Go process... take a look at the file descriptors in use after some time...

[Images: fd-go, gc-signalbox]

cdgraff avatar Jul 07 '15 03:07 cdgraff

Are you able to send me the agent string in the /announce message? It is truncated in @gonzafirewall's screenshot. I just want to make sure I am testing with the same version as you.

In terms of the Gorilla lib, it looks like your websocket client is not sending Sec-WebSocket-Version: 13. Are you able to send me a dump of the full client request? What is the client that is trying to connect? The browser versions that implement Websocket V13 are highlighted below:

[Image: websockets]

I.e. Gorilla lib (and Signalbox) currently only support Websocket version 13.
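
As a rough illustration of the version check behind that error, the sketch below inspects the Sec-WebSocket-Version request header with the standard library only. It is not the Gorilla implementation. The helper names checkVersion and probe and the /rtc.io/primus path are assumptions, and the 426 reply with a Sec-WebSocket-Version header follows what RFC 6455 suggests for unsupported versions rather than anything Signalbox is known to do.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

const supportedVersion = "13"

// checkVersion is a hypothetical helper: it reports whether the request asks
// for websocket version 13. If not, it replies 426 Upgrade Required and
// advertises the supported version.
func checkVersion(w http.ResponseWriter, r *http.Request) bool {
	if r.Header.Get("Sec-WebSocket-Version") == supportedVersion {
		return true
	}
	w.Header().Set("Sec-WebSocket-Version", supportedVersion)
	http.Error(w, "unsupported websocket version", http.StatusUpgradeRequired)
	return false
}

// probe runs checkVersion against a synthetic request. It returns whether the
// request was accepted and the recorded status code (the recorder's default
// of 200 is returned when nothing was written).
func probe(version string) (bool, int) {
	req := httptest.NewRequest(http.MethodGet, "/rtc.io/primus", nil)
	if version != "" {
		req.Header.Set("Sec-WebSocket-Version", version)
	}
	rec := httptest.NewRecorder()
	ok := checkVersion(rec, req)
	return ok, rec.Code
}

func main() {
	for _, v := range []string{"13", "8", ""} {
		ok, code := probe(v)
		fmt.Printf("version=%q accepted=%v status=%d\n", v, ok, code)
	}
}
```

Logging the offending header value alongside the User-Agent in a check like this would show which clients are sending an older version.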

cfreeman avatar Jul 09 '15 09:07 cfreeman

Hi, here are the logs from my desktop.

/announce|{"id":"cde7244f-cdb4-4c8e-a36d-00403cb18c18"}|{"browser":"chrome","browserVersion":"43.0.2357","id":"cde7244f-cdb4-4c8e-a36d-00403cb18c18","agent":"[email protected]","room":"aHR0cDovL2hscy5jYXN0dG8ubWUvbGl2ZS9oVkZwVTlDZnM0VmNBNDRHb0xHRi9wbGF5bGlzdC5tM3U4"}

pb-7cantqxpl6ymqqlbqfugiqcbdokpjxragfwrj4dk

I think the WebSocket version issue arises because, in production with real users, some may be using older browser versions and IE. I think we can dismiss this message.

The main issue to check now is performance: the number of open FDs and the GC time, which always increases. After some minutes the websocket part stops responding correctly. Something strange is the number of times the same ID appears in the announce messages, yet at the time of the capture it was only me, using 2 tabs to simulate the p2p connection.

[Image: screenshot 2015-07-09 20 33 12]

Is there any more info I can provide to help? Just let me know.

Thanks in advance!

cdgraff avatar Jul 09 '15 23:07 cdgraff

It looks like the application built upon rtc.io is sending additional messages on top of announce. In the snippet you sent, there is one announce and several custom /to messages. I imagine something is trying several paths to broker a connection?

In terms of FDs, from what I can tell one FD is opened per websocket. I also create one goroutine per websocket. So that would go some way towards explaining the growth in FDs and goroutines. Does the application ever send any leave commands? Is there any reason why the application would open a websocket and never close it once webrtc peers are established?
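
One general way to keep a goroutine and FD from being held forever by a silent client is an idle read deadline. The sketch below is only an illustration under that assumption, not Signalbox's current behaviour: it uses net.Pipe from the standard library in place of a real websocket, and readWithIdleTimeout and idleDemo are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"os"
	"time"
)

// readWithIdleTimeout reads from conn until the peer disconnects or stays
// silent for longer than idle, then closes conn so its FD and goroutine are
// released. It reports whether the loop ended because of the idle timeout.
func readWithIdleTimeout(conn net.Conn, idle time.Duration) bool {
	defer conn.Close()
	buf := make([]byte, 512)
	for {
		// Push the deadline forward before every read.
		if err := conn.SetReadDeadline(time.Now().Add(idle)); err != nil {
			return false
		}
		if _, err := conn.Read(buf); err != nil {
			return errors.Is(err, os.ErrDeadlineExceeded)
		}
	}
}

// idleDemo connects a client that never sends anything and returns true if
// the server side gave up because of the idle timeout.
func idleDemo() bool {
	server, client := net.Pipe()
	defer client.Close()
	return readWithIdleTimeout(server, 50*time.Millisecond)
}

func main() {
	fmt.Println("closed after idle timeout:", idleDemo())
}
```

With periodic ping messages, the idle window would be set somewhat longer than the ping interval so that healthy clients are never dropped.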

cfreeman avatar Jul 20 '15 04:07 cfreeman

P.S. The maximum number of file descriptors that can be opened in Linux can be tuned somewhat: http://www.cyberciti.biz/faq/linux-increase-the-maximum-number-of-open-files/
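
To see which limit the process is actually running under, the soft and hard RLIMIT_NOFILE values can be read from inside Go. This is a small sketch that assumes a Linux or other Unix-like host; fdLimits is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"syscall"
)

// fdLimits returns the soft and hard limits on open file descriptors for the
// current process.
func fdLimits() (soft, hard uint64, err error) {
	var lim syscall.Rlimit
	if err = syscall.Getrlimit(syscall.RLIMIT_NOFILE, &lim); err != nil {
		return 0, 0, err
	}
	return uint64(lim.Cur), uint64(lim.Max), nil
}

func main() {
	soft, hard, err := fdLimits()
	if err != nil {
		fmt.Println("getrlimit failed:", err)
		return
	}
	fmt.Printf("open files: soft=%d hard=%d\n", soft, hard)
}
```

Raising the limit only postpones the problem if sockets are leaking, so it is best treated as headroom rather than a fix.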

cfreeman avatar Jul 20 '15 04:07 cfreeman