signalbox
CLOSE_WAIT connection status after clients disconnect
Many connections remain open in CLOSE_WAIT status after the clients disconnect:
[root@SUPV6743 ~]# netstat -nap | grep :3000 | awk '{print $6}' | sort -n | uniq -c
  14698 CLOSE_WAIT
     73 ESTABLISHED
      1 LISTEN
Thanks!
Had a dig around in this one today. I found that Primus ping messages were not being answered correctly, and that websockets were not being closed correctly by the server. Together, these two problems would have quickly left many connections open.
This may also improve performance somewhat (see #4).
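For readers following along, here is a minimal sketch of the server-side half of that fix. It is not Signalbox's actual code and it uses plain TCP from the standard library rather than the Gorilla lib; the names `handle`, `serve`, `waitActive` and `demo` are hypothetical. The point it illustrates is that a socket stays in CLOSE_WAIT when the peer has closed but the local side never calls `Close`, so every handler should close its connection on the way out.

```go
package main

import (
	"fmt"
	"net"
	"sync/atomic"
	"time"
)

// active counts connections whose handler goroutine is still running.
var active int64

// handle is a hypothetical per-connection handler. It reads until the
// peer disconnects and always closes its own side before returning.
func handle(conn net.Conn) {
	atomic.AddInt64(&active, 1)
	defer atomic.AddInt64(&active, -1)
	// Without this Close the socket would sit in CLOSE_WAIT after the
	// client's FIN arrives, and its file descriptor would stay open.
	defer conn.Close()

	buf := make([]byte, 4096)
	for {
		// Read returns io.EOF once the peer has closed; any error ends the session.
		if _, err := conn.Read(buf); err != nil {
			return
		}
	}
}

// serve accepts connections and starts one goroutine per connection.
func serve(ln net.Listener) {
	for {
		conn, err := ln.Accept()
		if err != nil {
			return // listener was closed
		}
		go handle(conn)
	}
}

// waitActive polls until the active counter equals want or the timeout expires.
func waitActive(want int64, timeout time.Duration) bool {
	deadline := time.Now().Add(timeout)
	for time.Now().Before(deadline) {
		if atomic.LoadInt64(&active) == want {
			return true
		}
		time.Sleep(5 * time.Millisecond)
	}
	return atomic.LoadInt64(&active) == want
}

// demo connects one client over loopback, disconnects it, and reports
// whether the handler started and then exited.
func demo() (started, finished bool, err error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return false, false, err
	}
	defer ln.Close()
	go serve(ln)

	client, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		return false, false, err
	}
	started = waitActive(1, 2*time.Second)
	client.Close() // the client disconnects
	finished = waitActive(0, 2*time.Second)
	return started, finished, nil
}

func main() {
	started, finished, err := demo()
	if err != nil {
		panic(err)
	}
	fmt.Println("handler started:", started)
	fmt.Println("handler exited after disconnect:", finished)
}
```

If the deferred `conn.Close()` is removed and the handler blocks somewhere other than `Read`, the same netstat pipeline from the report should show the CLOSE_WAIT count growing with every client that leaves.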
Hi!
I still see the issue:
   2567 CLOSE_WAIT
    150 ESTABLISHED
      1 LISTEN
By the way, I have stats on Go performance, the number of goroutines, and more. I captured them using https://github.com/yvasiyarov/gorelic
I see that GC takes more time with every minute the application runs, and the same goes for the number of goroutines and other metrics.
I can share them with you.
Let me know. Thanks in advance!
Hello, I tested and I see that the websocket reconnects every 1.2 minutes. Each connection only sees one announce and one ping message and then closes.
Yes, I see the same issue. If you check chrome://webrtc-internals/ in Chrome, you will see many connections.
Now I mostly see this message, which is new: 2015/07/07 04:17:19 ERROR - http.HandleFunc: websocket: version != 13
If I open a connection now to http://xx.xx.37.22:3000/rtc.io/primus.js, the response is fine, but opening a WebSocket fails. Based on the error, the Gorilla lib has some issue with the websocket upgrade:
https://github.com/gorilla/websocket/blob/master/server.go#L97
Hope this helps.
Some stats from the Go process. Take a look at the file descriptors in use after some time...
Are you able to send me what the agent string is in the /announce message? It is truncated in @gonzafirewall's screenshot. I just want to make sure I am testing with the same version as you.
In terms of the Gorilla lib, it looks like your websocket client is not sending Sec-WebSocket-Version: 13. Are you able to send me a dump of the full client request? What client is trying to connect? The browser versions that implement WebSocket version 13 are highlighted below:
I.e. Gorilla lib (and Signalbox) currently only support Websocket version 13.
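To make the version check concrete, here is a rough sketch of what such a test looks like using only `net/http`. It is not the Gorilla lib's actual implementation; `supportsVersion13` and `newHandshake` are hypothetical names, and the URL is a placeholder. RFC 6455 suggests answering a mismatch with an HTTP error such as 426 Upgrade Required plus a Sec-WebSocket-Version header listing what the server understands.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// supportsVersion13 reports whether the handshake request advertises
// WebSocket protocol version 13 (RFC 6455). Header lookups in net/http
// are case-insensitive because keys are canonicalized.
func supportsVersion13(r *http.Request) bool {
	for _, v := range r.Header.Values("Sec-WebSocket-Version") {
		for _, tok := range strings.Split(v, ",") {
			if strings.TrimSpace(tok) == "13" {
				return true
			}
		}
	}
	return false
}

// newHandshake builds a hypothetical upgrade request for illustration.
// An empty version leaves the Sec-WebSocket-Version header out entirely.
func newHandshake(version string) *http.Request {
	r, err := http.NewRequest("GET", "http://example.invalid/", nil)
	if err != nil {
		panic(err)
	}
	r.Header.Set("Upgrade", "websocket")
	r.Header.Set("Connection", "Upgrade")
	if version != "" {
		r.Header.Set("Sec-WebSocket-Version", version)
	}
	return r
}

func main() {
	// "8" stands in for an older draft version that some old clients sent.
	for _, v := range []string{"13", "8", ""} {
		fmt.Printf("version %q accepted: %v\n", v, supportsVersion13(newHandshake(v)))
	}
}
```

A client that sends an older draft version, or no version header at all, would fail this check, which matches the "version != 13" error in the log above.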
Hi, here are the logs from my desktop.
/announce|{"id":"cde7244f-cdb4-4c8e-a36d-00403cb18c18"}|{"browser":"chrome","browserVersion":"43.0.2357","id":"cde7244f-cdb4-4c8e-a36d-00403cb18c18","agent":"[email protected]","room":"aHR0cDovL2hscy5jYXN0dG8ubWUvbGl2ZS9oVkZwVTlDZnM0VmNBNDRHb0xHRi9wbGF5bGlzdC5tM3U4"}
I think the WebSocket version issue appears because, in production with real users, some of them may be using older browser versions or IE. I think we can dismiss this message.
The main issues to check now are performance, the number of open FDs, and the GC time, which always increases. After some minutes the websocket part stops responding correctly. Something strange is the number of times the same ID appears in announce messages, even though at the moment of the capture it was only me using 2 tabs to simulate the p2p connection.

Is there any more info I can provide to help? Just let me know.
Thanks in advance!
It looks like the application built upon rtc.io is sending additional messages on top of announce. In the snippet you sent, there is one announce and several custom /to messages. I imagine something is trying several paths to broker a connection?
In terms of FDs, from what I can tell one FD is opened per websocket. I also create one goroutine per websocket, so that would go some way to explaining the growth in FDs and goroutines. Does the application ever send any leave commands? Is there any reason why the application would open a websocket and never close it once the WebRTC peers are established?
P.S. The maximum number of file descriptors that can be opened in Linux can be tuned somewhat: http://www.cyberciti.biz/faq/linux-increase-the-maximum-number-of-open-files/
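If it helps to watch the FD growth from inside the process, something along these lines could be logged periodically. This is only a sketch under a few assumptions: it is Linux-only because it lists /proc/self/fd, it is not part of Signalbox, and `fdUsage` is a hypothetical name. It compares the number of descriptors currently open with the soft RLIMIT_NOFILE limit.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// fdUsage returns the number of file descriptors this process has open
// and its soft limit (RLIMIT_NOFILE). Linux-only: it lists /proc/self/fd.
// The count includes the descriptor briefly opened to read that directory.
func fdUsage() (open int, limit uint64, err error) {
	entries, err := os.ReadDir("/proc/self/fd")
	if err != nil {
		return 0, 0, err
	}
	var rl syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
		return 0, 0, err
	}
	return len(entries), uint64(rl.Cur), nil
}

func main() {
	open, limit, err := fdUsage()
	if err != nil {
		fmt.Println("could not read FD usage:", err)
		return
	}
	fmt.Printf("open file descriptors: %d of %d allowed\n", open, limit)
}
```

Raising the limit only buys time. If the open count keeps climbing while the number of ESTABLISHED connections stays flat, the descriptors are being leaked rather than used.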