
Major file descriptor issues with websockets

Open rgbkrk opened this issue 11 years ago • 5 comments

Our cleanup (using docker stop instead of docker kill) helps release file descriptors during culling, but we still have a major problem: the sheer number of file descriptors opened by the individual notebook servers and the configurable http proxy.

I haven't tracked down the exact sources, but here are some summarized stats:

root@demo:~# lsof 2> /dev/null | grep python | cut -c 88- | sort | uniq -c | sort -nr | head
  25861 anon_inode
  13570 can't identify protocol
   7814 pipe
   3067 /dev/urandom
   2682 /home/jupyter/.ipython/profile_default/history.sqlite
   1833 /dev/null
   1637 /
   1587 /usr/lib/x86_64-linux-gnu/libzmq.so.3.1.0 (stat: No such file or directory)
   1587 /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.19 (path dev=8,1, inode=7783)
   1587 /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6 (path dev=8,1, inode=7737)
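For a per-process view that doesn't depend on lsof's column offsets (the `cut -c 88-` above is fragile), here is a minimal Python sketch that reads `/proc/<pid>/fd` directly. It is Linux-only, and `fd_kind` and `count_fd_kinds` are names made up for this illustration, not part of tmpnb.

```python
import os
import re
from collections import Counter


def fd_kind(target):
    """Collapse a /proc/<pid>/fd link target into a coarse kind.

    socket:[12345] and pipe:[678] carry a unique inode number, so they are
    reduced to "socket" / "pipe". Everything else (file paths, devices,
    anon_inode:[eventpoll], ...) is kept as-is so it groups naturally.
    """
    match = re.fullmatch(r"(socket|pipe):\[\d+\]", target)
    return match.group(1) if match else target


def count_fd_kinds(pid):
    """Count the open file descriptors of one process, grouped by kind."""
    fd_dir = f"/proc/{pid}/fd"
    counts = Counter()
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # the fd was closed between listdir() and readlink()
        counts[fd_kind(target)] += 1
    return counts
```

Calling `count_fd_kinds(pid).most_common(10)` for a notebook server's pid gives roughly the same kind of top-ten summary as the pipeline above, but for a single process. Reading another user's `/proc/<pid>/fd` needs root.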

rgbkrk, Oct 15 '14 19:10

Possibly relevant, though closed and old: https://github.com/nodejitsu/node-http-proxy/issues/570

rgbkrk, Oct 16 '14 04:10

We should investigate whether node/the node http proxy is using select instead of poll.

rgbkrk, Oct 16 '14 04:10

Sockets are ballooning within the node proxy when handling websockets. For each websocket, a separate port gets allocated for the connection between node and Docker.

Every 2.0s: sudo lsof -i | grep node    Thu Oct 23 22:58:28 2014

sudo: unable to resolve host dev
node    3924 nobody   10u  IPv4 117123      0t0  TCP *:8000 (LISTEN)
node    3924 nobody   11u  IPv4 117124      0t0  TCP ip6-localhost:8001 (LISTEN)
node    3924 nobody   12u  IPv4 117125      0t0  TCP ip6-localhost:8001->ip6-localhost:59783 (ESTABLISHED)
node    3924 nobody   13u  IPv4 185713      0t0  TCP 23.253.157.134:8000->83-244-151-247.cust-83.exponential-e.net:56590 (ESTABLISHED)
node    3924 nobody   14u  IPv4 184626      0t0  TCP 23.253.157.134:8000->83-244-151-247.cust-83.exponential-e.net:56577 (ESTABLISHED)
node    3924 nobody   15u  IPv4 184628      0t0  TCP ip6-localhost:40600->ip6-localhost:49155 (ESTABLISHED)
node    3924 nobody   16u  IPv4 182588      0t0  TCP 23.253.157.134:8000->83-244-151-247.cust-83.exponential-e.net:56578 (ESTABLISHED)
node    3924 nobody   17u  IPv4 182590      0t0  TCP ip6-localhost:40607->ip6-localhost:49155 (ESTABLISHED)
node    3924 nobody   18u  IPv4 186387      0t0  TCP 23.253.157.134:8000->83-244-151-247.cust-83.exponential-e.net:56579 (ESTABLISHED)
node    3924 nobody   19u  IPv4 186389      0t0  TCP ip6-localhost:40611->ip6-localhost:49155 (ESTABLISHED)
node    3924 nobody   20u  IPv4 185715      0t0  TCP ip6-localhost:40636->ip6-localhost:49155 (ESTABLISHED)
node    3924 nobody   21u  IPv4 175024      0t0  TCP 23.253.157.134:8000->83-244-151-247.cust-83.exponential-e.net:56591 (ESTABLISHED)
node    3924 nobody   22u  IPv4 175026      0t0  TCP ip6-localhost:40642->ip6-localhost:49155 (ESTABLISHED)
node    3924 nobody   23u  IPv4 151837      0t0  TCP 23.253.157.134:8000->83-244-151-247.cust-83.exponential-e.net:56592 (ESTABLISHED)
node    3924 nobody   24u  IPv4 151839      0t0  TCP ip6-localhost:40648->ip6-localhost:49155 (ESTABLISHED)
node    3924 nobody   25u  IPv4 184792      0t0  TCP 23.253.157.134:8000->83-244-151-247.cust-83.exponential-e.net:56602 (ESTABLISHED)
node    3924 nobody   26u  IPv4 184794      0t0  TCP ip6-localhost:40672->ip6-localhost:49155 (ESTABLISHED)
node    3924 nobody   27u  IPv4 173469      0t0  TCP 23.253.157.134:8000->83-244-151-247.cust-83.exponential-e.net:56603 (ESTABLISHED)
node    3924 nobody   28u  IPv4 173471      0t0  TCP ip6-localhost:40678->ip6-localhost:49155 (ESTABLISHED)
node    3924 nobody   29u  IPv4 184813      0t0  TCP 23.253.157.134:8000->83-244-151-247.cust-83.exponential-e.net:56604 (ESTABLISHED)
node    3924 nobody   30u  IPv4 184815      0t0  TCP ip6-localhost:40683->ip6-localhost:49155 (ESTABLISHED)
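To make the pairing in that listing easier to see, here is a rough Python sketch that tallies `lsof -i` lines into connections accepted on a listening port versus connections the process opened itself. `summarize_connections` is a hypothetical helper, and it assumes the `TCP <name> (<state>)` layout shown above.

```python
from collections import Counter


def _port(endpoint):
    """Return the text after the last ':' of 'host:port', or None."""
    _host, sep, port = endpoint.rpartition(":")
    return port if sep else None


def summarize_connections(lsof_lines):
    """Tally ESTABLISHED TCP connections from `lsof -i` output lines."""
    listening = set()
    established = []
    for line in lsof_lines:
        fields = line.split()
        # Skip noise such as "sudo: unable to resolve host dev".
        if "TCP" not in fields or fields[-1] not in ("(LISTEN)", "(ESTABLISHED)"):
            continue
        idx = fields.index("TCP") + 1
        if idx >= len(fields) - 1:
            continue  # no NAME field between "TCP" and the state
        name, state = fields[idx], fields[-1]
        if state == "(LISTEN)":
            port = _port(name)
            if port is not None:
                listening.add(port)
        elif "->" in name:
            local, remote = name.split("->", 1)
            established.append((local, remote))

    summary = Counter()
    for local, remote in established:
        local_port = _port(local)
        if local_port in listening:
            summary[f"inbound on :{local_port}"] += 1
        else:
            summary[f"outbound to {remote}"] += 1
    return summary
```

Fed the listing above, this should report 9 inbound connections on :8000 and 9 outbound connections to ip6-localhost:49155, i.e. one upstream socket (with its own ephemeral port) per client connection.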

rgbkrk, Oct 23 '14 22:10

To summarize the root issue: the proxy has to open more than the one socket we'd expect in order to communicate with the underlying Docker container. It makes me wish we could hand the socket over directly.

rgbkrk, Dec 08 '14 22:12

We should research the lessons from this Cloudflare post about websockets and consider applying them, possibly in configurable-http-proxy. They do things like binding before connecting and deciding when they can safely reuse ports. There's also a more in-depth technical article from Marek on Bind before connect.
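For reference, the basic shape of bind before connect, as I understand it, looks something like the Python sketch below. This is only an illustration of the socket calls involved, not what Cloudflare or configurable-http-proxy actually do; `connect_from` is a made-up name, and whether `SO_REUSEADDR` lets a source port be shared safely depends on the kernel and on the destinations being distinct.

```python
import socket


def connect_from(source_addr, dest_addr):
    """Open a TCP connection to dest_addr from an explicitly bound source.

    source_addr and dest_addr are (host, port) tuples. Passing port 0 in
    source_addr still lets the kernel pick the source port.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Assumption: SO_REUSEADDR is set so that the bind() below does not
        # fail just because the source address is in use by another
        # connection to a different destination.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(source_addr)   # choose the source side first...
        sock.connect(dest_addr)  # ...then connect to the destination
    except OSError:
        sock.close()
        raise
    return sock
```

The caller decides which source address and port each upstream connection uses, instead of leaving every choice to the kernel at `connect()` time.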

rgbkrk, Aug 26 '15 21:08