tmpnb
Major file descriptor issues with websockets
Our cleanup change (using docker stop instead of docker kill) helps reclaim file descriptors during culling, but we still have a major problem: all the file descriptors opened by the individual notebook servers and by the configurable http proxy.
I haven't tracked down the exact culprits, but here are some summarized stats:
root@demo:~# lsof 2> /dev/null | grep python | cut -c 88- | sort | uniq -c | sort -nr | head
25861 anon_inode
13570 can't identify protocol
7814 pipe
3067 /dev/urandom
2682 /home/jupyter/.ipython/profile_default/history.sqlite
1833 /dev/null
1637 /
1587 /usr/lib/x86_64-linux-gnu/libzmq.so.3.1.0 (stat: No such file or directory)
1587 /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.19 (path dev=8,1, inode=7783)
1587 /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6 (path dev=8,1, inode=7737)
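To keep an eye on whether culling is actually reclaiming these descriptors over time, a lighter-weight companion to the lsof pipeline above could be something like the rough sketch below (Linux-only, needs root to see other users' processes; the script is illustrative, not part of tmpnb):

```python
#!/usr/bin/env python
"""Rough sketch: count open file descriptors per process name via /proc.

Linux-only; run as root to see other users' processes. Meant as a
lighter-weight companion to the lsof pipeline above, not a replacement.
"""
import os
from collections import Counter

def fd_counts_by_name():
    counts = Counter()
    for pid in os.listdir('/proc'):
        if not pid.isdigit():
            continue
        try:
            with open('/proc/%s/comm' % pid) as f:
                name = f.read().strip()
            counts[name] += len(os.listdir('/proc/%s/fd' % pid))
        except (IOError, OSError):
            # process exited or permission denied; skip it
            continue
    return counts

if __name__ == '__main__':
    for name, n in fd_counts_by_name().most_common(10):
        print('%8d %s' % (n, name))
```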
Possibly relevant, though closed and old: https://github.com/nodejitsu/node-http-proxy/issues/570
We should investigate whether node (or node-http-proxy) is using select instead of poll.
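For context on why select would matter at these numbers (this is an illustration of the general limitation, not a claim about what node actually does): select() can only watch descriptors numbered below FD_SETSIZE (1024 on Linux), so a process holding tens of thousands of fds can't rely on it, while poll/epoll have no such cap. A quick Python demonstration, assuming the hard RLIMIT_NOFILE allows opening ~1100 files:

```python
"""Illustration (not node-specific): select() cannot handle fds >= FD_SETSIZE.

Raises the soft RLIMIT_NOFILE toward the hard limit, opens /dev/null until a
descriptor numbered >= 1024 exists, then tries to select() on it.
"""
import os
import resource
import select

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
target = 4096 if hard == resource.RLIM_INFINITY else hard
resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))

fds = [os.open('/dev/null', os.O_RDONLY) for _ in range(1100)]
high_fd = fds[-1]
try:
    select.select([high_fd], [], [], 0)
except ValueError as e:
    # "filedescriptor out of range in select()" -- poll/epoll have no such cap
    print('select() failed on fd %d: %s' % (high_fd, e))

poller = select.poll()
poller.register(high_fd, select.POLLIN)
print('poll() happily registered fd %d' % high_fd)
```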
Sockets are ballooning within the node proxy when it handles websockets: for each client connection, an additional local port gets allocated for the hop between node and the Docker container.
Every 2.0s: sudo lsof -i | grep node Thu Oct 23 22:58:28 2014
sudo: unable to resolve host dev
node 3924 nobody 10u IPv4 117123 0t0 TCP *:8000 (LISTEN)
node 3924 nobody 11u IPv4 117124 0t0 TCP ip6-localhost:8001 (LISTEN)
node 3924 nobody 12u IPv4 117125 0t0 TCP ip6-localhost:8001->ip6-localhost:59783 (ESTABLISHED)
node 3924 nobody 13u IPv4 185713 0t0 TCP 23.253.157.134:8000->83-244-151-247.cust-83.exponential-e.net:56590 (ESTABLISHED)
node 3924 nobody 14u IPv4 184626 0t0 TCP 23.253.157.134:8000->83-244-151-247.cust-83.exponential-e.net:56577 (ESTABLISHED)
node 3924 nobody 15u IPv4 184628 0t0 TCP ip6-localhost:40600->ip6-localhost:49155 (ESTABLISHED)
node 3924 nobody 16u IPv4 182588 0t0 TCP 23.253.157.134:8000->83-244-151-247.cust-83.exponential-e.net:56578 (ESTABLISHED)
node 3924 nobody 17u IPv4 182590 0t0 TCP ip6-localhost:40607->ip6-localhost:49155 (ESTABLISHED)
node 3924 nobody 18u IPv4 186387 0t0 TCP 23.253.157.134:8000->83-244-151-247.cust-83.exponential-e.net:56579 (ESTABLISHED)
node 3924 nobody 19u IPv4 186389 0t0 TCP ip6-localhost:40611->ip6-localhost:49155 (ESTABLISHED)
node 3924 nobody 20u IPv4 185715 0t0 TCP ip6-localhost:40636->ip6-localhost:49155 (ESTABLISHED)
node 3924 nobody 21u IPv4 175024 0t0 TCP 23.253.157.134:8000->83-244-151-247.cust-83.exponential-e.net:56591 (ESTABLISHED)
node 3924 nobody 22u IPv4 175026 0t0 TCP ip6-localhost:40642->ip6-localhost:49155 (ESTABLISHED)
node 3924 nobody 23u IPv4 151837 0t0 TCP 23.253.157.134:8000->83-244-151-247.cust-83.exponential-e.net:56592 (ESTABLISHED)
node 3924 nobody 24u IPv4 151839 0t0 TCP ip6-localhost:40648->ip6-localhost:49155 (ESTABLISHED)
node 3924 nobody 25u IPv4 184792 0t0 TCP 23.253.157.134:8000->83-244-151-247.cust-83.exponential-e.net:56602 (ESTABLISHED)
node 3924 nobody 26u IPv4 184794 0t0 TCP ip6-localhost:40672->ip6-localhost:49155 (ESTABLISHED)
node 3924 nobody 27u IPv4 173469 0t0 TCP 23.253.157.134:8000->83-244-151-247.cust-83.exponential-e.net:56603 (ESTABLISHED)
node 3924 nobody 28u IPv4 173471 0t0 TCP ip6-localhost:40678->ip6-localhost:49155 (ESTABLISHED)
node 3924 nobody 29u IPv4 184813 0t0 TCP 23.253.157.134:8000->83-244-151-247.cust-83.exponential-e.net:56604 (ESTABLISHED)
node 3924 nobody 30u IPv4 184815 0t0 TCP ip6-localhost:40683->ip6-localhost:49155 (ESTABLISHED)
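A quick way to quantify the ballooning without eyeballing lsof: tally ESTABLISHED connections by remote port straight from /proc/net/tcp. This is a rough, Linux/IPv4-only sketch (illustrative, not something we ship); in the output above the interesting ports would be 8000 (clients) and 49155 (the Docker-mapped container port).

```python
"""Rough sketch: count ESTABLISHED TCP connections per remote port by reading
/proc/net/tcp (Linux only, IPv4 only). Useful for watching how many backend
sockets the proxy holds toward the Docker-mapped port versus client
connections on :8000.
"""
from collections import Counter

ESTABLISHED = '01'  # value of the st field for ESTABLISHED connections

def established_by_remote_port():
    counts = Counter()
    with open('/proc/net/tcp') as f:
        next(f)  # skip the header line
        for line in f:
            fields = line.split()
            remote, state = fields[2], fields[3]
            if state != ESTABLISHED:
                continue
            port = int(remote.split(':')[1], 16)  # rem_address port is hex
            counts[port] += 1
    return counts

if __name__ == '__main__':
    for port, n in established_by_remote_port().most_common(10):
        print('%6d connections -> remote port %d' % (n, port))
```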
To summarize the root issue: the proxy has to open more sockets than the expected single one to communicate with the underlying Docker container. It makes me wish we could hand the socket over directly.
We should think about and research (possibly for configurable-http-proxy) applying the lessons from this cloudflare post about websockets. They do things like binding before connecting and deciding when they can safely reuse ports. There's a more in-depth technical article from Marek on bind before connect as well.
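As a concrete illustration of the bind-before-connect idea from those posts (a minimal sketch, not configurable-http-proxy code; the function name and parameters are made up for the example): setting SO_REUSEADDR and binding the client socket to port 0 before connect() lets the kernel share ephemeral source ports across connections to different destinations, at the cost of retrying when the chosen 4-tuple collides and connect() fails with EADDRNOTAVAIL.

```python
"""Minimal sketch of the bind-before-connect trick: SO_REUSEADDR + bind to
port 0 before connect() lets the kernel hand out ephemeral source ports that
are already in use, as long as the resulting
(src ip, src port, dst ip, dst port) 4-tuple is unique.
"""
import errno
import socket

def connect_with_port_reuse(source_ip, dest, retries=5):
    # `source_ip` and `dest` are illustrative, e.g.
    # connect_with_port_reuse('10.0.0.5', ('172.17.0.2', 8888))
    for _ in range(retries):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        s.bind((source_ip, 0))   # port 0: kernel picks an ephemeral port
        try:
            s.connect(dest)
            return s
        except socket.error as e:
            s.close()
            # the shared port collided with an existing 4-tuple; try again
            if e.errno != errno.EADDRNOTAVAIL:
                raise
    raise socket.error(errno.EADDRNOTAVAIL, 'no usable ephemeral port found')
```

Whether doing something like this in the proxy is worth it, versus simply raising fd and ephemeral port limits, is exactly the trade-off those articles walk through.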