etcd-mesos

etcd-mesos scheduler doesn't clean up health check TCP sockets

Open wallnerryan opened this issue 8 years ago • 1 comments

etcd version 2.2.3, mesosphere/etcd-mesos:0.1.3

The etcd-mesos container leaves sockets open (we saw over 42,000). This eats up ports and bleeds into other port ranges, eventually killing Spartan and, with it, the health of the worker nodes.

Ports 1026 and 1027 are the ports etcd is running on:

```json
{
  "name": "_etcd-server._client.etcd-ptx.mesos.",
  "host": "etcd-server-8hw4j-s4.etcd-ptx.mesos.:1026",
  "rtype": "SRV"
}

{
  "name": "_etcd-server._client.etcd-ptx.mesos.",
  "host": "etcd-server-yemwe-s0.etcd-ptx.mesos.:1027",
  "rtype": "SRV"
}

{
  "name": "_etcd-server._client.etcd-ptx.mesos.",
  "host": "etcd-server-6rfx8-s6.etcd-ptx.mesos.:1026",
  "rtype": "SRV"
}
```

```
$ sudo lsof -i -P -n | grep -oc :1026
10527

$ sudo lsof -i -P -n | grep -oc :1027
3905
```

You can see roughly 10k connections open on port 1026 alone. We saw Spartan killed at around 40k.

```
$ sudo lsof -i -P -n | grep 1026
px        131653         root    5u  IPv4 73405621      0t0  TCP 10.251.206.22:36338->10.251.206.17:1026 (ESTABLISHED)
px        131653         root    7u  IPv4 73394523      0t0  TCP 10.251.206.22:36340->10.251.206.17:1026 (ESTABLISHED)
px        131653         root   24u  IPv4 74751557      0t0  TCP 10.251.206.22:37012->10.251.206.17:1026 (ESTABLISHED)
px        131653         root   26u  IPv4 73414087      0t0  TCP 10.251.206.22:36470->10.251.206.17:1026 (ESTABLISHED)
px        131653         root   30u  IPv4 74757811      0t0  TCP 10.251.206.22:37640->10.251.206.17:1026 (ESTABLISHED)
px        131653         root   31u  IPv4 74740409      0t0  TCP 10.251.206.22:36816->10.251.206.17:1026 (ESTABLISHED)
etcd-meso 168715         root   13u  IPv4 74566478      0t0  TCP 10.251.206.22:37786->10.251.206.17:1026 (ESTABLISHED)
etcd-meso 168715         root   15u  IPv4 74693374      0t0  TCP 10.251.206.22:37873->10.251.206.17:1026 (ESTABLISHED)
etcd-meso 168715         root   17u  IPv4 74693376      0t0  TCP 10.251.206.22:38010->10.251.206.17:1026 (ESTABLISHED)
etcd-meso 168715         root   18u  IPv4 74566510      0t0  TCP 10.251.206.22:58952->10.251.206.13:1026 (ESTABLISHED)
etcd-meso 168715         root   20u  IPv4 74783939      0t0  TCP 10.251.206.22:59134->10.251.206.13:1026 (ESTABLISHED)
etcd-meso 168715         root   21u  IPv4 74786210      0t0  TCP 10.251.206.22:59136->10.251.206.13:1026 (ESTABLISHED)
etcd-meso 168715         root   22u  IPv4 74768044      0t0  TCP 10.251.206.22:38198->10.251.206.17:1026 (ESTABLISHED)
etcd-meso 168715         root   24u  IPv4 74759147      0t0  TCP 10.251.206.22:38514->10.251.206.17:1026 (ESTABLISHED)
etcd-meso 168715         root   27u  IPv4 74755262      0t0  TCP 10.251.206.22:59608->10.251.206.13:1026 (ESTABLISHED)
etcd-meso 168715         root   30u  IPv4 74793585      0t0  TCP 10.251.206.22:38952->10.251.206.17:1026 (ESTABLISHED)
etcd-meso 168715         root   31u  IPv4 74800159      0t0  TCP 10.251.206.22:39084->10.251.206.17:1026 (ESTABLISHED)
etcd-meso 168715         root   32u  IPv4 74800160      0t0  TCP 10.251.206.22:60026->10.251.206.13:1026 (ESTABLISHED)
etcd-meso 168715         root   33u  IPv4 74800195      0t0  TCP 10.251.206.22:39206->10.251.206.17:1026 (ESTABLISHED)
etcd-meso 168715         root   34u  IPv4 74800196      0t0  TCP 10.251.206.22:60150->10.251.206.13:1026 (ESTABLISHED)
etcd-meso 168715         root   35u  IPv4 74800228      0t0  TCP 10.251.206.22:39346->10.251.206.17:1026 (ESTABLISHED)
etcd-meso 168715         root   36u  IPv4 74800229      0t0  TCP 10.251.206.22:60288->10.251.206.13:1026 (ESTABLISHED)
etcd-meso 168715         root   37u  IPv4 74801560      0t0  TCP 10.251.206.22:39488->10.251.206.17:1026 (ESTABLISHED)
```
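
To see which process holds most of the sockets, output like the listing above can be tallied per command and remote endpoint. The following is a rough sketch in Go, not part of etcd-mesos; `countByCommand` is a hypothetical helper, and it assumes the default `lsof -i -P -n` column layout (COMMAND, PID, USER, FD, TYPE, DEVICE, SIZE/OFF, NODE, NAME).

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// countByCommand tallies lsof lines per "command -> remote endpoint" pair.
// Hypothetical helper: it assumes the default `lsof -i -P -n` columns,
// where field 0 is COMMAND and field 8 is NAME ("local->remote").
func countByCommand(lines []string) map[string]int {
	counts := make(map[string]int)
	for _, line := range lines {
		f := strings.Fields(line)
		if len(f) < 9 {
			continue // blank or truncated line
		}
		parts := strings.SplitN(f[8], "->", 2)
		if len(parts) != 2 {
			continue // header line, listening socket, or unconnected entry
		}
		counts[f[0]+" -> "+parts[1]]++
	}
	return counts
}

func main() {
	// Reads lsof output from standard input and prints one count per pair.
	var lines []string
	sc := bufio.NewScanner(os.Stdin)
	for sc.Scan() {
		lines = append(lines, sc.Text())
	}
	for key, n := range countByCommand(lines) {
		fmt.Printf("%6d  %s\n", n, key)
	}
}
```

It could be fed with something like `sudo lsof -i -P -n | go run main.go`. A count that keeps growing for the `etcd-meso` rows would point at the scheduler rather than at other clients.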

wallnerryan, Aug 01 '17 19:08

I wrote a quick fix: https://github.com/minyk/etcd-mesos/commit/2b54e65119aed4c8ea5112c8f3927fab80194672. It just adds client.Close(). With that change, the connection count has stayed very stable (~20) so far.

minyk, Aug 28 '17 07:08