etcd-mesos zk reconnection problems due to shortage of file descriptors

zk reconnection problems due to shortage of file descriptors

Open jdef opened this issue 9 years ago • 2 comments

W0411 18:52:35.436744   19305 zk.go:154] Failed to configure cluster for new instance: zk: could not connect to a server.  Backing off for 8 seconds and retrying.
2016-04-11 18:52:43.437144 I | Failed to connect to 10.2.0.7:2181: dial tcp 10.2.0.7:2181: too many open files
2016-04-11 18:52:43.437294 I | Failed to connect to 10.2.0.5:2181: dial tcp 10.2.0.5:2181: too many open files
W0411 18:52:43.437451   19305 zk.go:154] Failed to configure cluster for new instance: zk: could not connect to a server.  Backing off for 8 seconds and retrying.
2016-04-11 18:52:51.437854 I | Failed to connect to 10.2.0.5:2181: dial tcp 10.2.0.5:2181: too many open files
2016-04-11 18:52:51.438031 I | Failed to connect to 10.2.0.7:2181: dial tcp 10.2.0.7:2181: too many open files
W0411 18:52:51.438133   19305 zk.go:154] Failed to configure cluster for new instance: zk: could not connect to a server.  Backing off for 8 seconds and retrying.
2016-04-11 18:52:59.438530 I | Failed to connect to 10.2.0.5:2181: dial tcp 10.2.0.5:2181: too many open files
2016-04-11 18:52:59.438701 I | Failed to connect to 10.2.0.7:2181: dial tcp 10.2.0.7:2181: too many open files
W0411 18:52:59.438787   19305 zk.go:154] Failed to configure cluster for new instance: zk: could not connect to a server.  Backing off for 8 seconds and retrying.
E0411 18:53:07.439019   19305 scheduler.go:396] Failed to persist reconciliation info: zk: could not connect to a server
I0411 18:53:07.439247   19305 scheduler.go:676] running instances: 3 desired: 3 offers: 0

I don't have the full logs; they scrolled off the edge of my console. Will update if I'm able to reproduce. I noticed this after many suspend/resume cycles of the dev laptop running etcd-mesos and the mesos/zk cluster it was deployed to.

Apr 12 '16 23:04 jdef

Could this be a file-descriptor (connection?) leak?

Apr 22 '17 23:04 pires

Could be. Hard to repro.

Apr 26 '17 19:04 jdef
