etcd-mesos
zk reconnection problems due to shortage of file descriptors
W0411 18:52:35.436744 19305 zk.go:154] Failed to configure cluster for new instance: zk: could not connect to a server. Backing off for 8 seconds and retrying.
2016-04-11 18:52:43.437144 I | Failed to connect to 10.2.0.7:2181: dial tcp 10.2.0.7:2181: too many open files
2016-04-11 18:52:43.437294 I | Failed to connect to 10.2.0.5:2181: dial tcp 10.2.0.5:2181: too many open files
W0411 18:52:43.437451 19305 zk.go:154] Failed to configure cluster for new instance: zk: could not connect to a server. Backing off for 8 seconds and retrying.
2016-04-11 18:52:51.437854 I | Failed to connect to 10.2.0.5:2181: dial tcp 10.2.0.5:2181: too many open files
2016-04-11 18:52:51.438031 I | Failed to connect to 10.2.0.7:2181: dial tcp 10.2.0.7:2181: too many open files
W0411 18:52:51.438133 19305 zk.go:154] Failed to configure cluster for new instance: zk: could not connect to a server. Backing off for 8 seconds and retrying.
2016-04-11 18:52:59.438530 I | Failed to connect to 10.2.0.5:2181: dial tcp 10.2.0.5:2181: too many open files
2016-04-11 18:52:59.438701 I | Failed to connect to 10.2.0.7:2181: dial tcp 10.2.0.7:2181: too many open files
W0411 18:52:59.438787 19305 zk.go:154] Failed to configure cluster for new instance: zk: could not connect to a server. Backing off for 8 seconds and retrying.
E0411 18:53:07.439019 19305 scheduler.go:396] Failed to persist reconciliation info: zk: could not connect to a server
I0411 18:53:07.439247 19305 scheduler.go:676] running instances: 3 desired: 3 offers: 0
I don't have the full logs; they scrolled off the edge of my console. I'll update this if I'm able to reproduce it. I noticed this after many suspend/resume cycles of the dev laptop running etcd-mesos and the mesos/zk cluster it was deployed to.
Could this be a file-descriptor (connection?) leak?
Could be. Hard to repro.