etcd-mesos
self-healing etcd on mesos!
### Problem

Currently, when this framework is deployed, the following may happen:

1. This framework hasn't been deployed before, hence a clean start is performed and everything just works.
2. ...
According to the [Mesos High-Availability Framework guide](http://mesos.apache.org/documentation/latest/high-availability-framework-guide/), a framework should run an odd number (n >= 3) of scheduler instances in order to tolerate scheduler failures. One should implement...
The lock (https://github.com/mesosphere/etcd-mesos/blob/master/scheduler/scheduler.go#L342) should only be taken if a previous node (cluster?) was able to instantiate first; otherwise it needlessly becomes unavailable.
This becomes a bug with 0.26, but rather than changing `state.json` to `state`, we should just use the JSON that is persisted in ZK.
Currently, etcd-mesos cannot survive a total cluster loss. We should provide optional automatic backups, plus restoration that is either automated but operator-triggered or fully automatic.
Accept a framework principal and a path to a secret file to enable basic Mesos framework authentication.
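A minimal sketch of the input-handling side of this request, assuming the principal and secret path arrive as plain strings. The `Credential` type and the `parseSecret` and `loadCredential` helpers are hypothetical names for illustration; they are not taken from etcd-mesos or from any Mesos client library, and the hand-off to the scheduler driver is deliberately left out.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strings"
)

// Credential is a hypothetical holder for the principal/secret pair.
// It is not a type from etcd-mesos or a Mesos client library.
type Credential struct {
	Principal string
	Secret    string
}

// parseSecret strips trailing line endings (editors often append one)
// and rejects an empty secret. Stripping is an assumption of this sketch.
func parseSecret(raw string) (string, error) {
	s := strings.TrimRight(raw, "\r\n")
	if s == "" {
		return "", errors.New("secret is empty")
	}
	return s, nil
}

// loadCredential reads the secret from secretPath and pairs it with principal.
func loadCredential(principal, secretPath string) (*Credential, error) {
	if principal == "" {
		return nil, errors.New("principal must not be empty")
	}
	data, err := os.ReadFile(secretPath)
	if err != nil {
		return nil, fmt.Errorf("reading secret file: %w", err)
	}
	secret, err := parseSecret(string(data))
	if err != nil {
		return nil, err
	}
	return &Credential{Principal: principal, Secret: secret}, nil
}

func main() {
	// Write a throwaway secret file so the example runs on its own.
	f, err := os.CreateTemp("", "secret-*")
	if err != nil {
		fmt.Println("temp file:", err)
		return
	}
	defer os.Remove(f.Name())
	if _, err := f.WriteString("example-secret\n"); err != nil {
		fmt.Println("write:", err)
		return
	}
	f.Close()

	cred, err := loadCredential("etcd-mesos", f.Name())
	if err != nil {
		fmt.Println("load:", err)
		return
	}
	// Never log the secret itself; report only that it was loaded.
	fmt.Printf("loaded credential for principal %q\n", cred.Principal)
}
```

Validating both values up front means a misconfigured deployment fails at startup rather than at registration time. Whether trailing newlines should be stripped depends on how the secret was provisioned on the master side.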
- interested in fixes from #93
https://issues.apache.org/jira/browse/MESOS-7375
Via Slack conversation:

> smoke tests: (a) start up an etcd cluster; make sure that the number of expected servers is brought online (b) kill a server, make sure that...
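Step (a) of the smoke test could be sketched as a polling helper like the one below. This is only an illustration under stated assumptions: `memberCounter` and `waitForMembers` are hypothetical names, and the probe in `main` is a fake that stands in for whatever query a real test would make against the cluster.

```go
package main

import (
	"fmt"
	"time"
)

// memberCounter is a hypothetical probe reporting how many etcd servers
// are currently online. A real test would query the cluster here.
type memberCounter func() (int, error)

// waitForMembers polls count until it reports exactly expected servers,
// or returns an error once timeout has elapsed.
func waitForMembers(expected int, timeout, interval time.Duration, count memberCounter) error {
	deadline := time.Now().Add(timeout)
	for {
		n, err := count()
		if err == nil && n == expected {
			return nil
		}
		if time.Now().After(deadline) {
			if err != nil {
				return fmt.Errorf("probe failed: %w", err)
			}
			return fmt.Errorf("timed out: want %d servers, got %d", expected, n)
		}
		time.Sleep(interval)
	}
}

func main() {
	// Fake probe: one more server comes online on each poll, up to three.
	online := 0
	probe := func() (int, error) {
		if online < 3 {
			online++
		}
		return online, nil
	}
	if err := waitForMembers(3, time.Second, 10*time.Millisecond, probe); err != nil {
		fmt.Println("smoke test failed:", err)
		return
	}
	fmt.Println("all expected servers online")
}
```

The same helper could be reused after killing a server in step (b), with the expected count chosen by whatever that step is meant to verify.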