etcd-mesos issues

[PROPOSAL] Add framework deployment collision handling strategy

1 comment

Problem: Currently, when this framework is deployed, the following may happen: 1. This framework hasn't been deployed before, so a clean start is performed and everything just works. 2. ...

pires

enhancement

[PROPOSAL] Allow for multiple scheduler instances

1 comment

According to the [Mesos High-Availability Framework guide](http://mesos.apache.org/documentation/latest/high-availability-framework-guide/), a framework should run an odd number (n >=3) of scheduler instances in order to provide tolerance to scheduler failures. One should implement...

pires

enhancement
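The guide's recommendation of an odd number (n >= 3) of scheduler instances follows the usual majority-quorum arithmetic. The sketch below assumes the instances elect a leader by strict majority (an assumption; the proposal does not specify the election mechanism). With that assumption, n instances need n/2 + 1 live members and therefore tolerate n - (n/2 + 1) failures, so a fourth instance adds no tolerance over three. The helper names are hypothetical.

```go
package main

import "fmt"

// quorumSize returns how many live instances form a strict majority of n.
// Assumes majority-quorum leader election (hypothetical for etcd-mesos).
func quorumSize(n int) int {
	return n/2 + 1
}

// toleratedFailures returns how many of n instances can fail while a
// strict majority is still alive.
func toleratedFailures(n int) int {
	return n - quorumSize(n)
}

func main() {
	for n := 1; n <= 5; n++ {
		fmt.Printf("instances=%d quorum=%d tolerated failures=%d\n",
			n, quorumSize(n), toleratedFailures(n))
	}
}
```

Running it shows that 3 and 4 instances both tolerate a single failure, while 5 tolerate two, which is why even counts are usually avoided.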

stricter threshold for lock

1 comment

Lock (https://github.com/mesosphere/etcd-mesos/blob/master/scheduler/scheduler.go#L342) should only occur if a previous node (cluster?) was able to instantiate first; otherwise it needlessly becomes unavailable.

spacejam

support external URLs for etcd, etcdctl, and executor binaries

spacejam

enhancement

proxy should use zk state, not state.json

1 comment

This becomes a bug with 0.26; rather than changing state.json to state, we should just use the JSON that is persisted in ZK.

spacejam

enhancement

support full-cluster loss with automated backup and recovery

Currently, etcd-mesos cannot survive a total cluster loss. We should provide optional automatic backups, plus either operator-triggered or fully automatic restoration.

spacejam

enhancement

Support Framework Authentication

Support a framework principal and a path to a secret file to enable basic Mesos framework authentication.

JohnOmernik

enhancement

rebuild etcd-mesos dcos image and publish new version to multiverse

2 comments

- interested in fixes from #93

jdef

priority/P1

add flag to enable etcd to receive offers on clusters where all nodes have GPU resources

1 comment

https://issues.apache.org/jira/browse/MESOS-7375

jdef

enhancement

Automate steps for e2e testing

2 comments

Via Slack conversation: > smoke tests: (a) start up an etcd cluster; make sure that the expected number of servers is brought online (b) kill a server, make sure that...

jdef
