etcd-mesos

add tests for reseed logic

Open spacejam opened this issue 10 years ago • 3 comments

tests should cover:

  • [ ] ensuring that the node with the highest raft index is chosen
  • [ ] ensuring that when the first node fails to come online, the next node is attempted
  • [ ] ensuring that no nodes are killed unless a node is successfully determined to be a new seed
  • [ ] ensuring that when a new seed is chosen and the scheduler dies before killing the old nodes, it knows on restart to kill the unhealthy stale cluster (it should perform a new reseed, but pick the exact same node as before because of its higher raft index)
  • [ ] ensuring that the above logic works when the scheduler starts and nodes are in a livelocked state
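
To make the first three checklist items concrete, here is a minimal sketch in Go of how they could be exercised with a fake. Nothing in it comes from the etcd-mesos codebase: `Node`, `Reseeder`, `PickSeed`, `ErrNoSeed`, and `fakeReseeder` are hypothetical names, and the assumption that reseeding and killing can be hidden behind one small interface is mine, so treat it as a starting point rather than the scheduler's actual design.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// Node is a hypothetical stand-in for an etcd member known to the scheduler.
type Node struct {
	Name      string
	RaftIndex uint64
}

// Reseeder is a hypothetical seam for fault injection; tests supply a fake.
type Reseeder interface {
	Reseed(n Node) error // try to bring n up as the seed of a new cluster
	Kill(n Node) error   // remove a stale node
}

// ErrNoSeed is returned when every candidate failed to come online.
var ErrNoSeed = errors.New("no node could be reseeded")

// PickSeed tries candidates from highest raft index to lowest.
// It kills the remaining nodes only after one candidate has reseeded.
func PickSeed(nodes []Node, r Reseeder) (Node, error) {
	// Sort a copy so the caller's slice is left untouched.
	candidates := append([]Node(nil), nodes...)
	sort.SliceStable(candidates, func(i, j int) bool {
		return candidates[i].RaftIndex > candidates[j].RaftIndex
	})
	for _, c := range candidates {
		if err := r.Reseed(c); err != nil {
			continue // this node failed to come online; try the next one
		}
		for _, other := range candidates {
			if other.Name == c.Name {
				continue
			}
			if err := r.Kill(other); err != nil {
				return c, err
			}
		}
		return c, nil
	}
	// No seed was established, so nothing has been killed.
	return Node{}, ErrNoSeed
}

// fakeReseeder fails Reseed for the named nodes and records every Kill.
type fakeReseeder struct {
	failReseed map[string]bool
	killed     []string
}

func (f *fakeReseeder) Reseed(n Node) error {
	if f.failReseed[n.Name] {
		return fmt.Errorf("node %s failed to come online", n.Name)
	}
	return nil
}

func (f *fakeReseeder) Kill(n Node) error {
	f.killed = append(f.killed, n.Name)
	return nil
}

func main() {
	nodes := []Node{{"a", 5}, {"b", 9}, {"c", 7}}
	// Simulate the highest-index node failing to come online.
	f := &fakeReseeder{failReseed: map[string]bool{"b": true}}
	seed, err := PickSeed(nodes, f)
	fmt.Println(seed.Name, f.killed, err)
}
```

Because `PickSeed` is deterministic for a given set of nodes, calling it again with the same input picks the same seed, which is the property the scheduler-restart item asks for. The livelock item would still need a fake that models stuck nodes.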

spacejam • Sep 09 '15 20:09

How is this an issue for newbies? 🤔

pires • Apr 22 '17 20:04

Context: back when I wrote this, the codebase had a tendency to break in gross ways when other people touched it, so I made this a newbie ticket to encourage new contributors to learn about what can go wrong and to build the guard rails that would allow them to contribute safely. It's of vital importance when modifying a stateful system to understand the failure modes, and this work will give a newbie the requisite understanding of non-happy-path scenarios.

Feel free to reach out for guidance on fault injection, safety considerations, context, etc. I may be open to $hourly intensive surgery on high-risk paths if you're underwater before June 1.

spacejam • Apr 22 '17 20:04

I was thinking more about the complexity of adding e2e tests against an existing Mesos cluster. I'm interested in having this so that the tests can be integrated with GitHub PRs.

pires • Apr 22 '17 20:04