(For discussion) Support dynamic cluster reconfiguration
Support for cluster reconfiguration using the algorithm in the Raft paper. For discussion; I don't think we should merge just yet.
There are also a few bug fixes in here, which I may pull out and propose separately.
The biggest problem is to figure out how to test this. I have some integration tests which use a key-value store that I've built using Barge; I bring up the actual services, kill and (re)start them and check it behaves as expected: https://github.com/justinsb/cloudata/blob/master/cloudata-keyvalue/src/test/java/com/cloudata/keyvalue/ClusterTest.java
Is creating an integration test the best option here?
This is a rather impressive change and given my low familiarity with the code base I would not some time to digest it.
I could help with writing some tests expressing the high-level specification of this feature given I have started adding "integration" or "medium-sized" tests to the code. What do you think these tests should look like? What are the "specs" or requirements for membership change?
Thanks and good to meet you abailly. I tried to implement the spec as described in the Raft paper. I do have some integration tests here: https://github.com/justinsb/cloudata/blob/master/cloudata-keyvalue/src/test/java/com/cloudata/keyvalue/ClusterTest.java but they leverage the stuff I'm building on top of Barge.
The spec is essentially: As long as there is a quorum still alive, we should be able to reconfigure to achieve a new membership. I basically just try shutting down members of the cluster and starting new ones, and checking that things survive.
My test can't be merged into barge itself, as it drags in a whole lot of external dependencies. But, if we get your integration tests merged, then hopefully I can refactor the existing test based on what you've done, and we should be in good shape.
So, #53 first, and then hopefully testing here will be much easier.
Good to meet you too!
Yes, I have skimmed through your test but it was not entirely clear to me what you were testing. Might missing some context. I will have a look at the changes tomorrow and might at least propose some more formal improvements (eg. naming, clarity of the code...) if you think this is valuable at that stage.
Let's get #53 merged, and then I can rework my test using your much simpler model! Hopefully all will then be clear!
Catching up. Looking at this one now (and Sunday). Feedback to follow.