curator
Vector clock and conflict resolution for Riak
It looks like right now Curator only supports last write wins for Riak. I'd like to add (probably at the Repository level) the ability to resolve conflicts when LWW is turned off, which will be the default in Riak 2.0.
This means support for:
- Passing around a vector clock so when you pull down an object, modify it, and save it, we use the vector clock from the original fetch.
- At the repository level, defining a way to resolve conflicts
Thoughts on these changes? AFAIK, these are pretty Riak-specific, but are pretty important in the way you're supposed to use Riak, especially with 2.0 coming.
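To make the two bullet points above concrete, here is a minimal sketch of the fetch, modify, save cycle with a resolver. It assumes nothing about Curator or the Riak client: FakeBucket, its integer "vclock" token, and the resolver lambda are all hypothetical stand-ins that only imitate how a stale write turns into a sibling.

```ruby
# Hypothetical in-memory stand-in for a Riak bucket with siblings enabled.
# None of these names exist in Curator or the Riak client.
class FakeBucket
  Fetched = Struct.new(:siblings, :vclock)

  def initialize
    @siblings = Hash.new { |hash, key| hash[key] = [] }
    @clock = Hash.new(0)
  end

  # Returns every sibling value plus an opaque token for the current version.
  def fetch(key)
    Fetched.new(@siblings[key].dup, @clock[key])
  end

  # A write carrying the current token replaces all siblings; a write with a
  # stale (or missing) token is kept alongside them as a new sibling.
  def store(key, value, vclock)
    if vclock == @clock[key]
      @siblings[key] = [value]
    else
      @siblings[key] << value
    end
    @clock[key] += 1
  end
end

# Hypothetical repository-level resolver: merge siblings by taking the union.
resolver = ->(siblings) { siblings.flatten.uniq.sort }

bucket = FakeBucket.new
bucket.store("k", ["a"], nil)

# Two clients fetch the same version, then both write.
first = bucket.fetch("k")
second = bucket.fetch("k")
bucket.store("k", first.siblings.first + ["b"], first.vclock)
bucket.store("k", second.siblings.first + ["c"], second.vclock) # stale token

conflicted = bucket.fetch("k")
resolved = resolver.call(conflicted.siblings)
bucket.store("k", resolved, conflicted.vclock)
```

The second write does not clobber the first because it carries an outdated token, and the resolver only runs once a later fetch actually sees more than one sibling.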
I think this makes sense, but it would be very helpful for me to see some code. Maybe you can spike it out so we can see what it will look like?
Vector clocks come first, before we dive into conflict resolution. On a side note, I think "version vector" is the more general term for what Riak implements with vector clocks. Thoughts on that name?
I'm a bit conflicted on where the version vector stuff should live. It seems like something that is purely for persistence, so having the attribute on the model for it feels out of place.
An alternative to that would be to have the repository keep a hash containing retrieved objects and their corresponding version vector. Since the Repository is a singleton, there's not a great way to keep that hash from bloating quickly. If the Repository weren't a singleton, we could easily get rid of the VersionVectorMap at the end of that object's lifecycle.
Here's an implementation that would require you to perform actions within a "context" (or "session" or "transaction," but those terms have a lot of meaning already). During that context, we'd keep track of a VersionVectorMap and clear that up at the end of the block.
class Thing
  include Curator::Model
  attr_accessor :id, :stuff
end

class ThingRepository
  include Curator::Repository
end

ThingRepository.context do
  t = ThingRepository.find_by_id("LnwD3PXpmoSyp07PCW3yhVOlGxY")
  t.stuff = "new"
  ThingRepository.save(t) # persists with the version vector from when `t` was retrieved
end
Rough code: https://github.com/bmorton/curator/compare/version-vector-spike
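For readers who don't want to dig through the branch, here is a rough sketch of how a context block could scope the map. It is not the code from the spike: ContextTracking, SketchRepository, version_vectors, record, and vector_for are hypothetical names, and the vector is just an opaque string.

```ruby
# Hypothetical sketch; not the code from the spike branch.
class VersionVectorMap
  def initialize
    # Key by object identity so two equal-looking models stay distinct.
    @vectors = {}.compare_by_identity
  end

  def record(object, vector)
    @vectors[object] = vector
  end

  def vector_for(object)
    @vectors[object]
  end
end

module ContextTracking
  # Yields with a fresh map, then restores the previous one (nil at the top
  # level) even if the block raises, so the map cannot outlive the context.
  def context
    previous = @version_vectors
    @version_vectors = VersionVectorMap.new
    yield
  ensure
    @version_vectors = previous
  end

  def version_vectors
    @version_vectors
  end
end

class SketchRepository
  extend ContextTracking
end

thing = Struct.new(:id, :stuff).new("abc", "old")
inside = nil

SketchRepository.context do
  # A real find_by_id would record the vector returned by the fetch.
  SketchRepository.version_vectors.record(thing, "vclock-from-fetch")
  inside = SketchRepository.version_vectors.vector_for(thing)
end

after = SketchRepository.version_vectors
```

Because the map lives in an instance variable on the singleton repository, this sketch is not thread-safe as written; a real version would probably need thread-local storage.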
A much simpler implementation would be to make version_vector an attribute on the Model and make sure we don't persist it in the normal serialized data. I think this would look similar to the version stuff that lives there today.
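A minimal sketch of that simpler option, assuming a hypothetical VersionedModel mixin and persisted_attributes method (neither is Curator's actual serialization API): the model carries the vector, and the serializer skips it.

```ruby
# Hypothetical mixin; Curator's real Model serialization may differ.
module VersionedModel
  # Attributes that travel with the object but are never serialized.
  TRANSIENT = [:version_vector].freeze

  attr_accessor :version_vector

  # Builds the hash that would be written to the data store, leaving out
  # anything listed in TRANSIENT.
  def persisted_attributes
    instance_variables.each_with_object({}) do |ivar, attrs|
      name = ivar.to_s.delete_prefix("@").to_sym
      attrs[name] = instance_variable_get(ivar) unless TRANSIENT.include?(name)
    end
  end
end

class SketchThing
  include VersionedModel
  attr_accessor :id, :stuff
end

thing = SketchThing.new
thing.id = "abc"
thing.stuff = "new"
thing.version_vector = "opaque-vclock" # would be set by the repository on fetch

payload = thing.persisted_attributes
```

The repository would read thing.version_vector when saving and pass it along with the write, while payload stays free of persistence metadata.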
As far as the actual conflict resolution stuff is concerned, I'm exploring how to use CRDTs for this. I'm not sure yet how the model will be configured to represent itself as a CRDT (or a collection of various CRDTs) and how objects will be merged.
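As a small illustration of why CRDTs are attractive here, this is a sketch of the simplest one, a grow-only set, where merging two siblings is just a union. GSet is a hypothetical class for illustration only and says nothing about how a Curator model would eventually declare its fields.

```ruby
require "set"

# Hypothetical grow-only set (G-Set) CRDT: elements can be added, never removed.
class GSet
  attr_reader :elements

  def initialize(elements = [])
    @elements = Set.new(elements)
  end

  # Returns a new GSet so existing replicas are never mutated in place.
  def add(element)
    GSet.new(@elements | [element])
  end

  # Merge is a set union, which is commutative, associative, and idempotent,
  # so siblings can be merged in any order and any number of times.
  def merge(other)
    GSet.new(@elements | other.elements)
  end

  def ==(other)
    other.is_a?(GSet) && elements == other.elements
  end
end

base = GSet.new(["a"])
replica_one = base.add("b") # written by one client
replica_two = base.add("c") # written concurrently by another

merged = replica_one.merge(replica_two)
```

With a type like this, the repository-level resolver needs no application-specific logic; it can fold merge over whatever siblings come back.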
Some of this changes quite a bit with Riak 2.0, where conflicts can be resolved by Riak itself if you're using supported data types, so that may clean things up quite a bit. If that's the case, we'd just need a set of supported data types and to make sure things get serialized properly when saved by the Repository.
Note to self: have to deal with version vectors with deletions as well.
I vote for adding it to the model. While it does bleed some persistence into the model, I think it's a simpler solution.