Vector clock and conflict resolution for Riak

Open bmorton opened this issue 11 years ago • 5 comments

It looks like Curator currently supports only last-write-wins (LWW) for Riak. I'd like to add the ability, probably at the Repository level, to resolve conflicts when LWW is turned off, which will be the default in Riak 2.0.

This means support for:

  • Passing around a vector clock, so that when you fetch an object, modify it, and save it, the save uses the vector clock from the original fetch.
  • Defining a way to resolve conflicts at the repository level.

Thoughts on these changes? AFAIK, these are fairly Riak-specific, but they're central to how you're supposed to use Riak, especially with 2.0 coming.
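To make the two pieces concrete, here is a minimal sketch in plain Ruby. Everything in it is hypothetical: `FakeBucket`, `SketchRepository`, and the resolver block are illustration-only names, not Curator or riak-client APIs, and the integer "clock" is a stand-in for a real per-actor vector clock.

```ruby
# Hypothetical in-memory stand-in for a Riak bucket that keeps siblings.
# None of these names come from Curator or riak-client.
class FakeBucket
  Fetched = Struct.new(:vclock, :siblings)

  def initialize
    @data = {} # key => { vclock: Integer, siblings: [values] }
  end

  def fetch(key)
    entry = @data[key]
    entry && Fetched.new(entry[:vclock], entry[:siblings].dup)
  end

  # A write carrying the current clock replaces all siblings; a write with
  # a stale (or missing) clock is kept alongside them as a new sibling.
  def store(key, value, vclock = nil)
    entry = @data[key]
    if entry.nil? || entry[:vclock] == vclock
      @data[key] = { vclock: (entry ? entry[:vclock] + 1 : 1), siblings: [value] }
    else
      entry[:siblings] << value
      entry[:vclock] += 1
    end
  end
end

class SketchRepository
  # The block is the repository-level conflict resolver: it receives the
  # list of siblings and returns the single value to keep.
  def initialize(bucket, &resolver)
    @bucket = bucket
    @resolver = resolver
  end

  # Returns [value, vclock]; the caller hands the vclock back on save.
  def find(key)
    fetched = @bucket.fetch(key)
    return nil unless fetched

    value = fetched.siblings.size > 1 ? @resolver.call(fetched.siblings) : fetched.siblings.first
    [value, fetched.vclock]
  end

  def save(key, value, vclock = nil)
    @bucket.store(key, value, vclock)
  end
end

# Resolver assumption for the example: keep the longest string.
repo = SketchRepository.new(FakeBucket.new) { |siblings| siblings.max_by(&:length) }
repo.save("k", "a")
_, clock = repo.find("k")
repo.save("k", "first", clock)    # clock is current, so this replaces "a"
repo.save("k", "second!", clock)  # clock is now stale, so this becomes a sibling
resolved, new_clock = repo.find("k")
repo.save("k", resolved, new_clock) # writing back with the fresh clock collapses the siblings
```

The point of the sketch is only the shape of the API: the clock travels with the fetched value, and the resolver is supplied once per repository.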

bmorton avatar Feb 03 '14 18:02 bmorton

I think this makes sense, but it would be very helpful for me to see some code. Maybe you can spike it out so we can see what it will look like?

pgr0ss avatar Feb 04 '14 18:02 pgr0ss

Vector clocks are the first piece to tackle before we dive into conflict resolution. As a side note, I think "version vector" is the more general term for what Riak implements with vector clocks. Thoughts on that name?

I'm a bit conflicted on where the version vector stuff should live. It seems like something that is purely for persistence, so having the attribute on the model for it feels out of place.

An alternative to that would be to have the repository keep a hash containing retrieved objects and their corresponding version vector. Since the Repository is a singleton, there's not a great way to keep that hash from bloating quickly. If the Repository weren't a singleton, we could easily get rid of the VersionVectorMap at the end of that object's lifecycle.

Here's an implementation that would require you to perform actions within a "context" (or "session" or "transaction," but those terms already carry a lot of meaning). During that context, we'd keep track of a VersionVectorMap and clear it at the end of the block.

class Thing
  include Curator::Model
  attr_accessor :id, :stuff
end

class ThingRepository
  include Curator::Repository
end

ThingRepository.context do
  t = ThingRepository.find_by_id("LnwD3PXpmoSyp07PCW3yhVOlGxY")
  t.stuff = "new"
  ThingRepository.save(t) # persists with the version vector from when `t` was retrieved
end
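
One way such a `context` method could work internally is sketched below in plain Ruby. This is not the code from the spike branch: `ContextualRepository`, `current_vectors`, `SketchThingRepository`, and the `Thread.current` storage are all assumptions made for illustration.

```ruby
# Hypothetical sketch; not taken from the linked spike branch.
class VersionVectorMap
  def initialize
    # Key on object identity rather than ==, so two equal-looking models
    # fetched separately keep separate version vectors.
    @vectors = {}.compare_by_identity
  end

  def remember(object, vector)
    @vectors[object] = vector
  end

  def vector_for(object)
    @vectors[object]
  end
end

module ContextualRepository
  KEY = :sketch_version_vector_map

  # Runs the block with a fresh map stored in Thread.current and restores
  # the previous value afterwards, even if the block raises.
  def context
    previous = Thread.current[KEY]
    Thread.current[KEY] = VersionVectorMap.new
    yield
  ensure
    Thread.current[KEY] = previous
  end

  def current_vectors
    Thread.current[KEY] || raise("must be called inside a context block")
  end
end

class SketchThingRepository
  extend ContextualRepository

  # Stand-in for a real fetch: records the vector that came back with the object.
  def self.find(thing, vector)
    current_vectors.remember(thing, vector)
    thing
  end

  def self.vector_for(thing)
    current_vectors.vector_for(thing)
  end
end

seen = nil
SketchThingRepository.context do
  t = SketchThingRepository.find(Object.new, "vclock-1")
  seen = SketchThingRepository.vector_for(t) # what a save inside the block would send
end
```

Because the map only lives for the duration of the block, the singleton repository never accumulates entries across requests.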

Rough code: https://github.com/bmorton/curator/compare/version-vector-spike

A much simpler implementation would be to make version_vector an attribute on the Model and make sure we don't persist it in the normal serialized data. I think this would look similar to the version stuff that lives there today.
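
A minimal sketch of that simpler option, in plain Ruby rather than against Curator's real internals: `SketchModel`, `TRANSIENT`, and `to_persisted_hash` are hypothetical names, and how Curator actually serializes models is not assumed here.

```ruby
# Hypothetical sketch: the model carries its version vector, but the
# attribute is skipped when building the hash that gets persisted.
module SketchModel
  TRANSIENT = [:version_vector].freeze

  def self.included(base)
    base.send(:attr_accessor, *TRANSIENT)
  end

  # Hash of the instance variables that have been set, minus the
  # persistence-only ones.
  def to_persisted_hash
    instance_variables.each_with_object({}) do |ivar, hash|
      name = ivar.to_s.delete("@").to_sym
      next if TRANSIENT.include?(name)

      hash[name] = instance_variable_get(ivar)
    end
  end
end

class SketchThing
  include SketchModel
  attr_accessor :id, :stuff
end

thing = SketchThing.new
thing.id = "abc"
thing.stuff = "new"
thing.version_vector = "opaque-vclock" # set by the repository on fetch
payload = thing.to_persisted_hash      # what would be written as the object body
```

The repository would read `thing.version_vector` when saving and send it alongside `payload`, never inside it.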

bmorton avatar Feb 13 '14 08:02 bmorton

As far as the actual conflict resolution stuff is concerned, I'm exploring how to use CRDTs for this. I'm not sure yet how the model will be configured to represent itself as a CRDT (or a collection of various CRDTs) and how objects will be merged.

Some of this changes quite a bit with Riak 2.0 where conflicts can be resolved by Riak itself if you're using supported data types, so that may clean things up quite a bit. If that's the case, we'd just need a set of supported data types and make sure things get serialized properly when saved by the Repository.
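
For a sense of what "merging" means for a CRDT, here is a minimal grow-only counter in plain Ruby. It is a textbook G-Counter written for illustration, not a Riak 2.0 data type binding and not a proposal for how Curator models would be configured; `GCounter` and the actor names are assumptions.

```ruby
# Grow-only counter (G-Counter): each actor only ever increments its own slot.
class GCounter
  attr_reader :counts

  def initialize(counts = {})
    @counts = counts.dup
  end

  def increment(actor, by = 1)
    @counts[actor] = @counts.fetch(actor, 0) + by
    self
  end

  def value
    @counts.values.sum
  end

  # Merge takes the per-actor maximum, which makes it commutative,
  # associative, and idempotent, so siblings can be merged in any order.
  def merge(other)
    GCounter.new(@counts.merge(other.counts) { |_actor, a, b| [a, b].max })
  end
end

a = GCounter.new.increment(:node_a, 2)
b = GCounter.new.increment(:node_b, 3).increment(:node_a, 1)
merged = a.merge(b)
```

A repository-level resolver for a model built from such types could simply fold `merge` over the siblings instead of picking a winner.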

bmorton avatar Feb 13 '14 08:02 bmorton

Note to self: have to deal with version vectors with deletions as well.

bmorton avatar Feb 13 '14 09:02 bmorton

I vote for adding it to the model. While it does bleed some persistence into the model, I think it's a simpler solution.

pgr0ss avatar Feb 22 '14 00:02 pgr0ss