dynomite
Q: How does Dynomite handle data conflict resolution?
How does Dynomite handle data conflicts between clusters in different regions?
Dynomite reads from the local data center and writes to all data centers (regions). To manage data consistency inside a data center, you can configure the consistency level. We are actively adding support for anti-entropy/reconciliation.
@ipapapa - Just to clarify - this would effectively translate to a "last write wins" strategy for cross-datacenter conflicts?
Of course, this would have the caveat that replication latency could make different writes look like the "last write" in each DC. The following diagram shows the value of a single key over time, as seen by two nodes in different datacenters that each receive a SET to the same key nearly "simultaneously" (within a time period shorter than the replication latency):
(in node a1)       (in node b1)
-------------------------------
SET key FOO
(key == FOO)\      SET key BAR
    ...      \    /(key == BAR)
              \  /     ...
               \/
               /\
              /  \
             /    \
            /      (key == FOO)
(key == BAR)           ...
    ...
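The crossing writes in the diagram can be reproduced with a minimal sketch. It assumes each node simply applies writes in arrival order with no timestamps; the `Node` class and its `apply` method are hypothetical names used only for illustration, not Dynomite code.

```python
class Node:
    """A replica that applies writes in arrival order (no timestamps)."""

    def __init__(self, name):
        self.name = name
        self.store = {}

    def apply(self, key, value):
        # "Last write wins" here means "last write to ARRIVE wins".
        self.store[key] = value


a1, b1 = Node("a1"), Node("b1")

# Each node accepts a local SET before the other's write has replicated.
a1.apply("key", "FOO")
b1.apply("key", "BAR")

# The two writes cross in flight, so each node applies the remote write last.
a1.apply("key", "BAR")  # replicated from b1
b1.apply("key", "FOO")  # replicated from a1

# The datacenters have swapped values and now disagree.
print(a1.store["key"], b1.store["key"])
```

Without some ordering information attached to each write, neither node can tell that it has diverged from the other.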
It seems like this certainly could still be useful for some applications, but I must admit that I was hopeful for an application-facing cross-datacenter conflict-resolution mechanism when I read that Dynomite was inspired by the Dynamo whitepaper, which:
"makes extensive use of object versioning and application-assisted conflict resolution in a manner that provides a novel interface for developers to use"
It seems like perhaps that part of the Dynamo concept was abandoned in favor of using the Redis/Memcached protocols exclusively (which, to my knowledge, contain no such conflict resolution mechanisms).
@jemc That is correct, it is "last write wins". There are limitations, as you mention, but it depends on the use case, and we do not have such a use case right now. There are thoughts of extending the current implementation by adding timestamps for conflict resolution and reconciliation. Please feel free to contribute as well.
"There are thoughts of extending the current implementation by adding timestamps for conflict resolution and reconciliation. Please feel free to contribute as well."
I'd definitely be interested to at least hear more about some of these design discussions and evaluate whether they would meet the needs of my particular application. If so, I'd be happy to help make the idea a reality through assisting with the fleshing-out and the implementation.
Is there some public medium like a mailing list archive or other issue ticket where these ideas have been discussed, so I can read and understand the proposed direction more thoroughly?
@jemc we are in the process of implementing the recording of timestamps along with the data. This will help Dynomite address read conflicts in some cases (though there is a latency trade-off). Timestamps can also help us during data repair or data reconciliation. This will bring us to the same level as Cassandra in terms of data reconciliation.
In the future, we might extend Dynomite to support MVCC, which could go a long way toward addressing your concern. However, the trade-off is that we could not sustain development across many storage backends such as RocksDB, LMDB, ForestDB, etc. Furthermore, with in-memory Redis storage, this feature would cost us a great deal of memory, and none of our current internal users is asking for it.
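To make the timestamp idea above concrete, here is a minimal sketch of timestamp-based "last write wins" resolution. It assumes wall-clock timestamps in milliseconds assigned by the node that accepted the write, plus a node id used as a deterministic tie-break; the `Versioned` type and `resolve` function are hypothetical and are not part of Dynomite.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Versioned:
    value: str
    ts_ms: int   # wall-clock timestamp (assumption: set by the accepting node)
    origin: str  # node id, used only to break timestamp ties deterministically


def resolve(local: Versioned, incoming: Versioned) -> Versioned:
    """Pick the same winner on every node, regardless of arrival order."""
    return max(local, incoming, key=lambda v: (v.ts_ms, v.origin))


foo = Versioned("FOO", ts_ms=1000, origin="a1")
bar = Versioned("BAR", ts_ms=1003, origin="b1")

# Both datacenters converge on BAR even though they saw the writes
# in opposite orders.
winner_on_a1 = resolve(foo, bar)
winner_on_b1 = resolve(bar, foo)
print(winner_on_a1.value, winner_on_b1.value)
```

This only converges on the "right" value as far as the clocks agree: with clock skew between nodes, an earlier write can carry the larger timestamp and win.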
@timiblossom when you say timestamp I'm assuming you're talking about using a wall clock timestamp to determine order of events - have you considered using some form of logical clock instead (like a vector clock)?
@jemc yes, I meant the real clock. As for logical clocks, besides using them in gossip, I am still not sure how to use them to timestamp incoming requests.
It would require something like each node tracking the revision number for each key, incrementing when a request to set/mutate that key arrives. For a vector clock in particular, the revision number would be multi-dimensional with up to N values for each key, where N is the number of nodes that accept writes for that key.
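A minimal sketch of the per-key vector clock described above, assuming a clock is a plain dict from node id to revision counter. The function names (`increment`, `descends`, `compare`) are hypothetical and chosen for illustration only.

```python
def increment(clock, node):
    """Return a copy of the clock with this node's revision bumped by one."""
    new = dict(clock)
    new[node] = new.get(node, 0) + 1
    return new


def descends(a, b):
    """True if clock a has seen every revision recorded in clock b."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a, b):
    """Classify the relationship between two versions of the same key."""
    a_ge_b, b_ge_a = descends(a, b), descends(b, a)
    if a_ge_b and b_ge_a:
        return "equal"
    if a_ge_b:
        return "after"
    if b_ge_a:
        return "before"
    return "concurrent"  # neither descends from the other: a true conflict


# Two nodes accept a write to the same key before hearing from each other.
on_a1 = increment({}, "a1")      # {"a1": 1}
on_b1 = increment({}, "b1")      # {"b1": 1}
print(compare(on_a1, on_b1))     # concurrent

# A later write on b1 that has already seen a1's version supersedes it.
later = increment(on_a1, "b1")   # {"a1": 1, "b1": 1}
print(compare(later, on_a1))     # after
```

Unlike a wall-clock timestamp, the "concurrent" result surfaces the conflict explicitly instead of silently picking a winner.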
@jemc that would require keeping state on each node, turning Dynomite from a proxy into a stateful system. What if the node dies and the state is lost?
If you were keeping a vector clock in memory without persistence, and the node dies, it could try to retrieve the latest vector clock info through communication with the other nodes. If it couldn't receive the latest clock info for a given key before it received a write for that key, it would plunge ahead with an empty clock - if this was later the cause of a descendency conflict, it would be treated like any other descendency conflict, subject to the same resolution mechanism (whether automated or application-facing).
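The recovery path described above might look like the following sketch. It assumes clocks are dicts from node id to counter and that a restarted node receives zero or more clocks from its peers; `merge`, `recover`, and `concurrent` are hypothetical helpers, not an existing API.

```python
def merge(a, b):
    """Element-wise maximum of two vector clocks."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


def recover(peer_clocks):
    """Rebuild a lost clock from whichever peers answered; empty if none did."""
    clock = {}
    for peer_clock in peer_clocks:
        clock = merge(clock, peer_clock)
    return clock


def descends(a, b):
    """True if clock a has seen every revision recorded in clock b."""
    return all(a.get(n, 0) >= c for n, c in b.items())


def concurrent(a, b):
    return not descends(a, b) and not descends(b, a)


# Case 1: peers answered in time, so the restarted node catches up.
recovered = recover([{"a1": 2, "b1": 1}, {"a1": 1, "b1": 3}])
print(recovered == {"a1": 2, "b1": 3})

# Case 2: no peer answered before a write arrived on node a1,
# so it proceeds from an empty clock.
blind_write = recover([])
blind_write["a1"] = blind_write.get("a1", 0) + 1   # {"a1": 1}
peer_version = {"b1": 3}
print(concurrent(blind_write, peer_version))        # detected as a conflict
```

One caveat worth noting: if the peers' version already contains revisions from the restarted node (for example `{"a1": 3, "b1": 1}`), a write stamped `{"a1": 1}` would compare as older rather than concurrent, so a real design would likely need an additional safeguard for that case.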
@jemc We expect to add wall clocks in the beginning, and the design can leave room for extending to logical clocks. In fact, we have added wall clocks by modifying Redis itself, and we used that to evaluate a PoC for anti-entropy. However, we deliberated on whether that approach (a) would put us in a chicken-and-egg situation with Redis upgrades; (b) would not be extensible to logical clocks; and (c) would not be portable to other storage engines like https://github.com/Netflix/dynomite/issues/254 or https://github.com/Netflix/dynomite/issues/310 (hence diverging from the idea that Dynomite is a stable proxy layer). The advantage of this approach was that it allowed us to keep timestamps inside the values of any data structure (list, hash, set, etc.) without having to perform expensive (de)serialization in Dynomite.
The current thinking is for Dynomite to prepend a timestamp to the value, along with some delimiters. In the short term, we are focusing on making Dynomite "multi-threaded" and adding persistent storage engines like RocksDB or others. The former will remove several limitations of Dynomite's single-threaded nature and allow us to do some of the aforementioned computations.
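The timestamp-prefix idea mentioned above could be sketched as follows. The wire format here (decimal milliseconds, then a single `|` delimiter, then the raw value) is purely an assumption for illustration; the thread does not specify the actual layout, and `encode`/`decode` are hypothetical names.

```python
import time
from typing import Optional, Tuple

DELIM = b"|"  # hypothetical delimiter; the real format is not specified


def encode(value: bytes, ts_ms: Optional[int] = None) -> bytes:
    """Prefix a value with a wall-clock timestamp before storing it."""
    if ts_ms is None:
        ts_ms = time.time_ns() // 1_000_000
    return str(ts_ms).encode("ascii") + DELIM + value


def decode(stored: bytes) -> Tuple[int, bytes]:
    """Split a stored blob back into (timestamp, original value)."""
    # partition() splits on the FIRST delimiter only, so values that
    # themselves contain the delimiter byte survive the round trip.
    ts, _, value = stored.partition(DELIM)
    return int(ts), value


stored = encode(b"FOO", ts_ms=1000)
print(stored)                             # b'1000|FOO'
print(decode(encode(b"a|b", ts_ms=42)))   # (42, b'a|b')
```

This covers simple string values only. For lists, hashes, and sets, each element would presumably need its own prefix, which is where the (de)serialization cost mentioned earlier in the thread comes in.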
We are also looking to OSS users to drive some features. Given the scale of Dynomite in open source, we are still unsure about the best way to draft designs with the Dynomite community. We started by using the Help Needed label. Alternatively, we could have a mailing list. I was wondering if there is a better way or an established path.