olric
Read Repair vs healing to satisfy ReplicaCount
During an upgrade of my application, which uses olric embedded, all replicas are restarted in quick succession. Most keys will never be read, or will be read very infrequently, which with read repair means that after an upgrade the cache is immediately "empty".
That behavior means olric doesn't actually provide useful functionality for my use case, unless I add code that reads the entire keyspace after startup to repair the cache redundancy before letting Kubernetes know that the Pod started successfully.
I believe that olric could do this internally more efficiently, and that such functionality would be generally useful:
From the documentation it seems that olric could "easily" know that a part of the keyspace doesn't satisfy the requested ReplicaCount, and actively transfer that data to a newly joined member to repair the cache when a node restarts.
So this is a request for:
- when joining a cluster, ask it to transfer some data to the new node to satisfy ReplicaCount
- provide an API to detect when this initial sync is finished so that the embedding application can communicate to the Kubernetes API when it is safe to continue with the rollout
- detect node joins/departures quickly enough to keep such a rollout fast
- useful node identity in the context of a Kubernetes cluster, where IP addresses are basically useless
Having the same issue here. I have to do a full read repair of all keys to trigger data transfer when a new node joins. Do we know which version will address this issue?
Having the same issue.
Hi all,
I'm aware that this is one of the most wanted features among users. I started working on a solution based on a technique called vector clocks. It may be ready for initial tests in a couple of months, and I plan to make it production-ready by the end of this year.
For anyone who is curious about version vectors, here is some info:
- https://riak.com/posts/technical/vector-clocks-revisited/index.html?p=9545.html
- https://haslab.wordpress.com/2011/07/08/version-vectors-are-not-vector-clocks/
- https://en.wikipedia.org/wiki/Vector_clock
- https://github.com/hazelcast/hazelcast/blob/master/docs/design/partitioning/03-fine-grained-anti-entropy-mechanism.md
- https://people.cs.rutgers.edu/~pxk/417/notes/logical-clocks.html
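To make the linked material concrete, here is a minimal version-vector sketch (an illustration of the general technique, not olric's implementation): each replica increments its own counter on a local write, anti-entropy merges two vectors component-wise, and comparing vectors tells you whether one version descends from the other or the two are concurrent.

```go
package main

import "fmt"

// VV is a version vector: replica ID -> count of updates seen from it.
type VV map[string]uint64

// Increment records a local update on replica id.
func (v VV) Increment(id string) { v[id]++ }

// Descends reports whether v dominates other component-wise, i.e.
// v has seen every update that other has seen.
func (v VV) Descends(other VV) bool {
	for id, n := range other {
		if v[id] < n {
			return false
		}
	}
	return true
}

// Merge takes the component-wise maximum: the version that results
// when anti-entropy reconciles two replicas.
func Merge(a, b VV) VV {
	out := VV{}
	for id, n := range a {
		out[id] = n
	}
	for id, n := range b {
		if n > out[id] {
			out[id] = n
		}
	}
	return out
}

func main() {
	a := VV{"node1": 2, "node2": 1}
	b := VV{"node1": 1, "node2": 3}
	// Neither descends from the other: the writes are concurrent,
	// which is the conflict case anti-entropy must resolve.
	fmt.Println(a.Descends(b), b.Descends(a)) // false false
	m := Merge(a, b)
	fmt.Println(m["node1"], m["node2"]) // 2 3
}
```

With per-partition vectors like this, a restarted node can compare its vector against the cluster's and fetch only the entries it is missing, instead of relying on reads to trigger repair.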