zraft_lib
Should a light session be tolerant to failures?
I'm reading and writing using a light_session -- the leader is node_1 and everything is zipping along fine, both reads and writes. However, if node_1 is partitioned and then crashes, I see the leader transition to node_3, but writes using the light_session do not seem to complete. When I debug the messages being sent, no matter which request I issue, I always see it go to node_1, which was the last known leader and is no longer online.
The documentation seems to indicate that, with the light session configured to know about all peers, a request that times out would be rerouted to another node in the cluster. Is this how it should work?
Seems like not a lot of issues get responded to in this repository (open since 2015) - maybe you’d be better off trying https://github.com/rabbitmq/ra?
(Sent as an email reply to Christopher S. Meiklejohn's original post of 6 May 2019, 22:46.)
We're testing a variety of software as part of a research project, and this was one of the libraries we selected.
@cmeiklejohn Yes. light_session should tolerate node failures. It knows about all peers and should try the other nodes to find the new leader.
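For anyone following along, here is a rough sketch of the failover behavior described above: try the cached leader first, and on a timeout or a "not leader" reply fall through to the other known peers until one accepts the request. It is a toy in-memory model in Python, not zraft_lib's implementation (which is Erlang); `FakeCluster`, `FailoverSession`, `NodeDown`, and `NotLeader` are all hypothetical names made up for illustration, and the leader hint returned by followers is an assumption.

```python
class NodeDown(Exception):
    """Stands in for a request that timed out against an unreachable node."""


class NotLeader(Exception):
    """Raised by a follower; may carry a hint about the current leader."""

    def __init__(self, hint=None):
        super().__init__("not leader")
        self.hint = hint


class FakeCluster:
    """Toy in-memory cluster used only to exercise the session (hypothetical)."""

    def __init__(self, peers, leader):
        self.peers = list(peers)
        self.leader = leader
        self.down = set()
        self.store = {}

    def request(self, node, op, key, value=None):
        if node in self.down:
            raise NodeDown(node)
        if node != self.leader:
            raise NotLeader(hint=self.leader)
        if op == "write":
            self.store[key] = value
            return "ok"
        return self.store.get(key)


class FailoverSession:
    """Client session that remembers the last known leader but does not trust it."""

    def __init__(self, cluster, peers):
        self.cluster = cluster
        self.peers = list(peers)
        self.leader = self.peers[0]
        self.attempts = []  # every node contacted, in order, for debugging

    def call(self, op, key, value=None):
        # Cached leader first, then every other known peer.
        candidates = [self.leader] + [p for p in self.peers if p != self.leader]
        tried = set()
        while candidates:
            node = candidates.pop(0)
            if node in tried:
                continue
            tried.add(node)
            self.attempts.append(node)
            try:
                result = self.cluster.request(node, op, key, value)
            except NodeDown:
                continue  # timed out: move on to the next peer
            except NotLeader as err:
                # Follow the hint next, unless that node was already tried.
                if err.hint is not None and err.hint not in tried:
                    candidates.insert(0, err.hint)
                continue
            self.leader = node  # remember who answered
            return result
        raise RuntimeError("no reachable leader among known peers")


peers = ["node_1", "node_2", "node_3"]
cluster = FakeCluster(peers, leader="node_1")
session = FailoverSession(cluster, peers)
session.call("write", "k", 1)

# node_1 crashes and node_3 takes over, as in the report above.
cluster.down.add("node_1")
cluster.leader = "node_3"
session.call("write", "k", 2)
```

In this model the second write still contacts node_1 first, but it then moves on to node_2, follows the hint to node_3, and caches node_3 as the leader for later calls. The behavior reported in the original post corresponds to a session that never leaves the first candidate.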