
Should Ra return an error while multiple nodes try to leave the cluster?

RoadRunnr opened this issue 4 years ago • 5 comments

  • tested on 1.1.8

When multiple nodes try to leave the cluster at once (e.g. when stopping all nodes in a unit test), the leader may leave while another node is also trying to leave at almost the same time. If the old leader process has already terminated but no new leader has been elected yet, a call to ra:leave_and_delete_server on the local node can return {error, noproc}.

IMHO it should be {error,cluster_change_not_permitted} as long as there is at least one node (e.g. the local node) left in the cluster.

RoadRunnr avatar Feb 02 '21 09:02 RoadRunnr
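A minimal sketch of the scenario described above, assuming the single-argument ra:leave_and_delete_server/1 form referenced in the report and a hypothetical module and member list; every member is asked to leave concurrently, and the collected results can include the {error, noproc} return (and, when the call crashes, an {'EXIT', ...} tuple):

```erlang
-module(leave_race).
-export([leave_all/1]).

%% Ask every member to leave concurrently and collect each result in order.
%% The catch also captures the {'EXIT', ...} crash case discussed below.
leave_all(Members) ->
    Parent = self(),
    Pids = [spawn(fun () ->
                          Result = (catch ra:leave_and_delete_server(Member)),
                          Parent ! {self(), Result}
                  end) || Member <- Members],
    [receive {Pid, Res} -> Res end || Pid <- Pids].
```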

Sounds good. Ra intentionally limits membership changes to one at a time because reasoning about multi-member changes is complex. Some other Raft implementations have chosen to adopt the same limitation.

This is open source software, so please submit a PR.

michaelklishin avatar Feb 02 '21 11:02 michaelklishin

I think this issue should not be a question; maybe I didn't describe the problem clearly enough.

When multiple ra nodes try to leave the cluster at about the same time, a call to ra:leave_and_delete_server can return any of the following:

  • ok
  • {error,cluster_change_not_permitted}
  • {error,noproc}
  • a crash {'EXIT',{normal,{gen_statem,call,[Leader,{leader_call,{command,normal,{'$ra_leave',Node,await_consensus}}},5000]}}}

The {error,noproc} return and the crash are clearly abnormal.

I would like to submit a PR to fix this bug, but the time required to understand the internals of ra, fix the bug and possibly write a test case means that this will happen sometime far, far in the future.

RoadRunnr avatar Feb 02 '21 11:02 RoadRunnr
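For illustration only, the four outcomes above amount to something a caller currently has to classify by hand; the module and function names below are hypothetical, and the {'EXIT', ...} shape is only observed when the call is wrapped in catch:

```erlang
-module(leave_result).
-export([classify/1]).

%% Map the observed return values of ra:leave_and_delete_server to a
%% coarse outcome. Only the first two are arguably expected results.
classify(ok) ->
    left;
classify({error, cluster_change_not_permitted}) ->
    retry_later;   % another membership change is in flight
classify({error, noproc}) ->
    abnormal;      % old leader already gone, no new leader elected yet
classify({'EXIT', {normal, {gen_statem, call, _}}}) ->
    abnormal.      % the leader_call crashed against the terminating leader
```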

Some of these come from natural race conditions where a leader is removed and a request is issued to the old leader pid. I suspect retries are the only reasonable way forward here.

kjnilsson avatar Feb 02 '21 11:02 kjnilsson

> Some of these come from natural race conditions where a leader is removed and a request is issued to the old leader pid. I suspect retries are the only reasonable way forward here.

That's what I'm doing in my test suite for all the error cases. However, I would expect that in the noproc and crash cases the right thing for the API would be to return {error,cluster_change_not_permitted}. noproc would only make sense when the targeted ServerRef is not running.

RoadRunnr avatar Feb 02 '21 12:02 RoadRunnr
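A sketch of what that retry loop can look like; the function names, fixed retry budget and 100 ms back-off are assumptions for illustration, not anything Ra provides:

```erlang
-module(leave_retry).
-export([leave_with_retries/2]).

%% Retry the leave call while the cluster reports a membership change in
%% flight, the old leader process is already gone, or the call crashes.
leave_with_retries(_Member, 0) ->
    {error, retries_exhausted};
leave_with_retries(Member, Retries) ->
    case catch ra:leave_and_delete_server(Member) of
        ok ->
            ok;
        {error, cluster_change_not_permitted} ->
            retry(Member, Retries);
        {error, noproc} ->
            retry(Member, Retries);
        {'EXIT', _Reason} ->
            retry(Member, Retries);
        Other ->
            Other
    end.

retry(Member, Retries) ->
    timer:sleep(100),
    leave_with_retries(Member, Retries - 1).
```

In a test teardown this might be called as leave_with_retries({my_cluster, node()}, 20), with the cluster name being whatever the suite used when starting the servers.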

noproc is probably returned when the leader has gone away and the call is redirected to the now-gone leader. If this is the case (I haven't validated it), then I think noproc is appropriate.

kjnilsson avatar Feb 02 '21 12:02 kjnilsson