HDDS-6743. Specify leader node for OM failover
What changes were proposed in this pull request?
Currently, if a client first connects to a follower OM, the response shows that the OM is not the leader but does not specify the actual leader node.
This ticket makes the reply contain the leader OM so that clients can connect to the leader node more conveniently.
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-6743
How was this patch tested?
unit test
@adoroszlai @ChenSammi Could you help to review this PR?
Thanks @symious for the work! I had an earlier patch for this issue: #2765
@hanishakoneru left a comment to explain why this should not be done. https://github.com/apache/ozone/pull/2765#issuecomment-952091699
I suggest we reach agreement on this issue first, and then go ahead.
@JacksonYao287 Sure, thanks for the review.
I think the concern in https://github.com/apache/ozone/pull/2765#issuecomment-952091699 is that a misconfiguration on the client side might trigger an infinite retry loop, so adding an address was preferred over returning only the OMNodeId.
In the latest commit of this PR, the OMNotLeaderException includes the following information:
- raftPeerId
- raftLeaderId
- raftLeaderAddress
An example of this exception message would be:
org.apache.hadoop.ozone.om.exceptions.OMNotLeaderException: OM:omNode-3 is not the leader. Suggested leader is OM:omNode-1/127.0.0.1
When a client receives this exception, it should try the address first. Only if the address is empty should it fall back to the suggested raftLeaderId.
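The client-side ordering described above could be sketched roughly as follows. This is only an illustration, not the code in this PR: the class `LeaderHintSketch`, the method `resolveTarget`, and the `configuredNodes` map (node ID to address, as the client has it configured) are hypothetical names. The assumption is that the client has already extracted `raftLeaderAddress` and `raftLeaderId` from the exception.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch; not part of the Ozone client API.
public class LeaderHintSketch {

    /**
     * Picks the next OM address to try after a not-leader reply.
     * Order: suggested address first, then the suggested leader ID
     * looked up in the client's own configuration.
     */
    public static Optional<String> resolveTarget(String raftLeaderAddress,
                                                 String raftLeaderId,
                                                 Map<String, String> configuredNodes) {
        // 1. Prefer the address: it does not depend on the client's node-ID mapping.
        if (raftLeaderAddress != null && !raftLeaderAddress.isEmpty()) {
            return Optional.of(raftLeaderAddress);
        }
        // 2. Address is empty: fall back to the suggested leader ID.
        if (raftLeaderId != null && !raftLeaderId.isEmpty()) {
            // An unknown ID yields an empty result instead of a guess.
            return Optional.ofNullable(configuredNodes.get(raftLeaderId));
        }
        // 3. No usable hint: the caller keeps its normal failover order.
        return Optional.empty();
    }

    public static void main(String[] args) {
        Map<String, String> nodes = Map.of("omNode-1", "127.0.0.1");
        System.out.println(resolveTarget("127.0.0.1", "omNode-1", nodes));
        System.out.println(resolveTarget("", "omNode-1", nodes));
        System.out.println(resolveTarget("", "omNode-9", nodes));
    }
}
```

Returning an empty result for an unknown node ID, rather than retrying on a guessed mapping, is one way to avoid the retry loop that a client-side misconfiguration could otherwise cause.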
@symious is this PR still active? If not we can close it.
Just saw this PR. I've recently been looking into some issues related to the out-of-sync mapping between client and server. Just marking myself here to follow the latest changes to this PR. Thanks all!
@kerneltime Still active, I think. Could you help review the PR? I will resolve the conflicts later.
Thanks @symious, will get this reviewed.
cc @duongkame @aswinshakil @tanvipenumudy