consul
Leadership transfer cmd
Description
This is an attempt to implement a command that triggers a raft leadership transfer. The idea traces back to a customer issue where we needed to trigger a leader transfer to be able to upgrade a leader node.
This depends on a change in raft that fixes some gaps in leadership transfer; those gaps would make the transfer fail. https://github.com/hashicorp/raft/pull/487
I made an attempt at naming the subcommand but am open to any suggestions.
Testing & Reproduction steps
Added tests:
- RPC call without and with ACL
- API call negative test (with one agent)
- CMD call negative test (with one agent)
@mkeeler I dusted off this PR, if you happen to have some spare time. I had to create a new one and close https://github.com/hashicorp/consul/pull/12182 because of some git weirdness, but I went through all your comments.
Could we add one more test for OSS to verify the ACL enforcement and one enterprise test to verify the validateEnterpriseToken enforcement?
Added an OSS test; I will add one for enterprise after this gets merged.
Instead of structs.DCSpecificRequest, could we use a new request type which also includes an optional parameter representing the raft.ServerID of a target server to transfer leadership to?
Added the ID to the request and threaded through to the API and the CLI.
We could call LeadershipTransfer on Raft itself. This other function is no longer necessary as every 1.9+ cluster will unconditionally use leadership transfers for autopilot purposes so we can rely on it being available here.
I changed attemptLeadershipTransfer to be able to call the raft API with or without an ID. I left the version check in; I can remove it if you think it's no longer useful.
@dhiaayachi: what are your thoughts on aligning the naming of this command more with the existing Vault name for similar functionality: step-down rather than leader-transfer?
Vault provides an operator-level CLI command and HTTP API endpoint for forcing the current leader to gracefully step down. Someone familiar with this capability in Vault might look for something with the same name in Consul (internally, at least one person has asked if Consul had a similar "step down" capability).
That said, I can see the advantages of consul operator raft leader-transfer instead of consul operator raft step-down. Semantically, it's not raft that's stepping down... the leader is stepping down. Perhaps we could keep the name leader-transfer but, in the docs, mention that this is analogous to vault operator step-down in Vault?
Just wanted to share these thoughts with you to consider.
Side question: when running this command, can the leader immediately be re-elected? Or is a different server agent always elected (if available)?
@mkeeler @jkirschner-hashicorp I made some changes to this based on your comments. Please take a look when you have time.
@mkeeler I implemented this as a gRPC endpoint. I tried to build it so that we can move all the operator endpoints to it, but it's my first attempt at implementing such an endpoint in Consul, so let me know if I missed some spots.
The leadership transfer command will be available starting in Consul 1.15.0!