etcd icon indicating copy to clipboard operation
etcd copied to clipboard

Election.Observe may report the wrong leader

Open jcferretti opened this issue 8 months ago • 7 comments

Bug report criteria

  • [X] This bug report is not security related, security issues should be disclosed privately via etcd maintainers.
  • [X] This is not a support request or question, support requests or questions should be raised in the etcd discussion forums.
  • [X] You have read the etcd bug reporting guidelines.
  • [X] Existing open issues along with etcd frequently asked questions have been checked and this is not a duplicate.

What happened?

The implementation of observe does a get on the election key (prefix) https://github.com/etcd-io/etcd/blob/5c3190903fa1115b083e22fe55f86e3a78d82f36/client/v3/concurrency/election.go#L178 and if that get returns zero kvs, it does a watch: https://github.com/etcd-io/etcd/blob/5c3190903fa1115b083e22fe55f86e3a78d82f36/client/v3/concurrency/election.go#L190

The watch may return multiple PUT events, in which case the first one is returned (note the break statement in the if body matching a PUT event type): https://github.com/etcd-io/etcd/blob/5c3190903fa1115b083e22fe55f86e3a78d82f36/client/v3/concurrency/election.go#L201

It seems to me returning the first PUT event can be wrong if there are multiple PUTs in the returned wr.Events, unless I am misunderstanding this (which is possible, since there is a fair amount of complexity here). So please bear with me, and help me understand if this is not broken.

I think instead of returning the first one, the PUT event with the lowest create revision should be returned, otherwise we risk returning the wrong leader. Returning the event with the lowest create revision would be consistent with what is done if the initial get is not empty. When the initial get is not empty, the first KV in the get range is returned https://github.com/etcd-io/etcd/blob/5c3190903fa1115b083e22fe55f86e3a78d82f36/client/v3/concurrency/election.go#L210

since the get was done with options for ascending order of creation https://github.com/etcd-io/etcd/blob/5c3190903fa1115b083e22fe55f86e3a78d82f36/client/v3/concurrency/election.go#L178 https://github.com/etcd-io/etcd/blob/5c3190903fa1115b083e22fe55f86e3a78d82f36/client/v3/op.go#L462

The only way I could see ignoring any other PUTs but the first not being an issue is if there was a guarantee that there will never be more than one PUT event in the returned wr.Events at that point in the code. I don't believe that is guaranteed, since watches are eventual consistent and there can be a network issue between the first get and this subsequent watch; if at the time of the first get there is no election leader (I don't mean etcd cluster leader, I mean elected candidate in the election API), and then a transient network error occurs, a retry of the watch may succeed at a time when there are several candidates already registered for the election, which I believe would make the watch return multiple PUTs.

What did you expect to happen?

N/A

How can we reproduce it (as minimally and precisely as possible)?

N/A

Anything else we need to know?

N/A

Etcd version (please run commands below)

3.5.12

Etcd configuration (command line flags or environment variables)

N/A

Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)

N/A

Relevant log output

No response

jcferretti avatar Jun 13 '24 03:06 jcferretti