etcd3

Election example from doc elects 2 leaders after etcd restart, starts multiple workers, or fails to elect a leader

Open tcollinsworth opened this issue 3 years ago • 0 comments

Documented election example

Gist of documented example with seemingly minor benign changes so it will execute.

Versions

  • Ubuntu 20.04.3 LTS
  • node v16.13.1
  • Etcd3 1.1.0
  • etcd 3.5.1
  • docker image bitnami/etcd tag:latest imageId:57a06bf0a041
  • docker 20.10.12

How to reproduce

  • Pull and run the etcd container ('docker pull/run', see below).
  • Start 2 instances of the example node.js code and wait for the election. Each instance has a unique UUID; one is elected and does work.
  • Run 'docker ps' to get the CONTAINER ID.
  • Run 'docker stop CONTAINER ID' and wait for the revoke event and for work to stop.
  • Depending on how long etcd is down, the symptoms vary; see below.
  • Run 'docker start CONTAINER ID' and wait for the election; both instances get elected and do work.

Symptoms

Symptoms after a brief etcd outage (< 10 seconds). Restart etcd after EtcdLeaseInvalidError appears.

  • only one leader is elected, but multiple workers are doing work on the leader - what?
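One possible reading of this symptom, not confirmed in the report, is that the error handler starts a new campaign on every error, so several campaigns stay alive in one process and each 'elected' event starts another worker. The following is a minimal sketch of guarding against that, assuming the gist follows the pattern of re-running the campaign from the error handler. It uses Node's EventEmitter as a stand-in for the campaign object; `createWorkerGuard` and `createRetryScheduler` are hypothetical helpers, not part of etcd3.

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper (not part of etcd3): makes start/stop idempotent so
// duplicate 'elected' events cannot stack workers.
function createWorkerGuard(startWork, stopWork) {
  let running = false;
  return {
    start() {
      if (running) return false; // duplicate 'elected': ignore
      running = true;
      startWork();
      return true;
    },
    stop() {
      if (!running) return false; // nothing to stop
      running = false;
      stopWork();
      return true;
    },
  };
}

// Hypothetical helper: allows at most one pending re-campaign at a time.
// `schedule` defaults to setTimeout and is injectable for testing.
function createRetryScheduler(delayMs, schedule = setTimeout) {
  let pending = false;
  return (fn) => {
    if (pending) return false; // a retry is already queued
    pending = true;
    schedule(() => {
      pending = false;
      fn();
    }, delayMs);
    return true;
  };
}

// Demo with a stand-in emitter instead of a real etcd3 campaign.
let workers = 0;
const queued = [];
const guard = createWorkerGuard(() => { workers += 1; }, () => { workers -= 1; });
const scheduleRetry = createRetryScheduler(5000, (fn) => queued.push(fn));

const fakeCampaign = new EventEmitter();
fakeCampaign.on('elected', () => guard.start());
fakeCampaign.on('error', () => {
  guard.stop();
  scheduleRetry(() => { /* a real handler would start a new campaign here */ });
});

fakeCampaign.emit('elected');
fakeCampaign.emit('elected'); // duplicate event, ignored by the guard
console.log('workers after two elected events:', workers);
fakeCampaign.emit('error', new Error('lease lost'));
fakeCampaign.emit('error', new Error('lease lost again'));
console.log('workers after errors:', workers, 'queued retries:', queued.length);
```

With the guard in place, repeated 'elected' events leave a single worker running, and repeated errors queue a single retry instead of one per error.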

Symptoms after a medium etcd outage (~30 seconds). Restart etcd after EtcdLeaseInvalidError and GRPCUnavailableError appear.

  • both node.js processes are leaders and multiple workers are doing work
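A defensive pattern for the two-leader case is to re-check who etcd considers the leader before each unit of work. The sketch below assumes an election object with an async `leader()` method that resolves to the current leader's campaign value. The etcd3 `Election` class documents such a method, but its exact behavior should be verified against the installed version. `isStillLeader`, `doWorkIfLeader`, and the stub objects are hypothetical, and a check like this narrows the window for duplicate work without closing it.

```javascript
// Hypothetical helper: resolves to true only when the election reports our
// own campaign value as the current leader. Any failure counts as "not leader".
async function isStillLeader(election, myId) {
  try {
    const current = await election.leader();
    return current === myId;
  } catch (err) {
    return false; // cannot confirm leadership while etcd is unreachable
  }
}

// Runs one unit of work only if leadership is confirmed first.
async function doWorkIfLeader(election, myId, workFn) {
  if (!(await isStillLeader(election, myId))) return false;
  await workFn();
  return true;
}

// Stubs standing in for a real election object (assumption for illustration).
const stubElection = (value) => ({ leader: async () => value });
const downElection = {
  leader: async () => { throw new Error('unavailable'); },
};

const results = Promise.all([
  doWorkIfLeader(stubElection('uuid-a'), 'uuid-a', async () => {}),
  doWorkIfLeader(stubElection('uuid-b'), 'uuid-a', async () => {}),
  doWorkIfLeader(downElection, 'uuid-a', async () => {}),
]);
results.then((r) => console.log(r));
```

Treating an unreachable etcd as "not leader" is a deliberate choice in this sketch: a process that cannot confirm its leadership stops working rather than risk running alongside another leader.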

Symptoms after a long etcd outage (a few minutes). Restart etcd after EtcdLeaseInvalidError, GRPCUnavailableError, BrokenCircuitError, and GRPCResourceExhastedError appear.

  • both node.js processes spew errors endlessly tripping circuit breaker and never recover
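For the long-outage case, retrying at a fixed short interval may keep tripping the circuit breaker. A common mitigation is capped exponential backoff between re-campaign attempts. This is an assumption, not something the report tested. `backoffDelay`, `createBackoff`, and their parameters are hypothetical.

```javascript
// Hypothetical helper: delay before retry number `attempt` (0-based),
// doubling from baseMs and never exceeding maxMs.
function backoffDelay(attempt, baseMs = 1000, maxMs = 60000) {
  const exp = Math.min(attempt, 30); // clamp the exponent to avoid huge numbers
  return Math.min(maxMs, baseMs * 2 ** exp);
}

// Tracks consecutive failures; reset() is meant to be called once elected.
function createBackoff(baseMs, maxMs) {
  let attempt = 0;
  return {
    next() {
      const delay = backoffDelay(attempt, baseMs, maxMs);
      attempt += 1;
      return delay;
    },
    reset() {
      attempt = 0;
    },
  };
}

// Delays for eight consecutive failures with a 1 s base and a 60 s cap.
const backoff = createBackoff(1000, 60000);
const delays = [];
for (let i = 0; i < 8; i += 1) delays.push(backoff.next());
console.log(delays);
```

In an error handler, the value from `backoff.next()` would replace a fixed retry delay, and `backoff.reset()` would run after a successful election.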

docker pull/run

  docker run -d --publish 2379:2379 --publish 2380:2380 --env ALLOW_NONE_AUTHENTICATION=yes --env ETCD_ADVERTISE_CLIENT_URLS=http://etcd-server:2379 bitnami/etcd:latest

tcollinsworth, Jan 13 '22 01:01