
Cluster change resulting in thousands of messages being replaced starting at seqId=1

AaronFriel opened this issue • 15 comments

It appears a network partition or extremely high latency resulted in the leader failing to reach a follower. (I believe CPU load on one server was near 100% and nats-streaming was being starved of CPU time, which produced a degradation similar to a network partition.)

nats-streaming-1: [1] 2020/05/31 19:02:15.974370 [WRN] STREAM: raft: Failed to contact nats-streaming-0 in 5.000272385s
nats-streaming-1: [1] 2020/05/31 19:04:07.277293 [WRN] STREAM: raft: Failed to contact nats-streaming-0 in 5.000305251s
nats-streaming-1: [1] 2020/05/31 19:04:12.030348 [WRN] STREAM: raft: Failed to contact nats-streaming-0 in 9.754040241s
nats-streaming-1: [1] 2020/05/31 19:04:12.626648 [INF] STREAM: raft: aborting pipeline replication to peer {Voter nats-streaming-0 namespace.nats-streaming-0.namespace}
nats-streaming-1: [1] 2020/05/31 19:04:15.104260 [ERR] STREAM: raft: Failed to AppendEntries to {Voter nats-streaming-0 namespace.nats-streaming-0.namespace}: nats: timeout
nats-streaming-1: [1] 2020/05/31 19:04:17.752662 [ERR] STREAM: raft: Failed to heartbeat to namespace.nats-streaming-0.namespace: nats: timeout

The above messages repeated for about 45 seconds until the pattern changed, with nats-streaming-0 logging:

nats-streaming-0: [ERR] STREAM: raft-net: Failed to flush response: write to closed conn

Then nats-streaming-0 tried to start a leadership election, and the partition began to heal. nats-streaming-1 and nats-streaming-2 both rejected the vote request, stating that the current leader is nats-streaming-1:

2020-05-31T19:05:08.890568417Z nats-streaming-0 [1] 2020/05/31 19:05:08.890506 [DBG] STREAM: raft-net: namespace.nats-streaming-0.namespace accepted connection from: namespace.nats-streaming-1.namespace
2020-05-31T19:05:08.890668504Z nats-streaming-2 [1] 2020/05/31 19:05:08.890481 [WRN] STREAM: raft: Rejecting vote request from namespace.nats-streaming-0.namespace since we have a leader: namespace.nats-streaming-1.namespace
2020-05-31T19:05:08.928478050Z nats-streaming-1 [1] 2020/05/31 19:05:08.928281 [WRN] STREAM: raft: Rejecting vote request from namespace.nats-streaming-0.namespace since we have a leader: namespace.nats-streaming-1.namespace
2020-05-31T19:05:10.136827516Z nats-streaming-0 [1] 2020/05/31 19:05:10.136465 [DBG] STREAM: raft-net: namespace.nats-streaming-0.namespace accepted connection from: namespace.nats-streaming-1.namespace

Then something strange happens: nats-streaming-0 insists it has a newer term than nats-streaming-1 and ends up taking over!

2020-05-31T19:05:10.137437331Z nats-streaming-1 [1] 2020/05/31 19:05:10.137275 [ERR] STREAM: raft: peer {Voter nats-streaming-0 namespace.nats-streaming-0.namespace} has newer term, stopping replication
2020-05-31T19:05:10.137492052Z nats-streaming-1 [1] 2020/05/31 19:05:10.137350 [INF] STREAM: raft: Node at namespace.nats-streaming-1.namespace [Follower] entering Follower state (Leader: "")
2020-05-31T19:05:10.137499170Z nats-streaming-1 [1] 2020/05/31 19:05:10.137362 [INF] STREAM: server lost leadership, performing leader stepdown actions
2020-05-31T19:05:10.137704041Z nats-streaming-1 [1] 2020/05/31 19:05:10.137635 [INF] STREAM: raft: aborting pipeline replication to peer {Voter nats-streaming-2 namespace.nats-streaming-2.namespace}
2020-05-31T19:05:10.138956578Z nats-streaming-1 [1] 2020/05/31 19:05:10.138856 [INF] STREAM: finished leader stepdown actions
2020-05-31T19:05:15.351767734Z nats-streaming-0 [1] 2020/05/31 19:05:15.351592 [WRN] STREAM: raft: Election timeout reached, restarting election
2020-05-31T19:05:15.351924160Z nats-streaming-0 [1] 2020/05/31 19:05:15.351839 [INF] STREAM: raft: Node at namespace.nats-streaming-0.namespace [Candidate] entering Candidate state in term 276802
2020-05-31T19:05:15.359643816Z nats-streaming-1 [1] 2020/05/31 19:05:15.358527 [DBG] STREAM: raft: lost leadership because received a requestvote with newer term
2020-05-31T19:05:15.373592387Z nats-streaming-0 [1] 2020/05/31 19:05:15.372560 [DBG] STREAM: raft: Votes needed: 2
2020-05-31T19:05:15.373634865Z nats-streaming-0 [1] 2020/05/31 19:05:15.372594 [DBG] STREAM: raft: Vote granted from nats-streaming-1 in term 276802. Tally: 1
2020-05-31T19:05:15.373642009Z nats-streaming-0 [1] 2020/05/31 19:05:15.372603 [DBG] STREAM: raft: Vote granted from nats-streaming-0 in term 276802. Tally: 2
2020-05-31T19:05:15.373647927Z nats-streaming-0 [1] 2020/05/31 19:05:15.372651 [INF] STREAM: raft: Election won. Tally: 2
2020-05-31T19:05:15.373653817Z nats-streaming-0 [1] 2020/05/31 19:05:15.372673 [INF] STREAM: raft: Node at namespace.nats-streaming-0.namespace [Leader] entering Leader state
2020-05-31T19:05:15.373660086Z nats-streaming-0 [1] 2020/05/31 19:05:15.372786 [INF] STREAM: raft: Added peer nats-streaming-1, starting replication
2020-05-31T19:05:15.374778622Z nats-streaming-0 [1] 2020/05/31 19:05:15.374665 [INF] STREAM: raft: Added peer nats-streaming-2, starting replication
2020-05-31T19:05:15.376022408Z nats-streaming-2 [1] 2020/05/31 19:05:15.375459 [WRN] STREAM: raft: Rejecting vote request from namespace.nats-streaming-0.namespace since we have a leader: namespace.nats-streaming-1.namespace
2020-05-31T19:05:15.423738276Z nats-streaming-0 [1] 2020/05/31 19:05:15.423530 [INF] STREAM: server became leader, performing leader promotion actions
2020-05-31T19:05:15.423893835Z nats-streaming-0 [1] 2020/05/31 19:05:15.423819 [INF] STREAM: raft: pipelining replication to peer {Voter nats-streaming-1 namespace.nats-streaming-1.namespace}
2020-05-31T19:05:15.424079954Z nats-streaming-0 [1] 2020/05/31 19:05:15.424006 [INF] STREAM: raft: pipelining replication to peer {Voter nats-streaming-2 namespace.nats-streaming-2.namespace}
2020-05-31T19:05:15.442711747Z nats-streaming-0 [1] 2020/05/31 19:05:15.442530 [INF] STREAM: finished leader promotion actions
2020-05-31T19:05:15.442760165Z nats-streaming-0 [1] 2020/05/31 19:05:15.442559 [INF] STREAM: Streaming Server is ready

At this point, nats-streaming-0 has taken over and begins delivering messages to clients starting with sequence id 1!

2020-05-31T19:05:15.453934238Z nats-streaming-0 [1] 2020/05/31 19:05:15.453720 [TRC] STREAM: [Client:ac43bf10322f-[some-uuid]] Delivering msg to subid=1083, subject=[redacted], seq=1
2020-05-31T19:05:15.454144601Z nats-streaming-0 [1] 2020/05/31 19:05:15.454019 [TRC] STREAM: [Client:6d8ff6f40b18-[some-uuid]] Delivering msg to subid=1082, subject=[redacted], seq=2
2020-05-31T19:05:15.454165273Z nats-streaming-0 [1] 2020/05/31 19:05:15.454081 [TRC] STREAM: [Client:a3b7a1f1e867-[some-uuid]] Delivering msg to subid=1081, subject=[redacted], seq=3
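
A client-side guard that remembers the highest sequence already processed per subject would at least detect a regression like this and let a consumer skip the replayed messages. This is purely an illustrative sketch (the type, its name, and the example sequence numbers are hypothetical), not something NATS Streaming itself provides:

```go
package main

import (
	"fmt"
	"sync"
)

// replayGuard remembers the highest sequence processed per subject, so a
// consumer can notice a replay that restarts at seq=1 (as in the log lines
// above) and skip messages it has already handled.
type replayGuard struct {
	mu      sync.Mutex
	lastSeq map[string]uint64
}

func newReplayGuard() *replayGuard {
	return &replayGuard{lastSeq: make(map[string]uint64)}
}

// shouldProcess reports whether seq advances past everything already
// processed for the subject; false means a regression or duplicate.
func (g *replayGuard) shouldProcess(subject string, seq uint64) bool {
	g.mu.Lock()
	defer g.mu.Unlock()
	if seq <= g.lastSeq[subject] {
		return false
	}
	g.lastSeq[subject] = seq
	return true
}

func main() {
	g := newReplayGuard()
	// Made-up example: normal progress, then a replay restarting at 1.
	for _, seq := range []uint64{1041, 1042, 1, 2, 3} {
		fmt.Printf("seq=%d process=%v\n", seq, g.shouldProcess("some.subject", seq))
	}
}
```

In a real subscriber, msg.Subject and msg.Sequence would be fed into such a guard before processing and acking, and lastSeq would need to be persisted somewhere for it to survive consumer restarts.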

Context:

  • NATS Streaming running via docker.io/nats-streaming:0.17.0
  • NATS running via docker.io/bitnami/nats:2.1.6
  • NATS Streaming is clustered with 3 nodes
  • Subscribers with redelivered messages were in a queue group
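
For reference, a queue-group subscriber along these lines (written with stan.go; the cluster ID, client ID, subject, group name, and subscription options are illustrative assumptions, not the actual values from this deployment) is roughly how the affected consumers would see the replayed messages arrive:

```go
package main

import (
	"log"

	stan "github.com/nats-io/stan.go"
)

func main() {
	// Cluster ID, client ID, and URL are placeholders.
	sc, err := stan.Connect("test-cluster", "example-client",
		stan.NatsURL("nats://nats:4222"))
	if err != nil {
		log.Fatal(err)
	}
	defer sc.Close()

	// Queue-group subscription with manual acks; subject, group, and
	// durable names are illustrative.
	_, err = sc.QueueSubscribe("some.subject", "workers",
		func(m *stan.Msg) {
			// After the unexpected leader change, handlers like this one
			// saw messages arrive again starting at m.Sequence == 1.
			log.Printf("seq=%d redelivered=%v", m.Sequence, m.Redelivered)
			if err := m.Ack(); err != nil {
				log.Printf("ack failed: %v", err)
			}
		},
		stan.DurableName("workers-durable"),
		stan.SetManualAckMode(),
	)
	if err != nil {
		log.Fatal(err)
	}

	select {} // keep the subscriber running
}
```

A handler of this shape would simply see the old messages delivered again with low m.Sequence values, matching the TRC delivery lines earlier in the report.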

Expected behavior

Even in the case of a network partition, a member rejoining the cluster should:

  • Not result in the leader changing
  • Not result in previously acked messages being replayed.

Why did nats-streaming-0 start replaying messages from 2019?

Actual behavior

A network partition resulted in an isolated node rejoining and:

  • Forcing the existing leader to step down
  • Apparently losing track of the current sequence ID, so that very old messages were replayed starting at seq=1

AaronFriel • Jun 01 '20 21:06