nats-streaming-operator
Nats streaming cluster failing on kubernetes
I'm trying to set up a NATS Streaming cluster with three nodes on a local Kubernetes following the operator docs, and I'm getting continuous connection failures from some pods.
As you can see, stan-cluster-poc-1 became the cluster leader.
[1] 2020/07/06 19:25:28.972786 [INF] STREAM: Starting nats-streaming-server[stan-cluster-poc] version 0.18.0
[1] 2020/07/06 19:25:28.972914 [INF] STREAM: ServerID: S2CN7Xkm9RPjaSmk17QhUu
[1] 2020/07/06 19:25:28.972921 [INF] STREAM: Go version: go1.14.4
[1] 2020/07/06 19:25:28.972924 [INF] STREAM: Git commit: [026e3a6]
[1] 2020/07/06 19:25:29.076044 [INF] STREAM: Recovering the state...
[1] 2020/07/06 19:25:29.086609 [INF] STREAM: No recovered state
[1] 2020/07/06 19:25:29.092785 [INF] STREAM: Cluster Node ID : "stan-cluster-poc-1"
[1] 2020/07/06 19:25:29.092886 [INF] STREAM: Cluster Log Path: /persistence/stan/raft/stan-cluster-poc-1
[1] 2020/07/06 19:25:29.152058 [INF] STREAM: raft: initial configuration: index=0 servers=[]
[1] 2020/07/06 19:25:29.153059 [INF] STREAM: raft: entering follower state: follower="Node at stan-cluster-poc."stan-cluster-poc-1".stan-cluster-poc [Follower]" leader=
[1] 2020/07/06 19:25:29.158236 [DBG] STREAM: Bootstrapping Raft group stan-cluster-poc as seed node
[1] 2020/07/06 19:25:29.166935 [DBG] STREAM: Discover subject: _STAN.discover.stan-cluster-poc
[1] 2020/07/06 19:25:29.166977 [DBG] STREAM: Publish subject: _STAN.pub.stan-cluster-poc.>
[1] 2020/07/06 19:25:29.166982 [DBG] STREAM: Subscribe subject: _STAN.sub.stan-cluster-poc
[1] 2020/07/06 19:25:29.166985 [DBG] STREAM: Subscription Close subject: _STAN.subclose.stan-cluster-poc
[1] 2020/07/06 19:25:29.166988 [DBG] STREAM: Unsubscribe subject: _STAN.unsub.stan-cluster-poc
[1] 2020/07/06 19:25:29.166991 [DBG] STREAM: Close subject: _STAN.close.stan-cluster-poc
[1] 2020/07/06 19:25:29.170852 [INF] STREAM: Message store is RAFT_FILE
[1] 2020/07/06 19:25:29.171036 [INF] STREAM: Store location: /persistence/stan/stan-cluster-poc-1
[1] 2020/07/06 19:25:29.171376 [INF] STREAM: ---------- Store Limits ----------
[1] 2020/07/06 19:25:29.171513 [INF] STREAM: Channels: 100 *
[1] 2020/07/06 19:25:29.171580 [INF] STREAM: --------- Channels Limits --------
[1] 2020/07/06 19:25:29.172067 [INF] STREAM: Subscriptions: 1000 *
[1] 2020/07/06 19:25:29.172346 [INF] STREAM: Messages : 1000000 *
[1] 2020/07/06 19:25:29.172706 [INF] STREAM: Bytes : 976.56 MB *
[1] 2020/07/06 19:25:29.172915 [INF] STREAM: Age : unlimited *
[1] 2020/07/06 19:25:29.173035 [INF] STREAM: Inactivity : unlimited *
[1] 2020/07/06 19:25:29.173215 [INF] STREAM: ----------------------------------
[1] 2020/07/06 19:25:33.103290 [WRN] STREAM: raft: heartbeat timeout reached, starting election: last-leader=
[1] 2020/07/06 19:25:33.103356 [INF] STREAM: raft: entering candidate state: node="Node at stan-cluster-poc."stan-cluster-poc-1".stan-cluster-poc [Candidate]" term=2
[1] 2020/07/06 19:25:33.121018 [DBG] STREAM: raft: votes: needed=1
[1] 2020/07/06 19:25:33.121087 [DBG] STREAM: raft: vote granted: from="stan-cluster-poc-1" term=2 tally=1
[1] 2020/07/06 19:25:33.121244 [INF] STREAM: raft: election won: tally=1
[1] 2020/07/06 19:25:33.121276 [INF] STREAM: raft: entering leader state: leader="Node at stan-cluster-poc."stan-cluster-poc-1".stan-cluster-poc [Leader]"
[1] 2020/07/06 19:25:33.121463 [INF] STREAM: server became leader, performing leader promotion actions
[1] 2020/07/06 19:25:33.147520 [INF] STREAM: finished leader promotion actions
[1] 2020/07/06 19:25:33.147612 [INF] STREAM: Streaming Server is ready
However, it fails to establish a connection with one or more nodes, depending on the number of nodes in the cluster (e.g.):
[1] 2020/07/06 19:34:45.125722 [WRN] STREAM: raft: failed to contact: server-id="stan-cluster-poc-3" time=1.000833233s
[1] 2020/07/06 19:34:46.071788 [WRN] STREAM: raft: failed to contact: server-id="stan-cluster-poc-3" time=1.946740647s
[1] 2020/07/06 19:34:46.413845 [ERR] STREAM: raft: failed to heartbeat to: peer=stan-cluster-poc."stan-cluster-poc-3".stan-cluster-poc error="nats: timeout"
[1] 2020/07/06 19:34:54.349017 [ERR] STREAM: raft: failed to appendEntries to: peer="{Voter "stan-cluster-poc-3" stan-cluster-poc."stan-cluster-poc-3".stan-cluster-poc}" error="natslog: read timeout"
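To see at a glance which peers the leader cannot reach, the raft warnings and errors can be tallied per node. This is only a minimal sketch that assumes the log layout shown in the excerpts above; the function name `tally_raft_failures` and the node ID pattern (a quoted token ending in `-<number>`) are my own assumptions, not part of nats-streaming-server.

```python
import re
from collections import Counter

# Log layout assumed from the excerpts above:
# [pid] YYYY/MM/DD HH:MM:SS.ffffff [LVL] STREAM: raft: message
LINE_RE = re.compile(r'^\[\d+\] (\S+ \S+) \[(WRN|ERR)\] STREAM: raft: (.*)$')

# Assumption: a node ID is a quoted token ending in -<number>,
# e.g. "stan-cluster-poc-3".
NODE_RE = re.compile(r'"([\w-]+-\d+)"')


def tally_raft_failures(lines):
    """Count raft WRN/ERR lines per node ID mentioned in the message."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # not a raft warning or error line
        node = NODE_RE.search(match.group(3))
        if node:
            counts[node.group(1)] += 1
    return counts


# Lines copied from the leader's log above (one INF line is ignored).
sample = [
    '[1] 2020/07/06 19:25:33.147612 [INF] STREAM: Streaming Server is ready',
    '[1] 2020/07/06 19:34:45.125722 [WRN] STREAM: raft: failed to contact: server-id="stan-cluster-poc-3" time=1.000833233s',
    '[1] 2020/07/06 19:34:46.413845 [ERR] STREAM: raft: failed to heartbeat to: peer=stan-cluster-poc."stan-cluster-poc-3".stan-cluster-poc error="nats: timeout"',
    '[1] 2020/07/06 19:34:54.349017 [ERR] STREAM: raft: failed to appendEntries to: peer="{Voter "stan-cluster-poc-3" stan-cluster-poc."stan-cluster-poc-3".stan-cluster-poc}" error="natslog: read timeout"',
]

print(tally_raft_failures(sample))
```

Feeding it the full output of `kubectl logs` for the leader pod, split into lines, would give the same per-peer count for a longer run.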
On stan-cluster-poc-2 I received the warning below:
[1] 2020/07/06 19:25:33.240858 [WRN] STREAM: raft: failed to get previous log: previous-index=4 last-index=0 error="log not found"
On stan-cluster-poc-3 I received the warning below:
[1] 2020/07/06 19:25:34.340570 [WRN] STREAM: raft: failed to get previous log: previous-index=5 last-index=0 error="log not found"
Context
OSX version 10.14.6
docker 19.03.8
kubernetes version 1.16.5
persistent volume with storageClassName "local-storage" and ReadWriteOnce mode
nats operator 0.7.2
nats-server version 2.1.7
nats streaming operator 0.3.0-v1alpha1
nats-streaming-server version 0.18.0