Helm: STAN replicas fail to start when there's 2 or more

Open kylos101 opened this issue 5 years ago • 4 comments

Hi,

There appears to be an issue when using two replicas.

I don't quite understand it, but I am able to recreate it when using the NATS Operator.

For example, when I install STAN like this (where I've cloned this repo):

helm upgrade --install stan $HOME/k8s/helm/charts/stan \
  --set stan.replicas=2 \
  --set store.type=file,store.file.storageSize=1Gi,store.volume.storageClass=rook-ceph-block \
  --set stan.nats.url=nats.default.svc:4222 \
  --set stan.logging.debug=true \
  --set stan.nats.serviceRoleAuth.enabled=true,stan.nats.serviceRoleAuth.natsClusterName=nats

I get this error:

[1] 2020/07/18 00:32:43.866045 [INF] STREAM: Starting nats-streaming-server[stan] version 0.17.0
[1] 2020/07/18 00:32:43.866139 [INF] STREAM: ServerID: J6zdHu1BZispFbuanU03re
[1] 2020/07/18 00:32:43.866142 [INF] STREAM: Go version: go1.13.7
[1] 2020/07/18 00:32:43.866145 [INF] STREAM: Git commit: [f4b7190]
[1] 2020/07/18 00:32:43.884804 [INF] STREAM: Recovering the state...
[1] 2020/07/18 00:32:43.884923 [INF] STREAM: No recovered state
[1] 2020/07/18 00:32:43.903360 [INF] STREAM: Shutting down.
[1] 2020/07/18 00:32:43.903518 [FTL] STREAM: Failed to start: discovered another streaming server with cluster ID "stan"

I would assume that when using replicas (instead of clusters), the streaming.id must match, but nodes cannot share the same streaming.cluster.node_id. Is that right?

If you can point me in the right direction, I might be able to help.

kylos101 avatar Jul 18 '20 00:07 kylos101

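The problem is that the replicas do not form a cluster unless either clustering or fault tolerance is enabled. Without one of those modes, each replica starts as a standalone server with the same cluster ID, which is what the error reports. For example, with clustering: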

store:
  cluster:
    enabled: true

If the filesystem supports ReadWriteMany access (I think rook + ceph could work this way), you could use fault tolerance instead.
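
As a rough sketch of the clustering option applied to the install from the original report, the values could be written to a file and passed to Helm. The file name stan-cluster-values.yaml is made up for illustration, and replicas: 3 is an assumption (Raft-based clustering is usually run with an odd number of nodes); the keys themselves are the ones already used in this thread.

```shell
# Write a values file that enables clustering alongside multiple replicas.
# File name and replica count are illustrative assumptions, not chart defaults.
cat > stan-cluster-values.yaml <<'EOF'
stan:
  replicas: 3
store:
  type: file
  cluster:
    enabled: true
EOF

# Then install with the chart path from the original report, for example:
#   helm upgrade --install stan "$HOME/k8s/helm/charts/stan" -f stan-cluster-values.yaml
```

The remaining settings from the original command (storage class, NATS URL, auth) would be added to the same file or kept as --set flags.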

wallyqs avatar Jul 18 '20 01:07 wallyqs

I see, so I shouldn't specify replicas by themselves, I would need to either enable cluster or fault tolerant mode?

kylos101 avatar Jul 18 '20 01:07 kylos101

I see now. I just reread the docs and it's clearer now, thanks for the info!

Is this something where you think I should update the chart to fail a helm install if the user specifies replicas > 1 where store.cluster.enabled and store.ft.group are false?

kylos101 avatar Jul 18 '20 01:07 kylos101

Is this something where you think I should update the chart to fail a helm install if the user specifies replicas > 1 where store.cluster.enabled and store.ft.group are false?

Yes, a check like that would help avoid this error. Multiple replicas only make sense when either cluster or FT mode is enabled.
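
The rule being discussed can be sketched as a small pre-flight function. This is only an illustration of the logic, not the chart's actual implementation: inside the chart the guard would more likely live in a template, and the function name validate_stan_values and its argument order are hypothetical.

```shell
# Hypothetical pre-flight check mirroring the proposed chart guard.
# Usage: validate_stan_values REPLICAS CLUSTER_ENABLED FT_GROUP
#   REPLICAS         value of stan.replicas
#   CLUSTER_ENABLED  value of store.cluster.enabled ("true" or "false")
#   FT_GROUP         value of store.ft.group (empty string if unset)
validate_stan_values() {
  replicas="$1"
  cluster_enabled="$2"
  ft_group="$3"

  # More than one replica is only valid with clustering or fault tolerance.
  if [ "$replicas" -gt 1 ] && [ "$cluster_enabled" != "true" ] && [ -z "$ft_group" ]; then
    echo "stan.replicas=$replicas requires store.cluster.enabled=true or store.ft.group to be set" >&2
    return 1
  fi
  return 0
}
```

For example, validate_stan_values 2 false "" returns a non-zero status and prints the message, while validate_stan_values 2 true "" succeeds.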

wallyqs avatar Jul 18 '20 06:07 wallyqs

Closing due to age of issue; if experiencing in current versions please open a new issue.

caleblloyd avatar May 03 '23 17:05 caleblloyd