Helm: STAN replicas fail to start when there's 2 or more
Hi,
There appears to be an issue when running two replicas that I don't quite understand. I am able to reproduce it when using the NATS Operator.
For example, when I install STAN like this (where I've cloned this repo):
helm upgrade --install stan $HOME/k8s/helm/charts/stan \
  --set stan.replicas=2 \
  --set store.type=file,store.file.storageSize=1Gi,store.volume.storageClass=rook-ceph-block \
  --set stan.nats.url=nats.default.svc:4222 \
  --set stan.logging.debug=true \
  --set stan.nats.serviceRoleAuth.enabled=true,stan.nats.serviceRoleAuth.natsClusterName=nats
I get this error:
[1] 2020/07/18 00:32:43.866045 [INF] STREAM: Starting nats-streaming-server[stan] version 0.17.0
[1] 2020/07/18 00:32:43.866139 [INF] STREAM: ServerID: J6zdHu1BZispFbuanU03re
[1] 2020/07/18 00:32:43.866142 [INF] STREAM: Go version: go1.13.7
[1] 2020/07/18 00:32:43.866145 [INF] STREAM: Git commit: [f4b7190]
[1] 2020/07/18 00:32:43.884804 [INF] STREAM: Recovering the state...
[1] 2020/07/18 00:32:43.884923 [INF] STREAM: No recovered state
[1] 2020/07/18 00:32:43.903360 [INF] STREAM: Shutting down.
[1] 2020/07/18 00:32:43.903518 [FTL] STREAM: Failed to start: discovered another streaming server with cluster ID "stan"
I would assume that when using replicas (instead of clusters), the streaming.id must match across nodes, but the nodes cannot share the same streaming.cluster.node_id?
If you can point me in the right direction, I might be able to help.
The problem is that no cluster is formed unless either clustering or fault tolerance is enabled. For example, with clustering:
store:
  cluster:
    enabled: true
With a ReadWriteMany filesystem (I think Rook + Ceph could work this way), you could use fault tolerance mode instead.
I see, so I shouldn't specify replicas by themselves; I would need to enable either cluster or fault tolerant mode?
I see now. I just reread the docs and it's clearer, thanks for the info!
Do you think I should update the chart to fail a helm install if the user specifies replicas > 1 when neither store.cluster.enabled nor store.ft.group is set?
Yes, a check like that would help avoid this error. Multiple replicas only make sense when either cluster or FT mode is enabled.
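For anyone hitting this before the chart gains such a check, the same rule can be sketched as a pre-flight guard in a wrapper script. This is only an illustration and not part of the chart: the function name check_stan_replicas is made up, and it assumes an empty store.ft.group means FT mode is off.

```shell
# Hypothetical pre-flight check; NOT part of the stan chart.
# Arguments mirror the chart values discussed above:
#   $1 = stan.replicas
#   $2 = store.cluster.enabled ("true" or "false")
#   $3 = store.ft.group (assumed empty when FT mode is not used)
check_stan_replicas() {
  replicas="$1"
  cluster_enabled="$2"
  ft_group="$3"

  # More than one replica only makes sense with clustering or FT mode.
  if [ "$replicas" -gt 1 ] && [ "$cluster_enabled" != "true" ] && [ -z "$ft_group" ]; then
    echo "stan.replicas=$replicas requires store.cluster.enabled=true or a store.ft.group" >&2
    return 1
  fi
  return 0
}
```

Calling check_stan_replicas 2 false "" returns non-zero, so a script can stop before running helm upgrade --install, while check_stan_replicas 2 true "" passes.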
Closing due to age of issue; if experiencing in current versions please open a new issue.