
[bitnami/etcd] Etcd upgrade issue

abhayycs opened this issue 10 months ago · 4 comments

Name and Version

bitnami/etcd

What architecture are you using?

amd64

What steps will reproduce the bug?

I have a 3-node K8s cluster where I'm installing an etcd cluster with persistence enabled. I have found that sometimes, when upgrading the cluster with new etcd changes, one of the instances goes into a CrashLoopBackOff state.

NAMESPACE   NAME                           READY   STATUS             RESTARTS          AGE
voltha      voltha-etcd-cluster-client-0   1/1     Running            0                 14h
voltha      voltha-etcd-cluster-client-1   1/1     Running            0                 14h
voltha      voltha-etcd-cluster-client-2   0/1     CrashLoopBackOff   173 (4m31s ago)   14h

Are you using any custom parameters or values?

persistence is enabled
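
For reference, the install is roughly equivalent to the following (the release name, namespace, and name override are illustrative guesses to match the pod names above; only replicaCount and persistence.enabled reflect my actual settings):

$ # Illustrative only: install a 3-member etcd cluster with persistent storage
$ helm install voltha-etcd-cluster bitnami/etcd \
    --namespace voltha \
    --set fullnameOverride=voltha-etcd-cluster-client \
    --set replicaCount=3 \
    --set persistence.enabled=true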

What is the expected behavior?

No response

What do you see instead?

Each etcd instance is associated with a PersistentVolumeClaim and a PersistentVolume. To recover from this state, I have to delete the PV associated with the voltha-etcd-cluster-client-2 instance and restart the voltha-etcd-cluster-client-2 pod. My question is: is it safe to delete that PersistentVolume, with the assurance that no data is lost and the remaining data is up to date? I don't want to end up in a situation where I lose data or the data is stale.
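
The recovery I apply today looks roughly like this (the PVC name is an assumption based on the chart's default data-<pod> volumeClaimTemplate naming; adjust it to whatever kubectl get pvc -n voltha actually shows):

$ # Check which claim backs the failing member
$ kubectl get pvc -n voltha
$ # Drop the claim of the broken member; its PV is released or deleted
$ # according to the reclaim policy
$ kubectl delete pvc data-voltha-etcd-cluster-client-2 -n voltha
$ # Recreate the pod; the StatefulSet provisions a fresh PVC/PV and the
$ # member is expected to rejoin and sync from the two healthy replicas
$ kubectl delete pod voltha-etcd-cluster-client-2 -n voltha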

Any help would be greatly appreciated.

Additional information

No response

abhayycs · Apr 19 '24

Hi!

What is the error that appears when it enters that CrashLoopBackOff state? Do the logs show anything meaningful?

javsalgar · Apr 19 '24

Hi Javier,

Please find the logs:

$ kubectl logs -f voltha-etcd-cluster-client-2 -n voltha
etcd 05:28:31.88
etcd 05:28:31.88 Welcome to the Bitnami etcd container
etcd 05:28:31.89 Subscribe to project updates by watching https://github.com/bitnami/containers
etcd 05:28:31.89 Submit issues and feature requests at https://github.com/bitnami/containers/issues
etcd 05:28:31.89
etcd 05:28:31.89 INFO  ==> ** Starting etcd setup **
etcd 05:28:31.91 INFO  ==> Validating settings in ETCD_* env vars..
etcd 05:28:31.92 WARN  ==> You set the environment variable ALLOW_NONE_AUTHENTICATION=yes. For safety reasons, do not use this flag in a production environment.
etcd 05:28:31.92 INFO  ==> Initializing etcd
etcd 05:28:31.93 INFO  ==> Generating etcd config file using env variables
etcd 05:28:31.94 INFO  ==> Detected data from previous deployments
etcd 05:28:32.08 INFO  ==> Updating member in existing cluster
***@***.***/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0001f0000/voltha-etcd-cluster-client-0.voltha-etcd-cluster-client-headless.voltha.svc.cluster.local:2379","attempt":0,"error":"rpc error: code = NotFound desc = etcdserver: member not found"}

Error: etcdserver: member not found

Thanks,

Abhay

abhayycs · Apr 19 '24

Hi @abhayycs

There were a couple of previous issues similar to your scenario, #6251 and #10009 (although they also apply to scaling). Could you please take a look and check whether your situation is the same and whether the suggestions in those cases help?
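
In case it is useful while reviewing them, here is a rough sketch of how to inspect the cluster membership from one of the healthy pods before recreating the broken member's data (pod name, namespace, and port are taken from the logs above; add TLS/auth flags if your deployment uses them, and replace the member ID placeholder with the real value):

$ # List the current members as seen by a healthy node
$ kubectl exec -it voltha-etcd-cluster-client-0 -n voltha -- \
    etcdctl member list --endpoints=http://localhost:2379
$ # If the failing member is still registered under a stale ID, remove it
$ # so it can rejoin cleanly after its PVC/pod are recreated
$ kubectl exec -it voltha-etcd-cluster-client-0 -n voltha -- \
    etcdctl member remove <MEMBER_ID> --endpoints=http://localhost:2379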

dgomezleon · Apr 23 '24

This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.

github-actions[bot] · May 09 '24

Due to the lack of activity in the last 5 days since it was marked as "stale", we proceed to close this Issue. Do not hesitate to reopen it later if necessary.

github-actions[bot] · May 14 '24