
Fauxton shows “This database failed to load” after pod restarts

Open · DB185344 opened this issue 4 years ago · 4 comments

Describe the bug

After restarting a pod, the node fails to rejoin the cluster properly, and Fauxton displays "this database failed to load" on some databases. When refreshing the browser, a different database comes online and a different one shows the error. The error only stops after running a curl request with `finish_cluster`.

Version of Helm and Kubernetes: Helm: 3.5.4, Kubernetes: 1.19

What happened: After restarting a pod, the node fails to join the cluster properly, and only after running:

```shell
curl -X POST "http://$adminUser:$adminPassword@<couchdb_pod>:5984/_cluster_setup" \
  -H "Accept: application/json" -H "Content-Type: application/json" \
  -d '{"action": "finish_cluster"}'
```

does the pod join back to the cluster.
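Before resorting to `finish_cluster`, the `_membership` endpoint can show whether a restarted node actually rejoined: a node that is up but not joined appears in `all_nodes` but not in `cluster_nodes`. The sketch below is illustrative and not from the thread; it diffs a canned `_membership` response with plain `sed`/`comm`, and the node names (following the chart's `couchdb@<pod>.<service>.<namespace>.svc.cluster.local` pattern) are assumptions. In practice you would populate `membership` from `curl -s http://$adminUser:$adminPassword@<couchdb_pod>:5984/_membership`.

```shell
# Canned _membership response for illustration only; real node names
# depend on your release name, namespace and clusterDomainSuffix.
membership='{"all_nodes":["couchdb@couchdb-couchdb-0.couchdb-couchdb.default.svc.cluster.local","couchdb@couchdb-couchdb-1.couchdb-couchdb.default.svc.cluster.local","couchdb@couchdb-couchdb-2.couchdb-couchdb.default.svc.cluster.local"],"cluster_nodes":["couchdb@couchdb-couchdb-0.couchdb-couchdb.default.svc.cluster.local","couchdb@couchdb-couchdb-1.couchdb-couchdb.default.svc.cluster.local"]}'

# Extract one JSON array from the response, one node name per line.
nodes() {
  echo "$membership" | sed -n "s/.*\"$1\":\\[\\([^]]*\\)\\].*/\\1/p" | tr ',' '\n' | tr -d '"'
}

nodes all_nodes     | sort > /tmp/all_nodes.txt
nodes cluster_nodes | sort > /tmp/cluster_nodes.txt

# Nodes that are up but never rejoined the cluster: candidates for
# re-running the finish_cluster step.
missing=$(comm -23 /tmp/all_nodes.txt /tmp/cluster_nodes.txt)
echo "$missing"
```

An empty result means every live node is already a cluster member, so `finish_cluster` should not be needed.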

What you expected to happen: After restart of the pod, the node automatically joins the cluster.

How to reproduce it (as minimally and precisely as possible): restart 1 pod in the cluster.

Anything else we need to know:

Screenshot from Fauxton showing the "this database failed to load" error:

[image attached in the original issue]

Also added the values.yaml (reflowed for readability):

```yaml
clusterSize: 3
allowAdminParty: false
createAdminSecret: false
adminUsername: admin

networkPolicy:
  enabled: true

serviceAccount:
  enabled: true
  create: true

persistentVolume:
  enabled: true
  accessModes:
    - ReadWriteOnce
  size: 10Gi
  storageClass: "ssd-couchdb"

image:
  repository:
  tag: latest
  pullPolicy: Always

searchImage:
  repository: kocolosk/couchdb-search
  tag: 0.2.0
  pullPolicy: IfNotPresent

enableSearch: false

initImage:
  repository: busybox
  tag: latest
  pullPolicy: Always

podManagementPolicy: Parallel
affinity: {}
annotations: {}
tolerations: []

service:
  annotations:
  enabled: true
  type: LoadBalancer
  externalPort: 5984
  sidecarsPort: 8080
  LoadBalancerIP:

ingress:
  enabled: false
  hosts:
    - chart-example.local
  path: /
  annotations: []
  tls:

resources: {}

erlangFlags:
  name: couchdb
  setcookie: monster

couchdbConfig:
  chttpd:
    bind_address: any
    require_valid_user: false

dns:
  clusterDomainSuffix: cluster.local

livenessProbe:
  enabled: true
  failureThreshold: 3
  initialDelaySeconds: 0
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 1

readinessProbe:
  enabled: true
  failureThreshold: 3
  initialDelaySeconds: 0
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 1

sidecars:
  image: "<sidecar_image>"
  imagePullPolicy: Always
```

DB185344 · Jul 15 '21 11:07

Did you ever find a fix for the pod not rejoining the cluster properly? I'm encountering that now.

jftanner · Sep 26 '22 21:09

@jftanner can you share the logs from the pod that isn't joining? If the admin hash is not specified in the helm chart then you may be encountering https://github.com/apache/couchdb-helm/issues/7.

willholley · Sep 28 '22 08:09

Hi @willholley. It might be #7, but it doesn't happen on pod restart. It only happens when there's a new pod after a `helm upgrade`. It seems that whenever the helm chart is run, it generates new credentials. (I noticed that the auto-generated admin password changes every time I install or update the helm deployment.) New pods pick up the new credentials, but old ones don't. So the workaround I found was to kill all the existing pods after scaling. (Obviously not ideal, but I don't have to do that very often.)

Perhaps #89 will fix it?

Alternatively, I could just define my own admin credentials manually and not have a problem anymore.
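Concretely, that last option maps to pinning the chart's admin values so `helm upgrade` stops generating fresh credentials. A minimal sketch, assuming the chart's `createAdminSecret`/`adminPassword` values (and the `adminHash` value mentioned in #7) behave as in the current chart; the password below is a placeholder:

```yaml
# Let the chart create the admin secret, but from pinned values rather
# than a freshly generated random password on every install/upgrade.
createAdminSecret: true
adminUsername: admin
adminPassword: "replace-with-a-stable-password"
# Optionally also pin the PBKDF2 admin hash (see issue #7) so the
# hashed value in local.ini does not churn across node restarts:
# adminHash: ""
```

With the password fixed, new pods created by an upgrade use the same credentials as the old ones, so the mass pod-kill workaround above becomes unnecessary.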

jftanner · Sep 28 '22 14:09

Yes, this sounds just like #78, and #89 would likely fix it / is intended to fix it 😄

colearendt · Sep 29 '22 20:09