redis-enterprise-k8s-docs
Readiness probe failed: node id file does not exist - pod is not yet bootstrapped
Hi, I am trying to set up a Redis Enterprise cluster on GKE using the operator, but it fails with the error: Readiness probe failed: node id file does not exist - pod is not yet bootstrapped.
I have a GKE cluster up and running with a node pool of 6 nodes, machine type n1-standard-8.
Steps to reproduce:
- Created the Redis Enterprise operator successfully:
kubectl apply -f bundle.yaml
- When I create the RedisEnterpriseCluster with the command
kubectl apply -f rec.yaml
it fails. These are the event logs from kubectl describe pod redis-enterprise-0:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 70s (x2 over 70s) default-scheduler 0/6 nodes are available: 6 pod has unbound immediate PersistentVolumeClaims.
Normal Scheduled 68s default-scheduler Successfully assigned default/redis-enterprise-0 to gke-redis-cluster-larger-pool-81081033-smfc
Normal SuccessfulAttachVolume 63s attachdetach-controller AttachVolume.Attach succeeded for volume "pvc-fb06fa95-f378-4e49-a0f0-3a41e67404be"
Normal Pulling 57s kubelet Pulling image "redislabs/redis:6.0.20-69"
Normal Pulled 39s kubelet Successfully pulled image "redislabs/redis:6.0.20-69" in 18.648849889s
Normal Created 25s kubelet Created container redis-enterprise-node
Normal Started 25s kubelet Started container redis-enterprise-node
Normal Pulling 25s kubelet Pulling image "redislabs/operator:6.0.20-4"
Normal Pulled 21s kubelet Successfully pulled image "redislabs/operator:6.0.20-4" in 4.226643732s
Normal Created 18s kubelet Created container bootstrapper
Normal Started 18s kubelet Started container bootstrapper
Warning Unhealthy 7s kubelet Readiness probe failed: node id file does not exist - pod is not yet bootstrapped
pod is not yet bootstrapped
My rec.yaml file looks like this:
apiVersion: app.redislabs.com/v1alpha1
kind: RedisEnterpriseCluster
metadata:
  name: "redis-enterprise"
spec:
  # Add fields here
  nodes: 3
  uiServiceType: LoadBalancer
  redisEnterpriseNodeResources:
    limits:
      cpu: 250m
      memory: 500Mi
    requests:
      cpu: 250m
      memory: 500Mi
I have tried different CPU limits but am still facing the same error. Please let me know if I am doing something wrong here.
Hi,
What does kubectl get sc -o yaml return?
It returns this:
apiVersion: v1
items:
- allowVolumeExpansion: true
  apiVersion: storage.k8s.io/v1
  kind: StorageClass
  metadata:
    annotations:
      components.gke.io/component-name: pdcsi
      components.gke.io/component-version: 0.9.6
      components.gke.io/layer: addon
    creationTimestamp: "2021-06-22T09:36:20Z"
    labels:
      addonmanager.kubernetes.io/mode: EnsureExists
      k8s-app: gcp-compute-persistent-disk-csi-driver
    managedFields:
    - apiVersion: storage.k8s.io/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:allowVolumeExpansion: {}
        f:metadata:
          f:annotations:
            .: {}
            f:components.gke.io/component-name: {}
            f:components.gke.io/component-version: {}
            f:components.gke.io/layer: {}
          f:labels:
            .: {}
            f:addonmanager.kubernetes.io/mode: {}
            f:k8s-app: {}
        f:parameters:
          .: {}
          f:type: {}
        f:provisioner: {}
        f:reclaimPolicy: {}
        f:volumeBindingMode: {}
      manager: kubectl
      operation: Update
      time: "2021-06-22T09:36:20Z"
    name: premium-rwo
    resourceVersion: "307"
    selfLink: /apis/storage.k8s.io/v1/storageclasses/premium-rwo
    uid: a32b3136-b3ce-4014-a283-4aa2ce550375
  parameters:
    type: pd-ssd
  provisioner: pd.csi.storage.gke.io
  reclaimPolicy: Delete
  volumeBindingMode: WaitForFirstConsumer
- allowVolumeExpansion: true
  apiVersion: storage.k8s.io/v1
  kind: StorageClass
  metadata:
    annotations:
      storageclass.kubernetes.io/is-default-class: "true"
    creationTimestamp: "2021-06-22T09:36:20Z"
    labels:
      addonmanager.kubernetes.io/mode: EnsureExists
    managedFields:
    - apiVersion: storage.k8s.io/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:allowVolumeExpansion: {}
        f:metadata:
          f:annotations:
            .: {}
            f:storageclass.kubernetes.io/is-default-class: {}
          f:labels:
            .: {}
            f:addonmanager.kubernetes.io/mode: {}
        f:parameters:
          .: {}
          f:type: {}
        f:provisioner: {}
        f:reclaimPolicy: {}
        f:volumeBindingMode: {}
      manager: kubectl
      operation: Update
      time: "2021-06-22T09:36:20Z"
    name: standard
    resourceVersion: "314"
    selfLink: /apis/storage.k8s.io/v1/storageclasses/standard
    uid: 460ef44f-a9a1-4e9e-a0f6-890a5342ed97
  parameters:
    type: pd-standard
  provisioner: kubernetes.io/gce-pd
  reclaimPolicy: Delete
  volumeBindingMode: Immediate
- allowVolumeExpansion: true
  apiVersion: storage.k8s.io/v1
  kind: StorageClass
  metadata:
    annotations:
      components.gke.io/layer: addon
      storageclass.kubernetes.io/is-default-class: "false"
    creationTimestamp: "2021-06-22T09:36:20Z"
    labels:
      addonmanager.kubernetes.io/mode: EnsureExists
      k8s-app: gcp-compute-persistent-disk-csi-driver
    managedFields:
    - apiVersion: storage.k8s.io/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:allowVolumeExpansion: {}
        f:metadata:
          f:annotations:
            .: {}
            f:components.gke.io/layer: {}
            f:storageclass.kubernetes.io/is-default-class: {}
          f:labels:
            .: {}
            f:addonmanager.kubernetes.io/mode: {}
            f:k8s-app: {}
        f:parameters:
          .: {}
          f:type: {}
        f:provisioner: {}
        f:reclaimPolicy: {}
        f:volumeBindingMode: {}
      manager: kubectl
      operation: Update
      time: "2021-06-22T09:36:20Z"
    name: standard-rwo
    resourceVersion: "308"
    selfLink: /apis/storage.k8s.io/v1/storageclasses/standard-rwo
    uid: 2e15ed05-7b2e-4119-8c98-c75dd10c180c
  parameters:
    type: pd-balanced
  provisioner: pd.csi.storage.gke.io
  reclaimPolicy: Delete
  volumeBindingMode: WaitForFirstConsumer
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
Thank you. What is the output of the following two commands:
kubectl describe pvc
kubectl describe pv
Output of kubectl describe pvc:
Name: redis-enterprise-storage-rec-0
Namespace: default
StorageClass: standard
Status: Bound
Volume: pvc-c74db61a-5fea-4215-8074-501acae47c77
Labels: app=redis-enterprise
redis.io/cluster=rec
redis.io/role=node
Annotations: pv.kubernetes.io/bind-completed: yes
pv.kubernetes.io/bound-by-controller: yes
volume.beta.kubernetes.io/storage-provisioner: kubernetes.io/gce-pd
Finalizers: [kubernetes.io/pvc-protection]
Capacity: 20Gi
Access Modes: RWO
VolumeMode: Filesystem
Used By: <none>
Events: <none>
Name: redis-enterprise-storage-recl-0
Namespace: default
StorageClass: standard
Status: Bound
Volume: pvc-a9a71c6a-8d43-4094-86c1-b17465b8f359
Labels: app=redis-enterprise
redis.io/cluster=recl
redis.io/role=node
Annotations: pv.kubernetes.io/bind-completed: yes
pv.kubernetes.io/bound-by-controller: yes
volume.beta.kubernetes.io/storage-provisioner: kubernetes.io/gce-pd
Finalizers: [kubernetes.io/pvc-protection]
Capacity: 20Gi
Access Modes: RWO
VolumeMode: Filesystem
Used By: <none>
Events: <none>
Name: redis-enterprise-storage-redcl-0
Namespace: default
StorageClass: standard
Status: Bound
Volume: pvc-c652edbd-a5f0-449f-a202-00ba3a5e7b7e
Labels: app=redis-enterprise
redis.io/cluster=redcl
redis.io/role=node
Annotations: pv.kubernetes.io/bind-completed: yes
pv.kubernetes.io/bound-by-controller: yes
volume.beta.kubernetes.io/storage-provisioner: kubernetes.io/gce-pd
Finalizers: [kubernetes.io/pvc-protection]
Capacity: 3Gi
Access Modes: RWO
VolumeMode: Filesystem
Used By: <none>
Events: <none>
Name: redis-enterprise-storage-redis-enterprise-0
Namespace: default
StorageClass: standard
Status: Bound
Volume: pvc-fb06fa95-f378-4e49-a0f0-3a41e67404be
Labels: app=redis-enterprise
redis.io/cluster=redis-enterprise
redis.io/role=node
Annotations: pv.kubernetes.io/bind-completed: yes
pv.kubernetes.io/bound-by-controller: yes
volume.beta.kubernetes.io/storage-provisioner: kubernetes.io/gce-pd
Finalizers: [kubernetes.io/pvc-protection]
Capacity: 3Gi
Access Modes: RWO
VolumeMode: Filesystem
Used By: redis-enterprise-0
Events: <none>
Output of kubectl describe pv:
Name: pvc-a9a71c6a-8d43-4094-86c1-b17465b8f359
Labels: failure-domain.beta.kubernetes.io/region=us-east1
failure-domain.beta.kubernetes.io/zone=us-east1-b
Annotations: kubernetes.io/createdby: gce-pd-dynamic-provisioner
pv.kubernetes.io/bound-by-controller: yes
pv.kubernetes.io/provisioned-by: kubernetes.io/gce-pd
Finalizers: [kubernetes.io/pv-protection]
StorageClass: standard
Status: Bound
Claim: default/redis-enterprise-storage-recl-0
Reclaim Policy: Delete
Access Modes: RWO
VolumeMode: Filesystem
Capacity: 20Gi
Node Affinity:
Required Terms:
Term 0: failure-domain.beta.kubernetes.io/zone in [us-east1-b]
failure-domain.beta.kubernetes.io/region in [us-east1]
Message:
Source:
Type: GCEPersistentDisk (a Persistent Disk resource in Google Compute Engine)
PDName: gke-redis-cluster-c493-pvc-a9a71c6a-8d43-4094-86c1-b17465b8f359
FSType: ext4
Partition: 0
ReadOnly: false
Events: <none>
Name: pvc-c652edbd-a5f0-449f-a202-00ba3a5e7b7e
Labels: failure-domain.beta.kubernetes.io/region=us-east1
failure-domain.beta.kubernetes.io/zone=us-east1-b
Annotations: kubernetes.io/createdby: gce-pd-dynamic-provisioner
pv.kubernetes.io/bound-by-controller: yes
pv.kubernetes.io/provisioned-by: kubernetes.io/gce-pd
Finalizers: [kubernetes.io/pv-protection]
StorageClass: standard
Status: Bound
Claim: default/redis-enterprise-storage-redcl-0
Reclaim Policy: Delete
Access Modes: RWO
VolumeMode: Filesystem
Capacity: 3Gi
Node Affinity:
Required Terms:
Term 0: failure-domain.beta.kubernetes.io/zone in [us-east1-b]
failure-domain.beta.kubernetes.io/region in [us-east1]
Message:
Source:
Type: GCEPersistentDisk (a Persistent Disk resource in Google Compute Engine)
PDName: gke-redis-cluster-c493-pvc-c652edbd-a5f0-449f-a202-00ba3a5e7b7e
FSType: ext4
Partition: 0
ReadOnly: false
Events: <none>
Name: pvc-c74db61a-5fea-4215-8074-501acae47c77
Labels: failure-domain.beta.kubernetes.io/region=us-east1
failure-domain.beta.kubernetes.io/zone=us-east1-b
Annotations: kubernetes.io/createdby: gce-pd-dynamic-provisioner
pv.kubernetes.io/bound-by-controller: yes
pv.kubernetes.io/provisioned-by: kubernetes.io/gce-pd
Finalizers: [kubernetes.io/pv-protection]
StorageClass: standard
Status: Bound
Claim: default/redis-enterprise-storage-rec-0
Reclaim Policy: Delete
Access Modes: RWO
VolumeMode: Filesystem
Capacity: 20Gi
Node Affinity:
Required Terms:
Term 0: failure-domain.beta.kubernetes.io/zone in [us-east1-b]
failure-domain.beta.kubernetes.io/region in [us-east1]
Message:
Source:
Type: GCEPersistentDisk (a Persistent Disk resource in Google Compute Engine)
PDName: gke-redis-cluster-c493-pvc-c74db61a-5fea-4215-8074-501acae47c77
FSType: ext4
Partition: 0
ReadOnly: false
Events: <none>
Name: pvc-fb06fa95-f378-4e49-a0f0-3a41e67404be
Labels: failure-domain.beta.kubernetes.io/region=us-east1
failure-domain.beta.kubernetes.io/zone=us-east1-b
Annotations: kubernetes.io/createdby: gce-pd-dynamic-provisioner
pv.kubernetes.io/bound-by-controller: yes
pv.kubernetes.io/provisioned-by: kubernetes.io/gce-pd
Finalizers: [kubernetes.io/pv-protection]
StorageClass: standard
Status: Bound
Claim: default/redis-enterprise-storage-redis-enterprise-0
Reclaim Policy: Delete
Access Modes: RWO
VolumeMode: Filesystem
Capacity: 3Gi
Node Affinity:
Required Terms:
Term 0: failure-domain.beta.kubernetes.io/zone in [us-east1-b]
failure-domain.beta.kubernetes.io/region in [us-east1]
Message:
Source:
Type: GCEPersistentDisk (a Persistent Disk resource in Google Compute Engine)
PDName: gke-redis-cluster-c493-pvc-fb06fa95-f378-4e49-a0f0-3a41e67404be
FSType: ext4
Partition: 0
ReadOnly: false
Events: <none>
Hi,
I would suggest running the log_collector.py script from this project to generate a diagnostic package, then opening a Support Ticket with Redis Labs and uploading the package so that we can analyze it.
Laurent.
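For anyone landing on this issue: the log collector is typically run against the namespace where the operator and the REC are deployed, roughly as below. <your-namespace> is a placeholder, and the flags can vary between operator versions, so check python log_collector.py --help first.

python log_collector.py -n <your-namespace>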
Thanks for the help. I have opened a ticket with Redis Labs for the same.
Hello @pinkeshr, I have the same issue, did you solve it?
Hi Paul,
This is a very generic message that is always displayed when the pod (node) is not bootstrapped. There can be dozens of reasons for it, so, as above, I'd suggest running the log_collector.py script from this project to generate a diagnostic package, then opening a Support Ticket with Redis and uploading the package so that we can understand what is causing this on your cluster and help you with it.
Cheers,
Laurent.
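Before (or while) opening the ticket, the bootstrapper logs and the cluster status usually help narrow down why bootstrap is stuck. A rough sketch, using the container names from the pod spec shown later in this thread and <rec-pod-name> as a placeholder for the stuck pod:

# Why the bootstrapper could not create or join the cluster
kubectl logs <rec-pod-name> -c bootstrapper
# Events and probe failures on the pod itself
kubectl describe pod <rec-pod-name>
# Cluster status as seen from inside the Redis Enterprise node container
kubectl exec -it <rec-pod-name> -c redis-enterprise-node -- rladmin status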
ITNOA
I have the same issue
ssoroosh@master:~/ScalableProductionReadyServiceSample/Deployment/Harbor$ kubectl describe pods harbor-cluster-0
Name: harbor-cluster-0
Namespace: default
Priority: 0
Node: host2/172.21.73.126
Start Time: Thu, 03 Feb 2022 21:02:07 +0000
Labels: app=redis-enterprise
controller-revision-hash=harbor-cluster-7f55579578
redis.io/cluster=harbor-cluster
redis.io/role=node
statefulset.kubernetes.io/pod-name=harbor-cluster-0
Annotations: <none>
Status: Running
IP: 10.0.2.228
IPs:
IP: 10.0.2.228
Controlled By: StatefulSet/harbor-cluster
Containers:
redis-enterprise-node:
Container ID: docker://9e31eb53ebcebdd61123536e1c5ea6b54c73ac7ff8823bcfd7619a813ca54314
Image: redislabs/redis:6.2.8-64
Image ID: docker-pullable://redislabs/redis@sha256:9c1015546ee6b99a48d86bd8c762db457c69e3c16f2e950f468ca92629681103
Ports: 8001/TCP, 8443/TCP, 9443/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP
State: Running
Started: Thu, 03 Feb 2022 21:02:20 +0000
Ready: False
Restart Count: 0
Limits:
cpu: 1
memory: 1Gi
Requests:
cpu: 1
memory: 1Gi
Readiness: exec [bash -c /opt/redislabs/bin/python /opt/redislabs/mount/health_check.py] delay=0s timeout=30s period=10s #success=1 #failure=3
Environment:
CREDENTIAL_TYPE: kubernetes
Mounts:
/opt/redislabs/credentials from credentials (rw)
/opt/redislabs/mount from health-check-volume (ro)
/opt/redislabs/shared from shared-volume (ro)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-9s98m (ro)
bootstrapper:
Container ID: docker://d2ddfdb3ddd54d6bea8a38698c9cfb8eabbb14455b75aebd55f537950456f1bf
Image: redislabs/operator:6.2.8-15
Image ID: docker-pullable://redislabs/operator@sha256:0f144922ea1e2d4ea72affb36238258c9f21c39d6ba9ad73da79278dde1eed37
Port: 8787/TCP
Host Port: 0/TCP
Command:
/usr/local/bin/bootstrapper
State: Running
Started: Thu, 03 Feb 2022 21:25:32 +0000
Last State: Terminated
Reason: Error
Exit Code: 2
Started: Thu, 03 Feb 2022 21:19:49 +0000
Finished: Thu, 03 Feb 2022 21:25:23 +0000
Ready: True
Restart Count: 4
Limits:
cpu: 100m
memory: 128Mi
Requests:
cpu: 100m
memory: 128Mi
Liveness: http-get http://:8787/livez delay=300s timeout=15s period=15s #success=1 #failure=3
Environment:
NAMESPACE: default (v1:metadata.namespace)
POD_NAME: harbor-cluster-0 (v1:metadata.name)
REC_NAME: harbor-cluster
CREDENTIAL_TYPE: kubernetes
Mounts:
/etc/opt/redislabs/mount/bulletin-board from bulletin-board-volume (rw)
/opt/redislabs/credentials from credentials (rw)
/opt/redislabs/shared from shared-volume (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-9s98m (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
bulletin-board-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: harbor-cluster-bulletin-board
Optional: false
health-check-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: harbor-cluster-health-check
Optional: false
shared-volume:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
credentials:
Type: Secret (a volume populated by a Secret)
SecretName: harbor-cluster
Optional: false
kube-api-access-9s98m:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Guaranteed
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 27m default-scheduler Successfully assigned default/harbor-cluster-0 to host2
Normal Pulled 27m kubelet Container image "redislabs/redis:6.2.8-64" already present on machine
Normal Created 27m kubelet Created container redis-enterprise-node
Normal Started 27m kubelet Started container redis-enterprise-node
Normal Pulled 27m kubelet Container image "redislabs/operator:6.2.8-15" already present on machine
Normal Created 27m kubelet Created container bootstrapper
Normal Started 27m kubelet Started container bootstrapper
Warning Unhealthy 2m20s (x100 over 27m) kubelet Readiness probe failed: node id file does not exist - pod is not yet bootstrapped
pod is not yet bootstrapped
My cluster.yaml file looks like below:
apiVersion: "app.redislabs.com/v1"
kind: "RedisEnterpriseCluster"
metadata:
name: "harbor-cluster"
spec:
nodes: 3
persistentSpec:
enabled: false
storageClassName: "openebs-hostpath"
# https://kubernetes.io/docs/reference/kubernetes-api/common-definitions/quantity/
# volumeSize: 100M
redisEnterpriseNodeResources:
limits:
cpu: 1000m
memory: 1Gi
requests:
cpu: 1000m
memory: 1Gi
I ran kubectl get sc and see the results below:
ssoroosh@master:~/ScalableProductionReadyServiceSample/Deployment/Harbor$ kubectl get sc --all-namespaces
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
openebs-device openebs.io/local Delete WaitForFirstConsumer false 13d
openebs-hostpath openebs.io/local Delete WaitForFirstConsumer false 13d
What is my problem?
@laurentdroin I think my problem comes from the mounts, but I do not know how to resolve it.
Hi Soorosh,
I think the problem for you is the resources. 1 GB of memory is definitely not enough and the first pod will never be able to create the cluster. The absolute minimum amount of memory for a dev environment is 4 GB. I have been lucky with 3 GB.
Can you increase the memory to at least 3 GB and let me know if this helped?
Laurent.
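As a sketch of what that looks like in the REC spec (using 1 CPU and 4Gi of memory, the values the later kubectl describe output in this thread ends up with; the documented minimums for production are higher):

redisEnterpriseNodeResources:
  limits:
    cpu: 1000m
    memory: 4Gi
  requests:
    cpu: 1000m
    memory: 4Gi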
I added some memory to our cluster, then increased the memory in the environment, and my problem is resolved.
@laurentdroin thanks for helping
@laurentdroin Hi again,
After I increased the memory and resolved my previous problem, everything worked properly. Then I tried powering off all nodes; after some days I turned my system on again and saw the below:
kubectl get pods
NAME READY STATUS RESTARTS AGE
harbor-cluster-0 1/2 Running 26 (22s ago) 150m
harbor-cluster-services-rigger-6dcc59d7d8-p6hvn 1/1 Running 4 (137m ago) 24h
redis-enterprise-operator-7f8d8548c5-bj447 2/2 Running 26 (144m ago) 6d20h
As you can see, harbor-cluster-0 is not completely ready, and when I look at the details of this pod, I see the terrible message again:
ssoroosh@master:~$ kubectl describe pod harbor-cluster-0
Name: harbor-cluster-0
Namespace: default
Priority: 0
Node: host2/172.19.34.29
Start Time: Thu, 10 Feb 2022 17:54:18 +0000
Labels: app=redis-enterprise
controller-revision-hash=harbor-cluster-6f5bc897db
redis.io/cluster=harbor-cluster
redis.io/role=node
statefulset.kubernetes.io/pod-name=harbor-cluster-0
Annotations: <none>
Status: Running
IP: 10.0.2.236
IPs:
IP: 10.0.2.236
Controlled By: StatefulSet/harbor-cluster
Containers:
redis-enterprise-node:
Container ID: docker://5cd18ba3cce456f8af2a834c348a22a5dc7cd9cb2103898963529399464fda8f
Image: redislabs/redis:6.2.8-64
Image ID: docker-pullable://redislabs/redis@sha256:9c1015546ee6b99a48d86bd8c762db457c69e3c16f2e950f468ca92629681103
Ports: 8001/TCP, 8443/TCP, 9443/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP
State: Running
Started: Thu, 10 Feb 2022 18:06:46 +0000
Last State: Terminated
Reason: Error
Exit Code: 255
Started: Thu, 10 Feb 2022 17:54:25 +0000
Finished: Thu, 10 Feb 2022 18:00:03 +0000
Ready: False
Restart Count: 1
Limits:
cpu: 1
memory: 4Gi
Requests:
cpu: 1
memory: 4Gi
Readiness: exec [bash -c /opt/redislabs/bin/python /opt/redislabs/mount/health_check.py] delay=0s timeout=30s period=10s #success=1 #failure=3
Environment:
CREDENTIAL_TYPE: kubernetes
Mounts:
/opt/redislabs/credentials from credentials (rw)
/opt/redislabs/mount from health-check-volume (ro)
/opt/redislabs/shared from shared-volume (ro)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-twt5s (ro)
bootstrapper:
Container ID: docker://360f7afe3f8a04c2616ae3fa976a9e5f970d98f1f77c0eed2c838ae7d95acce3
Image: redislabs/operator:6.2.8-15
Image ID: docker-pullable://redislabs/operator@sha256:0f144922ea1e2d4ea72affb36238258c9f21c39d6ba9ad73da79278dde1eed37
Port: 8787/TCP
Host Port: 0/TCP
Command:
/usr/local/bin/bootstrapper
State: Running
Started: Thu, 10 Feb 2022 20:30:38 +0000
Last State: Terminated
Reason: Error
Exit Code: 2
Started: Thu, 10 Feb 2022 20:24:53 +0000
Finished: Thu, 10 Feb 2022 20:30:35 +0000
Ready: True
Restart Count: 26
Limits:
cpu: 100m
memory: 128Mi
Requests:
cpu: 100m
memory: 128Mi
Liveness: http-get http://:8787/livez delay=300s timeout=15s period=15s #success=1 #failure=3
Environment:
NAMESPACE: default (v1:metadata.namespace)
POD_NAME: harbor-cluster-0 (v1:metadata.name)
REC_NAME: harbor-cluster
CREDENTIAL_TYPE: kubernetes
Mounts:
/etc/opt/redislabs/mount/bulletin-board from bulletin-board-volume (rw)
/opt/redislabs/credentials from credentials (rw)
/opt/redislabs/shared from shared-volume (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-twt5s (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
bulletin-board-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: harbor-cluster-bulletin-board
Optional: false
health-check-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: harbor-cluster-health-check
Optional: false
shared-volume:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
credentials:
Type: Secret (a volume populated by a Secret)
SecretName: harbor-cluster
Optional: false
kube-api-access-twt5s:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Guaranteed
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Unhealthy 46s (x1047 over 145m) kubelet Readiness probe failed: node id file does not exist - pod is not yet bootstrapped
pod is not yet bootstrapped
and the bootstrapper log looks like below:
ssoroosh@master:~$ kubectl logs harbor-cluster-0 -c bootstrapper
time="2022-02-10T20:30:40Z" level=info msg="REC name: harbor-cluster"
time="2022-02-10T20:30:40Z" level=info msg="Cluster Name: harbor-cluster.default.svc.cluster.local"
time="2022-02-10T20:30:40Z" level=info msg="No rack ID specified"
time="2022-02-10T20:30:45Z" level=info msg="getting bootstrap information from Redis Enterprise API"
time="2022-02-10T20:30:45Z" level=info msg="Redis Enterprise API is accessible, and ready for bootstrap"
time="2022-02-10T20:30:45Z" level=info msg="All pods perform join_cluster"
As you can see, my pods have sufficient memory. How do I resolve my problem?
related to #214
Hi @soroshsabz,
Yes, as I explained in the other issue you opened (I didn't see that you had opened this new one), this is expected. After a Redis Enterprise cluster is created, quorum must be maintained at all times. We define quorum as a majority of nodes. In a 3-node cluster, you must always have 2 nodes up and ready at any given time. Your cluster will not survive having 2 or all 3 nodes down at the same time.
If your cluster has lost quorum and is therefore no longer working, you would need to recover it using this procedure: https://docs.redis.com/latest/kubernetes/re-clusters/cluster-recovery/
I hope this helps.
Laurent.
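For reference, the recovery procedure linked above essentially amounts to marking the REC for recovery and letting the operator rebuild the cluster. Roughly the following, although the exact steps and field may differ by operator version, so follow the linked documentation:

kubectl patch rec harbor-cluster --type merge --patch '{"spec":{"clusterRecovery":true}}'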
After removing the cluster, I was able to create a new and healthy cluster.
Thanks to @laurentdroin
Hello, I got the same problem. @laurentdroin I understand that we need to maintain the cluster at all times for this to work fine. But since we are using an Enterprise Operator (and we pay for it), can you make the magic happen and automatically resolve this problem in case of an emergency reboot of our K8s nodes? For example, I got into that state this morning with a 5-node Redis cluster just after a GKE 1.20 to 1.21 migration. This is not acceptable in production. A cluster should not break this easily.
If we are moving from standalone Redis, Redis/Sentinel, or even Google Memorystore to Redis Enterprise, it is to have more: high availability (but here... we got less) and better scalability (that part is OK).
Actually we cannot move forward using the product in this state. We are not asking for the moon... just something that works.
@cschockaert thank you for the feedback! Question - have you opened a support case for the issue? Engineers from Redis will be happy to get you unblocked. Thx
Hello, I'm in touch with the Redis team (@fcerbelle), not directly with support. Actually, we are using the preemptible VM feature of GKE (https://cloud.google.com/kubernetes-engine/docs/how-to/preemptible-vms): nodes can be killed at any time after 24 hours of life, but they cost us 10x less. I don't know if the quorum was lost because of the GKE upgrade or because of preemptible nodes. For now, the only solutions to mitigate this cluster failure would be to not use preemptible nodes and/or to add more nodes to the cluster so that quorum would be harder to lose.
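One possible mitigation for the preemptible case, sketched here under the assumption that the REC spec supports a nodeSelector and that a non-preemptible pool named stable-pool exists (both are illustrative, not taken from this thread), is to pin the Redis Enterprise pods to a stable node pool so cluster nodes are not reclaimed out from under the quorum:

spec:
  nodeSelector:
    cloud.google.com/gke-nodepool: stable-pool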