kubeblocks
[BUG] KB1.0 cluster vscale failed
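Describe the bug Vertically scaling a 3-replica etcd cluster with kbcli cluster vscale fails. The VerticalScaling OpsRequest stops at 1/3 with status Failed, the cluster status becomes Failed, and pod etcd-ptyfua-etcd-2 goes into CrashLoopBackOff with a StartError from the container runtime (error setting cgroup config).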
kbcli version
Kubernetes: v1.29.6-gke.1326000
KubeBlocks: 1.0.0-alpha.5
kbcli: 1.0.0-alpha.0
To Reproduce Steps to reproduce the behavior:
- create etcd cluster with the following cluster yaml
apiVersion: apps.kubeblocks.io/v1alpha1
kind: Cluster
metadata:
  name: etcd-ptyfua
  namespace: default
spec:
  terminationPolicy: WipeOut
  componentSpecs:
    - name: etcd
      componentDef: etcd
      replicas: 3
      resources:
        requests:
          cpu: 100m
          memory: 0.5Gi
        limits:
          cpu: 100m
          memory: 0.5Gi
      volumeClaimTemplates:
        - name: data
          spec:
            storageClassName:
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 1Gi
- cluster status
kbcli cluster list
NAME NAMESPACE CLUSTER-DEFINITION VERSION TERMINATION-POLICY STATUS CREATED-TIME
etcd-ptyfua default WipeOut Running Aug 30,2024 17:15 UTC+0800
- vscale cluster
kbcli cluster vscale etcd-ptyfua --auto-approve --force=true --components etcd --cpu 0.2 --memory 0.6
- see error
(base) kb@192 testinfra % k get pod
NAME READY STATUS RESTARTS AGE
etcd-ptyfua-etcd-0 2/2 Running 0 10m
etcd-ptyfua-etcd-1 2/2 Running 0 10m
etcd-ptyfua-etcd-2 1/2 CrashLoopBackOff 5 (2m41s ago) 6m4s
(base) kb@192 testinfra % k get cluster
NAME CLUSTER-DEFINITION VERSION TERMINATION-POLICY STATUS AGE
etcd-ptyfua WipeOut Failed 10m
(base) kb@192 testinfra % k get ops
NAME TYPE CLUSTER STATUS PROGRESS AGE
etcd-ptyfua-verticalscaling-47d2f VerticalScaling etcd-ptyfua Failed 1/3 6m46s
- describe pod
k describe pod etcd-ptyfua-etcd-2
Name: etcd-ptyfua-etcd-2
Namespace: default
Priority: 0
Service Account: kb-etcd-ptyfua
Node: gke-dhtest-gke-dhtest-gke-05a50c4d-dzqd/10.128.0.36
Start Time: Fri, 30 Aug 2024 17:19:46 +0800
Labels: app.kubernetes.io/component=etcd
app.kubernetes.io/instance=etcd-ptyfua
app.kubernetes.io/managed-by=kubeblocks
app.kubernetes.io/name=etcd
app.kubernetes.io/version=etcd
apps.kubeblocks.io/cluster-uid=1a63a11f-eb94-42f8-a192-e0710d3243ee
apps.kubeblocks.io/component-name=etcd
apps.kubeblocks.io/pod-name=etcd-ptyfua-etcd-2
componentdefinition.kubeblocks.io/name=etcd
controller-revision-hash=58bc8954c9
workloads.kubeblocks.io/instance=etcd-ptyfua-etcd
workloads.kubeblocks.io/managed-by=InstanceSet
Annotations: apps.kubeblocks.io/component-replicas: 3
Status: Running
IP: 10.0.6.99
IPs:
IP: 10.0.6.99
Controlled By: InstanceSet/etcd-ptyfua-etcd
Init Containers:
inject-shell:
Container ID: containerd://b15770ef403e72456d973b81e6800b045f6a12f8d8e31c7d3ad754b612a35ca8
Image: docker.io/busybox:1.35-musl
Image ID: docker.io/library/busybox@sha256:eaa51c8ca08bd769af7acc4e9748c01db3d0b8da22f35e55ce9199f980e8deda
Port: <none>
Host Port: <none>
Command:
bin/sh
-c
#!/bin/sh
# inject shell if needed
busyboxAction() {
# copy sh to /shell in order to adapt distroless entrypoint
cp /bin/sh /shell
}
distrolessAction() {
echo "etcd image build with distroless, injecting brinaries in order to run scripts"
cp /bin/* /shell
}
# versionCheck only check image type but not availability
checkVersionAndInject() {
local version=$1
echo "$version" | grep -Eq '^v[0-9]+\.[0-9]+\.[0-9]+$'
if [ $? -ne 0 ]; then
echo "Invalid version format, check vars ETCD_VERSION"
exit 1
fi
versionParse=$(echo "$version" | sed 's/^v//')
major=$(echo "$versionParse" | cut -d. -f1)
minor=$(echo "$versionParse" | cut -d. -f2)
patch=$(echo "$versionParse" | cut -d. -f3)
# <=3.3 || <= 3.4.22 || <=3.5.6 all use busybox https://github.com/etcd-io/etcd/tree/main/CHANGELOG
if [ $major -lt 3 ] || ([ $major -eq 3 ] && [ $minor -le 3 ]); then
busyboxAction
elif [ $major -eq 3 ] && [ $minor -eq 4 ] && [ $patch -le 22 ]; then
busyboxAction
elif [ $major -eq 3 ] && [ $minor -eq 5 ] && [ $patch -le 6 ]; then
busyboxAction
else
distrolessAction
fi
}
checkVersionAndInject $ETCD_VERSION
State: Terminated
Reason: Completed
Exit Code: 0
Started: Fri, 30 Aug 2024 17:19:50 +0800
Finished: Fri, 30 Aug 2024 17:19:51 +0800
Ready: True
Restart Count: 0
Limits:
cpu: 0
memory: 0
Requests:
cpu: 0
memory: 0
Environment Variables from:
etcd-ptyfua-etcd-env ConfigMap Optional: false
Environment:
KB_POD_NAME: etcd-ptyfua-etcd-2 (v1:metadata.name)
KB_POD_UID: (v1:metadata.uid)
KB_NAMESPACE: default (v1:metadata.namespace)
KB_SA_NAME: (v1:spec.serviceAccountName)
KB_NODENAME: (v1:spec.nodeName)
KB_HOST_IP: (v1:status.hostIP)
KB_POD_IP: (v1:status.podIP)
KB_POD_IPS: (v1:status.podIPs)
KB_HOSTIP: (v1:status.hostIP)
KB_PODIP: (v1:status.podIP)
KB_PODIPS: (v1:status.podIPs)
KB_POD_FQDN: $(KB_POD_NAME).etcd-ptyfua-etcd-headless.$(KB_NAMESPACE).svc
Mounts:
/shell from shell (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-bmht2 (ro)
init-kbagent:
Container ID: containerd://da6d171d4afedc7062f255be07a0ab989162e3cf7e40728fe456aee3c1bf1700
Image: docker.io/apecloud/kubeblocks-tools:1.0.0-alpha.5
Image ID: docker.io/apecloud/kubeblocks-tools@sha256:998b35a1fad892199d739d7d7bf52009089ef690897d60979b79e078ebacaecc
Port: <none>
Host Port: <none>
Command:
cp
-r
/bin/kbagent
/bin/curl
/kubeblocks/
State: Terminated
Reason: Completed
Exit Code: 0
Started: Fri, 30 Aug 2024 17:19:54 +0800
Finished: Fri, 30 Aug 2024 17:19:54 +0800
Ready: True
Restart Count: 0
Limits:
cpu: 0
memory: 0
Requests:
cpu: 0
memory: 0
Environment Variables from:
etcd-ptyfua-etcd-env ConfigMap Optional: false
Environment:
KB_POD_NAME: etcd-ptyfua-etcd-2 (v1:metadata.name)
KB_POD_UID: (v1:metadata.uid)
KB_NAMESPACE: default (v1:metadata.namespace)
KB_SA_NAME: (v1:spec.serviceAccountName)
KB_NODENAME: (v1:spec.nodeName)
KB_HOST_IP: (v1:status.hostIP)
KB_POD_IP: (v1:status.podIP)
KB_POD_IPS: (v1:status.podIPs)
KB_HOSTIP: (v1:status.hostIP)
KB_PODIP: (v1:status.podIP)
KB_PODIPS: (v1:status.podIPs)
KB_POD_FQDN: $(KB_POD_NAME).etcd-ptyfua-etcd-headless.$(KB_NAMESPACE).svc
Mounts:
/kubeblocks from kubeblocks (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-bmht2 (ro)
Containers:
etcd:
Container ID: containerd://79dbf77ab0970656dca6a39d3462dbc0550e20137e61971d278283c6c38f22cf
Image: docker.io/apecloud/etcd:v3.5.15
Image ID: docker.io/apecloud/etcd@sha256:0934690612905554eb61ddefb9faaaecb47c2f6931dbb453e694358092ee8990
Ports: 2379/TCP, 2380/TCP
Host Ports: 0/TCP, 0/TCP
Command:
/shell/sh
-c
export PATH=$PATH:/shell
# for convenient to use the same entrypoint
if [ ! -e /bin/sh ]; then
cp /shell/sh /bin
fi
/scripts/start.sh
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: StartError
Message: failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error setting cgroup config for procHooks process: openat2 /sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod053fc1c3_d474_460e_9f3c_88f7a763d2b9.slice/cri-containerd-79dbf77ab0970656dca6a39d3462dbc0550e20137e61971d278283c6c38f22cf.scope/cgroup.controllers: no such file or directory: unknown
Exit Code: 128
Started: Thu, 01 Jan 1970 08:00:00 +0800
Finished: Fri, 30 Aug 2024 17:21:38 +0800
Ready: False
Restart Count: 4
Limits:
cpu: 200m
memory: 600m
Requests:
cpu: 200m
memory: 600m
Environment Variables from:
etcd-ptyfua-etcd-env ConfigMap Optional: false
etcd-ptyfua-etcd-rsm-env ConfigMap Optional: false
Environment:
KB_POD_NAME: etcd-ptyfua-etcd-2 (v1:metadata.name)
KB_POD_UID: (v1:metadata.uid)
KB_NAMESPACE: default (v1:metadata.namespace)
KB_SA_NAME: (v1:spec.serviceAccountName)
KB_NODENAME: (v1:spec.nodeName)
KB_HOST_IP: (v1:status.hostIP)
KB_POD_IP: (v1:status.podIP)
KB_POD_IPS: (v1:status.podIPs)
KB_HOSTIP: (v1:status.hostIP)
KB_PODIP: (v1:status.podIP)
KB_PODIPS: (v1:status.podIPs)
KB_POD_FQDN: $(KB_POD_NAME).etcd-ptyfua-etcd-headless.$(KB_NAMESPACE).svc
Mounts:
/etc/etcd from config (rw)
/scripts from scripts (rw)
/shell from shell (rw)
/var/run/etcd from data (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-bmht2 (ro)
kbagent:
Container ID: containerd://d18767b34fde6bdee6fb0ef334ead00603c3aea8b78e39c2ced53e9c09d798ad
Image: docker.io/apecloud/etcd:v3.5.6
Image ID: docker.io/apecloud/etcd@sha256:28cb0630cb8536504f9bd547c3e63e608242c40dbffb1464c892d8d59fd3da44
Port: 3501/TCP
Host Port: 0/TCP
Command:
/kubeblocks/kbagent
Args:
--port
3501
State: Running
Started: Fri, 30 Aug 2024 17:19:56 +0800
Ready: True
Restart Count: 0
Limits:
cpu: 0
memory: 0
Requests:
cpu: 0
memory: 0
Startup: tcp-socket :3501 delay=0s timeout=1s period=10s #success=1 #failure=3
Environment Variables from:
etcd-ptyfua-etcd-env ConfigMap Optional: false
etcd-ptyfua-etcd-rsm-env ConfigMap Optional: false
Environment:
KB_POD_NAME: etcd-ptyfua-etcd-2 (v1:metadata.name)
KB_POD_UID: (v1:metadata.uid)
KB_NAMESPACE: default (v1:metadata.namespace)
KB_SA_NAME: (v1:spec.serviceAccountName)
KB_NODENAME: (v1:spec.nodeName)
KB_HOST_IP: (v1:status.hostIP)
KB_POD_IP: (v1:status.podIP)
KB_POD_IPS: (v1:status.podIPs)
KB_HOSTIP: (v1:status.hostIP)
KB_PODIP: (v1:status.podIP)
KB_PODIPS: (v1:status.podIPs)
KB_POD_FQDN: $(KB_POD_NAME).etcd-ptyfua-etcd-headless.$(KB_NAMESPACE).svc
CLUSTER_DOMAIN: .cluster.local
KB_AGENT_ACTION: [{"name":"switchover","exec":{"command":["/bin/sh","-c","set -ex\n #!/bin/sh\n \n # config file used to bootstrap the etcd cluster\n configFile=$TMP_CONFIG_PATH\n \n checkBackupFile() {\n local backupFile=$1\n output=$(etcdutl snapshot status ${backupFile})\n # check if the command was successful\n if [ $? -ne 0 ]; then\n echo \"ERROR: Failed to check the backup file with etcdutl\"\n exit 1\n fi\n # extract the total key from the output\n totalKey=$(echo $output | awk -F', ' '{print $3}')\n # check if total key is a number\n case $totalKey in\n *[!0-9]*)\n echo \"ERROR: snapshot totalKey is not a valid number.\"\n exit 1\n ;;\n esac\n \n # define a threshold to check if the total key count is too low\n # consider increasing this value when dealing with production-grade etcd cluster\n threshold=$BACKUP_KEY_THRESHOLD #[modifiable]\n if [ \"$totalKey\" -lt $threshold ]; then\n echo \"WARNING: snapshot totalKey is less than the threshold\"\n exit 1\n fi\n }\n \n getClientProtocol() {\n # check client tls if is enabled\n line=$(grep 'advertise-client-urls' ${configFile})\n if echo $line | grep -q 'https'; then\n echo \"https\"\n elif echo $line | grep -q 'http'; then\n echo \"http\"\n fi\n }\n \n getPeerProtocol() {\n # check peer tls if is enabled\n line=$(grep 'initial-advertise-peer-urls' ${configFile})\n if echo $line | grep -q 'https'; then\n echo \"https\"\n elif echo $line | grep -q 'http'; then\n echo \"http\"\n fi\n }\n \n execEtcdctl() {\n local endpoints=$1\n shift\n clientProtocol=$(getClientProtocol)\n tlsDir=$TLS_MOUNT_PATH\n # check if the clientProtocol is https and the tlsDir is not empty\n if [ $clientProtocol = \"https\" ] \u0026\u0026 [ -d \"$tlsDir\" ] \u0026\u0026 [ -s \"${tlsDir}/ca.crt\" ] \u0026\u0026 [ -s \"${tlsDir}/tls.crt\" ] \u0026\u0026 [ -s \"${tlsDir}/tls.key\" ]; then\n etcdctl --endpoints=${endpoints} --cacert=${tlsDir}/ca.crt --cert=${tlsDir}/tls.crt --key=${tlsDir}/tls.key \"$@\"\n elif [ $clientProtocol = \"http\" 
]; then\n etcdctl --endpoints=${endpoints} \"$@\"\n else\n echo \"ERROR: bad etcdctl args: clientProtocol:${clientProtocol}, endpoints:${endpoints}, tlsDir:${tlsDir}, please check!\"\n exit 1\n fi\n # check if the etcdctl command was successful\n if [ $? -ne 0 ]; then\n echo \"etcdctl command failed\"\n exit 1\n fi\n }\n \n # this function will be deprecated in the future\n execEtcdctlNoCheckTLS() {\n local endpoints=$1\n shift\n etcdctl --endpoints=${endpoints} \"$@\"\n # check if the etcdctl command was successful\n if [ $? -ne 0 ]; then\n echo \"etcdctl command failed\"\n exit 1\n fi\n }\n \n updateLeaderIfNeeded() {\n local retries=$1\n \n if [ $retries -le 0 ]; then\n echo \"Maximum number of retries reached, leader is not ready\"\n exit 1\n fi\n \n status=$(execEtcdctlNoCheckTLS ${leaderEndpoint} endpoint status)\n isLeader=$(echo $status | awk -F ', ' '{print $5}')\n if [ \"$isLeader\" = \"false\" ]; then\n echo \"leader out of status, try to redirect to new leader\"\n peerEndpoints=$(execEtcdctlNoCheckTLS \"$leaderEndpoint\" member list | awk -F', ' '{print $5}' | tr '\\n' ',' | sed 's#,$##')\n leaderEndpoint=$(execEtcdctlNoCheckTLS \"$peerEndpoints\" endpoint status | awk -F', ' '$5==\"true\" {print $1}')\n if [ $leaderEndpoint = \"\" ]; then\n echo \"leader is not ready, wait for 2s...\"\n sleep 2\n updateLeaderIfNeeded $(expr $retries - 1)\n fi\n fi\n }\n #!/bin/sh\n \n switchoverWithCandidate() {\n leaderEndpoint=${LEADER_POD_FQDN}:2379\n candidateEndpoint=${KB_SWITCHOVER_CANDIDATE_FQDN}:2379\n \n # see common.sh, this function may change leaderEndpoint\n updateLeaderIfNeeded 3\n \n if [ \"$leaderEndpoint\" = \"$candidateEndpoint\" ]; then\n echo \"leader is the same as candidate, no need to switch\"\n exit 0\n fi\n \n candidateID=$(execEtcdctlNoCheckTLS ${candidateEndpoint} endpoint status | awk -F', ' '{print $2}')\n execEtcdctlNoCheckTLS ${leaderEndpoint} move-leader $candidateID\n \n status=$(execEtcdctlNoCheckTLS ${candidateEndpoint} endpoint 
status)\n isLeader=$(echo ${status} | awk -F ', ' '{print $5}')\n \n if [ \"$isLeader\" = \"true\" ]; then\n echo \"switchover successfully\"\n else\n echo \"switchover failed, please check!\"\n exit 1\n fi\n }\n \n switchoverWithoutCandidate() {\n leaderEndpoint=${LEADER_POD_FQDN}:2379\n oldLeaderEndpoint=$leaderEndpoint\n \n # see common.sh, this function may change leaderEndpoint\n updateLeaderIfNeeded 3\n \n if [ \"$oldLeaderEndpoint\" != \"$leaderEndpoint\" ]; then\n echo \"leader already changed, no need to switch\"\n exit 0\n fi\n \n leaderID=$(execEtcdctlNoCheckTLS ${leaderEndpoint} endpoint status | awk -F', ' '{print $2}')\n peerIDs=$(execEtcdctlNoCheckTLS ${leaderEndpoint} member list | awk -F', ' '{print $1}')\n randomCandidateID=$(echo \"$peerIDs\" | grep -v \"$leaderID\" | awk 'NR==1')\n \n if [ -z \"$randomCandidateID\" ]; then\n echo \"no candidate found\"\n exit 1\n fi\n \n execEtcdctlNoCheckTLS $leaderEndpoint move-leader $randomCandidateID\n \n status=$(execEtcdctlNoCheckTLS $leaderEndpoint endpoint status)\n isLeader=$(echo $status | awk -F ', ' '{print $5}')\n \n if [ \"$isLeader\" = \"false\" ]; then\n echo \"switchover successfully\"\n else\n echo \"switchover failed, please check!\"\n exit 1\n fi\n }\n \n\nif [ -z \"$KB_SWITCHOVER_CANDIDATE_FQDN\" ]; then\n switchoverWithoutCandidate\nelse\n switchoverWithCandidate\nfi\n"]}},{"name":"memberJoin","exec":{"command":["/bin/sh","-c","#!/bin/sh\n\n# config file used to bootstrap the etcd cluster\nconfigFile=$TMP_CONFIG_PATH\n\ncheckBackupFile() {\n local backupFile=$1\n output=$(etcdutl snapshot status ${backupFile})\n # check if the command was successful\n if [ $? 
-ne 0 ]; then\n echo \"ERROR: Failed to check the backup file with etcdutl\"\n exit 1\n fi\n # extract the total key from the output\n totalKey=$(echo $output | awk -F', ' '{print $3}')\n # check if total key is a number\n case $totalKey in\n *[!0-9]*)\n echo \"ERROR: snapshot totalKey is not a valid number.\"\n exit 1\n ;;\n esac\n\n # define a threshold to check if the total key count is too low\n # consider increasing this value when dealing with production-grade etcd cluster\n threshold=$BACKUP_KEY_THRESHOLD #[modifiable]\n if [ \"$totalKey\" -lt $threshold ]; then\n echo \"WARNING: snapshot totalKey is less than the threshold\"\n exit 1\n fi\n}\n\ngetClientProtocol() {\n # check client tls if is enabled\n line=$(grep 'advertise-client-urls' ${configFile})\n if echo $line | grep -q 'https'; then\n echo \"https\"\n elif echo $line | grep -q 'http'; then\n echo \"http\"\n fi\n}\n\ngetPeerProtocol() {\n # check peer tls if is enabled\n line=$(grep 'initial-advertise-peer-urls' ${configFile})\n if echo $line | grep -q 'https'; then\n echo \"https\"\n elif echo $line | grep -q 'http'; then\n echo \"http\"\n fi\n}\n\nexecEtcdctl() {\n local endpoints=$1\n shift\n clientProtocol=$(getClientProtocol)\n tlsDir=$TLS_MOUNT_PATH\n # check if the clientProtocol is https and the tlsDir is not empty\n if [ $clientProtocol = \"https\" ] \u0026\u0026 [ -d \"$tlsDir\" ] \u0026\u0026 [ -s \"${tlsDir}/ca.crt\" ] \u0026\u0026 [ -s \"${tlsDir}/tls.crt\" ] \u0026\u0026 [ -s \"${tlsDir}/tls.key\" ]; then\n etcdctl --endpoints=${endpoints} --cacert=${tlsDir}/ca.crt --cert=${tlsDir}/tls.crt --key=${tlsDir}/tls.key \"$@\"\n elif [ $clientProtocol = \"http\" ]; then\n etcdctl --endpoints=${endpoints} \"$@\"\n else\n echo \"ERROR: bad etcdctl args: clientProtocol:${clientProtocol}, endpoints:${endpoints}, tlsDir:${tlsDir}, please check!\"\n exit 1\n fi\n # check if the etcdctl command was successful\n if [ $? 
-ne 0 ]; then\n echo \"etcdctl command failed\"\n exit 1\n fi\n}\n\n# this function will be deprecated in the future\nexecEtcdctlNoCheckTLS() {\n local endpoints=$1\n shift\n etcdctl --endpoints=${endpoints} \"$@\"\n # check if the etcdctl command was successful\n if [ $? -ne 0 ]; then\n echo \"etcdctl command failed\"\n exit 1\n fi\n}\n\nupdateLeaderIfNeeded() {\n local retries=$1\n\n if [ $retries -le 0 ]; then\n echo \"Maximum number of retries reached, leader is not ready\"\n exit 1\n fi\n\n status=$(execEtcdctlNoCheckTLS ${leaderEndpoint} endpoint status)\n isLeader=$(echo $status | awk -F ', ' '{print $5}')\n if [ \"$isLeader\" = \"false\" ]; then\n echo \"leader out of status, try to redirect to new leader\"\n peerEndpoints=$(execEtcdctlNoCheckTLS \"$leaderEndpoint\" member list | awk -F', ' '{print $5}' | tr '\\n' ',' | sed 's#,$##')\n leaderEndpoint=$(execEtcdctlNoCheckTLS \"$peerEndpoints\" endpoint status | awk -F', ' '$5==\"true\" {print $1}')\n if [ $leaderEndpoint = \"\" ]; then\n echo \"leader is not ready, wait for 2s...\"\n sleep 2\n updateLeaderIfNeeded $(expr $retries - 1)\n fi\n fi\n}\n#!/bin/sh\n\nset -exo pipefail\necho \"etcd member join...\"\n# TODO\n"]}},{"name":"memberLeave","exec":{"command":["/bin/sh","-c","#!/bin/sh\n\n# config file used to bootstrap the etcd cluster\nconfigFile=$TMP_CONFIG_PATH\n\ncheckBackupFile() {\n local backupFile=$1\n output=$(etcdutl snapshot status ${backupFile})\n # check if the command was successful\n if [ $? 
-ne 0 ]; then\n echo \"ERROR: Failed to check the backup file with etcdutl\"\n exit 1\n fi\n # extract the total key from the output\n totalKey=$(echo $output | awk -F', ' '{print $3}')\n # check if total key is a number\n case $totalKey in\n *[!0-9]*)\n echo \"ERROR: snapshot totalKey is not a valid number.\"\n exit 1\n ;;\n esac\n\n # define a threshold to check if the total key count is too low\n # consider increasing this value when dealing with production-grade etcd cluster\n threshold=$BACKUP_KEY_THRESHOLD #[modifiable]\n if [ \"$totalKey\" -lt $threshold ]; then\n echo \"WARNING: snapshot totalKey is less than the threshold\"\n exit 1\n fi\n}\n\ngetClientProtocol() {\n # check client tls if is enabled\n line=$(grep 'advertise-client-urls' ${configFile})\n if echo $line | grep -q 'https'; then\n echo \"https\"\n elif echo $line | grep -q 'http'; then\n echo \"http\"\n fi\n}\n\ngetPeerProtocol() {\n # check peer tls if is enabled\n line=$(grep 'initial-advertise-peer-urls' ${configFile})\n if echo $line | grep -q 'https'; then\n echo \"https\"\n elif echo $line | grep -q 'http'; then\n echo \"http\"\n fi\n}\n\nexecEtcdctl() {\n local endpoints=$1\n shift\n clientProtocol=$(getClientProtocol)\n tlsDir=$TLS_MOUNT_PATH\n # check if the clientProtocol is https and the tlsDir is not empty\n if [ $clientProtocol = \"https\" ] \u0026\u0026 [ -d \"$tlsDir\" ] \u0026\u0026 [ -s \"${tlsDir}/ca.crt\" ] \u0026\u0026 [ -s \"${tlsDir}/tls.crt\" ] \u0026\u0026 [ -s \"${tlsDir}/tls.key\" ]; then\n etcdctl --endpoints=${endpoints} --cacert=${tlsDir}/ca.crt --cert=${tlsDir}/tls.crt --key=${tlsDir}/tls.key \"$@\"\n elif [ $clientProtocol = \"http\" ]; then\n etcdctl --endpoints=${endpoints} \"$@\"\n else\n echo \"ERROR: bad etcdctl args: clientProtocol:${clientProtocol}, endpoints:${endpoints}, tlsDir:${tlsDir}, please check!\"\n exit 1\n fi\n # check if the etcdctl command was successful\n if [ $? 
-ne 0 ]; then\n echo \"etcdctl command failed\"\n exit 1\n fi\n}\n\n# this function will be deprecated in the future\nexecEtcdctlNoCheckTLS() {\n local endpoints=$1\n shift\n etcdctl --endpoints=${endpoints} \"$@\"\n # check if the etcdctl command was successful\n if [ $? -ne 0 ]; then\n echo \"etcdctl command failed\"\n exit 1\n fi\n}\n\nupdateLeaderIfNeeded() {\n local retries=$1\n\n if [ $retries -le 0 ]; then\n echo \"Maximum number of retries reached, leader is not ready\"\n exit 1\n fi\n\n status=$(execEtcdctlNoCheckTLS ${leaderEndpoint} endpoint status)\n isLeader=$(echo $status | awk -F ', ' '{print $5}')\n if [ \"$isLeader\" = \"false\" ]; then\n echo \"leader out of status, try to redirect to new leader\"\n peerEndpoints=$(execEtcdctlNoCheckTLS \"$leaderEndpoint\" member list | awk -F', ' '{print $5}' | tr '\\n' ',' | sed 's#,$##')\n leaderEndpoint=$(execEtcdctlNoCheckTLS \"$peerEndpoints\" endpoint status | awk -F', ' '$5==\"true\" {print $1}')\n if [ $leaderEndpoint = \"\" ]; then\n echo \"leader is not ready, wait for 2s...\"\n sleep 2\n updateLeaderIfNeeded $(expr $retries - 1)\n fi\n fi\n}\n#!/bin/sh\nset -ex\nendpoints=$(echo $KB_MEMBER_ADDRESSES | tr ',' '\\n')\nleaverEndpoint=$(echo \"$endpoints\" | grep $KB_LEAVE_MEMBER_POD_NAME)\n\nif [ $leaverEndpoint = \"\" ]; then\n echo \"ERROR: leave member pod name not found in member addresses\"\n exit 1\nfi\n\nETCDID=$(execEtcdctl $leaverEndpoint endpoint status | awk -F', ' '{print $2}')\nexecEtcdctl $KB_MEMBER_ADDRESSES member remove $ETCDID\n"]}},{"name":"roleProbe","exec":{"command":["/bin/sh","-c","#!/bin/sh\n\n# config file used to bootstrap the etcd cluster\nconfigFile=$TMP_CONFIG_PATH\n\ncheckBackupFile() {\n local backupFile=$1\n output=$(etcdutl snapshot status ${backupFile})\n # check if the command was successful\n if [ $? 
-ne 0 ]; then\n echo \"ERROR: Failed to check the backup file with etcdutl\"\n exit 1\n fi\n # extract the total key from the output\n totalKey=$(echo $output | awk -F', ' '{print $3}')\n # check if total key is a number\n case $totalKey in\n *[!0-9]*)\n echo \"ERROR: snapshot totalKey is not a valid number.\"\n exit 1\n ;;\n esac\n\n # define a threshold to check if the total key count is too low\n # consider increasing this value when dealing with production-grade etcd cluster\n threshold=$BACKUP_KEY_THRESHOLD #[modifiable]\n if [ \"$totalKey\" -lt $threshold ]; then\n echo \"WARNING: snapshot totalKey is less than the threshold\"\n exit 1\n fi\n}\n\ngetClientProtocol() {\n # check client tls if is enabled\n line=$(grep 'advertise-client-urls' ${configFile})\n if echo $line | grep -q 'https'; then\n echo \"https\"\n elif echo $line | grep -q 'http'; then\n echo \"http\"\n fi\n}\n\ngetPeerProtocol() {\n # check peer tls if is enabled\n line=$(grep 'initial-advertise-peer-urls' ${configFile})\n if echo $line | grep -q 'https'; then\n echo \"https\"\n elif echo $line | grep -q 'http'; then\n echo \"http\"\n fi\n}\n\nexecEtcdctl() {\n local endpoints=$1\n shift\n clientProtocol=$(getClientProtocol)\n tlsDir=$TLS_MOUNT_PATH\n # check if the clientProtocol is https and the tlsDir is not empty\n if [ $clientProtocol = \"https\" ] \u0026\u0026 [ -d \"$tlsDir\" ] \u0026\u0026 [ -s \"${tlsDir}/ca.crt\" ] \u0026\u0026 [ -s \"${tlsDir}/tls.crt\" ] \u0026\u0026 [ -s \"${tlsDir}/tls.key\" ]; then\n etcdctl --endpoints=${endpoints} --cacert=${tlsDir}/ca.crt --cert=${tlsDir}/tls.crt --key=${tlsDir}/tls.key \"$@\"\n elif [ $clientProtocol = \"http\" ]; then\n etcdctl --endpoints=${endpoints} \"$@\"\n else\n echo \"ERROR: bad etcdctl args: clientProtocol:${clientProtocol}, endpoints:${endpoints}, tlsDir:${tlsDir}, please check!\"\n exit 1\n fi\n # check if the etcdctl command was successful\n if [ $? 
-ne 0 ]; then\n echo \"etcdctl command failed\"\n exit 1\n fi\n}\n\n# this function will be deprecated in the future\nexecEtcdctlNoCheckTLS() {\n local endpoints=$1\n shift\n etcdctl --endpoints=${endpoints} \"$@\"\n # check if the etcdctl command was successful\n if [ $? -ne 0 ]; then\n echo \"etcdctl command failed\"\n exit 1\n fi\n}\n\nupdateLeaderIfNeeded() {\n local retries=$1\n\n if [ $retries -le 0 ]; then\n echo \"Maximum number of retries reached, leader is not ready\"\n exit 1\n fi\n\n status=$(execEtcdctlNoCheckTLS ${leaderEndpoint} endpoint status)\n isLeader=$(echo $status | awk -F ', ' '{print $5}')\n if [ \"$isLeader\" = \"false\" ]; then\n echo \"leader out of status, try to redirect to new leader\"\n peerEndpoints=$(execEtcdctlNoCheckTLS \"$leaderEndpoint\" member list | awk -F', ' '{print $5}' | tr '\\n' ',' | sed 's#,$##')\n leaderEndpoint=$(execEtcdctlNoCheckTLS \"$peerEndpoints\" endpoint status | awk -F', ' '$5==\"true\" {print $1}')\n if [ $leaderEndpoint = \"\" ]; then\n echo \"leader is not ready, wait for 2s...\"\n sleep 2\n updateLeaderIfNeeded $(expr $retries - 1)\n fi\n fi\n}\n#!/bin/sh\n\nstatus=$(execEtcdctl 127.0.0.1:2379 endpoint status --command-timeout=300ms --dial-timeout=100m)\nIsLeader=$(echo $status | awk -F ', ' '{print $5}')\nIsLearner=$(echo $status | awk -F ', ' '{print $6}')\n\nif [ \"true\" = \"$IsLeader\" ]; then\n echo -n \"leader\"\nelif [ \"true\" = \"$IsLearner\" ]; then\n echo -n \"learner\"\nelif [ \"false\" = \"$IsLeader\" ] \u0026\u0026 [ \"false\" = \"$IsLearner\" ]; then\n echo -n \"follower\"\nelse\n echo -n \"bad role, please check!\"\n exit 1\nfi\n"]}}]
KB_AGENT_PROBE: [{"action":"roleProbe"}]
Mounts:
/kubeblocks from kubeblocks (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-bmht2 (ro)
Conditions:
Type Status
PodReadyToStartContainers True
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
shell:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: etcd-ptyfua-etcd-etcd-configuration-tpl
Optional: false
scripts:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: etcd-ptyfua-etcd-etcd-scripts
Optional: false
data:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: data-etcd-ptyfua-etcd-2
ReadOnly: false
kubeblocks:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
kube-api-access-bmht2:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: kb-data=true:NoSchedule
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 2m27s default-scheduler Successfully assigned default/etcd-ptyfua-etcd-2 to gke-dhtest-gke-dhtest-gke-05a50c4d-dzqd
Normal Pulled 2m23s kubelet Container image "docker.io/busybox:1.35-musl" already present on machine
Normal Created 2m23s kubelet Created container inject-shell
Normal Started 2m23s kubelet Started container inject-shell
Normal Pulled 2m19s kubelet Container image "docker.io/apecloud/kubeblocks-tools:1.0.0-alpha.5" already present on machine
Normal Created 2m19s kubelet Created container init-kbagent
Normal Started 2m19s kubelet Started container init-kbagent
Normal Started 2m17s kubelet Started container kbagent
Normal Pulled 2m17s kubelet Container image "docker.io/apecloud/etcd:v3.5.6" already present on machine
Normal Created 2m17s kubelet Created container kbagent
Warning Failed 116s (x3 over 2m17s) kubelet Error: failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error setting cgroup config for procHooks process: openat2 /sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod053fc1c3_d474_460e_9f3c_88f7a763d2b9.slice/cri-containerd-etcd.scope/cgroup.controllers: no such file or directory: unknown
Warning BackOff 102s (x5 over 2m15s) kubelet Back-off restarting failed container etcd in pod etcd-ptyfua-etcd-2_default(053fc1c3-d474-460e-9f3c-88f7a763d2b9)
Normal Created 91s (x4 over 2m18s) kubelet Created container etcd
Normal Pulled 91s (x4 over 2m18s) kubelet Container image "docker.io/apecloud/etcd:v3.5.15" already present on machine
Normal roleProbe 77s kbagent {"probe":"roleProbe","code":-1,"message":"grep: /var/run/etcd/etcd.conf: No such file or directory\n/bin/sh: 59: [: =: unexpected operator\n/bin/sh: 61: [: =: unexpected operator\n: failed"}
Normal roleProbe 17s kbagent {"probe":"roleProbe","code":-1,"message":"grep: /var/run/etcd/etcd.conf: No such file or directory\n/bin/sh: 59: [: =: unexpected operator\n/bin/sh: 61: [: =: unexpected operator\n: failed"}
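Note on the roleProbe events above: the message "[: =: unexpected operator" at script lines 59 and 61 lines up with the unquoted tests `[ $clientProtocol = "https" ]` and `[ $clientProtocol = "http" ]` in execEtcdctl. Because grep cannot read /var/run/etcd/etcd.conf, getClientProtocol prints nothing, the variable is empty, and the test collapses to `[ = "https" ]`. This looks like a side effect of the etcd container not starting rather than the cause of the failure. A minimal sketch of the difference, assuming a POSIX sh; is_protocol is a hypothetical helper and not part of the etcd scripts:

```shell
#!/bin/sh
# Hypothetical helper (not part of the etcd addon scripts):
# compare a possibly empty value against an expected protocol name.
is_protocol() {
  # Quoting "$1" keeps the test well-formed even when the value is empty.
  [ "$1" = "$2" ]
}

# What getClientProtocol yields when grep finds nothing.
clientProtocol=""

# Unquoted: expands to `[ = "https" ]`, which the shell reports as an error.
unquoted_status=0
[ $clientProtocol = "https" ] 2>/dev/null || unquoted_status=$?

# Quoted: a valid comparison that is simply false.
quoted_status=0
is_protocol "$clientProtocol" "https" || quoted_status=$?

echo "unquoted=$unquoted_status quoted=$quoted_status"
```

With quoting, an empty protocol falls through to the script's existing "bad etcdctl args" branch instead of producing a shell syntax error.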
Expected behavior The vertical scaling operation completes and all three etcd pods are Running with the new resources.
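One detail in the describe output that may be relevant: after running vscale with --cpu 0.2 --memory 0.6, the etcd container shows memory limits and requests of 600m. In Kubernetes quantity notation the lowercase m suffix means milli, so 600m of memory is 0.6 bytes, not 600 MiB. Whether this is what triggers the cgroup StartError is not confirmed in this report. The sketch below only illustrates the unit arithmetic; quantity_to_bytes is a hypothetical helper that handles a small subset of suffixes (none, m, Ki, Mi, Gi) and is not a Kubernetes or kbcli API:

```shell
#!/bin/sh
# Hypothetical helper: convert a small subset of Kubernetes quantity
# notation (plain number, m, Ki, Mi, Gi) to a number of bytes.
quantity_to_bytes() {
  echo "$1" | awk '
    /^[0-9.]+m$/  { sub(/m$/, "");  printf "%.12g\n", $0 / 1000;               next }
    /^[0-9.]+Ki$/ { sub(/Ki$/, ""); printf "%.12g\n", $0 * 1024;               next }
    /^[0-9.]+Mi$/ { sub(/Mi$/, ""); printf "%.12g\n", $0 * 1024 * 1024;        next }
    /^[0-9.]+Gi$/ { sub(/Gi$/, ""); printf "%.12g\n", $0 * 1024 * 1024 * 1024; next }
    /^[0-9.]+$/   { printf "%.12g\n", $0;                                      next }
    { print "unsupported quantity: " $0 > "/dev/stderr"; exit 1 }
  '
}

# Value shown in the pod after the vscale, the original value from the
# cluster yaml, and a value with an explicit binary unit for comparison.
for q in 600m 0.5Gi 600Mi; do
  echo "$q = $(quantity_to_bytes "$q") bytes"
done
```

If 600 MiB was the intent, passing an explicit unit such as 600Mi would remove the ambiguity, assuming kbcli forwards the value unchanged.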