[BUG] mogdb switchover failed

Open ahjing99 opened this issue 1 year ago • 2 comments

➜ ~ kbcli version
Kubernetes: v1.28.7-gke.1026000
KubeBlocks: 0.9.0-beta.15
kbcli: 0.9.0-beta.4

# Add Helm repo 
helm repo add kubeblocks-addons https://apecloud.github.io/helm-charts
# If GitHub is not accessible or is very slow for you, use the following repo instead
helm repo add kubeblocks-addons https://jihulab.com/api/v4/projects/150246/packages/helm/stable
# Update helm repo
helm repo update
# Update mogdb to enable hostnetwork
helm upgrade -i kb-addon-mogdb kubeblocks-addons/mogdb  -n kb-system --version 0.9.0

1. Create the cluster: kubectl apply -f cluster.yaml
apiVersion: apps.kubeblocks.io/v1alpha1
kind: Cluster
metadata:
  name: mogdb-cluster
  namespace: default
spec:
  clusterDefinitionRef: mogdb
  clusterVersionRef: mogdb-5.0.5
  terminationPolicy: Delete
  componentSpecs:
  - name: mogdb
    componentDefRef: mogdb
    enabledLogs:
    - running
    serviceAccountName: kb-mogdb-cluster
    replicas: 2
    resources:
      limits:
        cpu: '0.5'
        memory: 0.5Gi
      requests:
        cpu: '0.5'
        memory: 0.5Gi
    volumeClaimTemplates:
    - name: data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 20Gi
2. Switchover
Create the Role and RoleBinding:

---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: mogdb-cluster-switchover-role
  labels:
    app.kubernetes.io/instance: mogdb-cluster
rules:
  - apiGroups: [""]
    resources: ["pods/exec"]
    verbs: ["create"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: mogdb-cluster-switchover
  labels:
    app.kubernetes.io/instance: mogdb-cluster
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: mogdb-cluster-switchover-role
subjects:
  - kind: ServiceAccount
    name: kb-mogdb-cluster
    namespace: default
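
The switchover job runs `kubectl exec` under the `kb-mogdb-cluster` ServiceAccount, so it can help to confirm that the binding above actually grants `pods/exec` before triggering the operation. The following is a minimal sketch, assuming the caller is allowed to impersonate ServiceAccounts; `can_exec_as_sa` is a hypothetical helper name, not part of KubeBlocks or kbcli.

```shell
#!/bin/sh
# Hypothetical helper (not part of KubeBlocks): asks the API server whether a
# ServiceAccount may create the pods/exec subresource in a namespace.
# Usage: can_exec_as_sa <namespace> <serviceaccount>
can_exec_as_sa() {
  ns="$1"
  sa="$2"
  # `kubectl auth can-i` prints "yes" or "no"; --as impersonates the account.
  # The subresource is passed with --subresource, because "pods/exec" would be
  # parsed as TYPE/NAME (a pod named "exec").
  answer=$(kubectl auth can-i create pods --subresource=exec \
    --namespace "$ns" \
    --as "system:serviceaccount:${ns}:${sa}" 2>/dev/null) || answer="no"
  [ "$answer" = "yes" ]
}
```

For the manifests in this report, the call would be `can_exec_as_sa default kb-mogdb-cluster && echo "exec allowed"`.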

➜  ~ kbcli cluster describe mogdb-cluster
Name: mogdb-cluster	 Created Time: Apr 26,2024 11:52 UTC+0800
NAMESPACE   CLUSTER-DEFINITION   VERSION       STATUS    TERMINATION-POLICY
default     mogdb                mogdb-5.0.5   Running   Delete

Endpoints:
COMPONENT   MODE        INTERNAL                                              EXTERNAL
mogdb       ReadWrite   mogdb-cluster-mogdb.default.svc.cluster.local:26000   <none>

Topology:
COMPONENT   INSTANCE                ROLE        STATUS    AZ              NODE                                                  CREATED-TIME
mogdb       mogdb-cluster-mogdb-0   primary     Running   us-central1-c   gke-yjtest-default-pool-e77a0986-5w42/10.128.15.226   Apr 26,2024 13:15 UTC+0800
mogdb       mogdb-cluster-mogdb-1   secondary   Running   us-central1-c   gke-yjtest-default-pool-e77a0986-3xfx/10.128.0.52     Apr 26,2024 13:15 UTC+0800

Resources Allocation:
COMPONENT   DEDICATED   CPU(REQUEST/LIMIT)   MEMORY(REQUEST/LIMIT)   STORAGE-SIZE   STORAGE-CLASS
mogdb       false       1 / 1                1Gi / 1Gi               data:30Gi      kb-default-sc

Images:
COMPONENT   TYPE    IMAGE
mogdb       mogdb   swr.cn-north-4.myhuaweicloud.com/mogdb/mogdb:5.0.5

Data Protection:
BACKUP-REPO   AUTO-BACKUP   BACKUP-SCHEDULE   BACKUP-METHOD   BACKUP-RETENTION

Show cluster events: kbcli cluster list-events -n default mogdb-cluster


➜  ~ kbcli cluster custom-ops mogdb-switchover --cluster mogdb-cluster  --component mogdb --auto-approve --candidate mogdb-cluster-mogdb-1
args: [mogdb-switchover --cluster mogdb-cluster --component mogdb --auto-approve --candidate mogdb-cluster-mogdb-1]
OpsRequest mogdb-cluster-custom-lxkqv created successfully, you can view the progress:
	kbcli cluster describe-ops mogdb-cluster-custom-lxkqv -n default

➜  ~ kbcli cluster describe-ops mogdb-cluster-custom-lxkqv -n default
Spec:
  Name: mogdb-cluster-custom-lxkqv	NameSpace: default	Cluster: mogdb-cluster	Type: Custom

Command: <none>

Status:
  Start Time:         Apr 26,2024 14:11 UTC+0800
  Completion Time:    Apr 26,2024 14:14 UTC+0800
  Duration:           2m9s
  Status:             Failed
  Progress:           1/1
                      OBJECT-KEY   STATUS   DURATION   MESSAGE
                                   Failed   2m7s       the action "switchover" of the component "mogdb" is Failed

Conditions:
LAST-TRANSITION-TIME         TYPE                 REASON                     STATUS   MESSAGE
Apr 26,2024 14:11 UTC+0800   WaitForProgressing   WaitForProgressing         True     wait for the controller to process the OpsRequest: mogdb-cluster-custom-lxkqv in Cluster: mogdb-cluster
Apr 26,2024 14:11 UTC+0800   Validated            ValidateOpsRequestPassed   True     OpsRequest: mogdb-cluster-custom-lxkqv is validated
Apr 26,2024 14:11 UTC+0800   CustomOperation      MogdbSwitchoverStarting    True     Start to handle MogdbSwitchover on the Cluster: mogdb-cluster
Apr 26,2024 14:14 UTC+0800   Failed               OpsRequestFailed           False    Failed to process OpsRequest: mogdb-cluster-custom-lxkqv in cluster: mogdb-cluster, more detailed informations in status.components

Warning Events:
TIME                         TYPE      REASON             OBJECT                                  MESSAGE
Apr 26,2024 14:13 UTC+0800   Warning   Failed             OpsRequest/mogdb-cluster-custom-lxkqv   the action "switchover" of the component "mogdb" is Failed
Apr 26,2024 14:14 UTC+0800   Warning   OpsRequestFailed   OpsRequest/mogdb-cluster-custom-lxkqv   Failed to process OpsRequest: mogdb-cluster-custom-lxkqv in cluster: mogdb-cluster, more detailed informations in status.components
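
The failure message points to `status.components` for details. A minimal sketch for reading that field directly is shown here; it assumes the OpsRequest custom resource can be addressed as `opsrequest` with kubectl, and `ops_component_status` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: prints the per-component status of an OpsRequest.
# Usage: ops_component_status <opsrequest-name> <namespace>
ops_component_status() {
  # jsonpath selects only the status.components field named in the error message.
  kubectl get opsrequest "$1" --namespace "$2" \
    -o jsonpath='{.status.components}'
}
```

For this report the call would be `ops_component_status mogdb-cluster-custom-lxkqv default`.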

➜  ~ k logs 3ae6b12c-mogdb-cluster-cust-mogdb-switchover-0-qhsfv
Defaulted container "switchover" out of: switchover, ops-utils (init)
INFO: doing switchover..
INFO: candidate: mogdb-cluster-mogdb-1
+ echo 'INFO: doing switchover..'
+ echo 'INFO: candidate: mogdb-cluster-mogdb-1'
+ kubectl exec -it mogdb-cluster-mogdb-1 -c mogdb -- gosu omm gs_ctl switchover
Unable to use a TTY - input is not a terminal or the right kind of file
[2024-04-26 06:11:53.862][22476][][gs_ctl]: gs_ctl switchover ,datadir is /var/lib/mogdb/data
[2024-04-26 06:11:53.862][22476][][gs_ctl]: switchover term (1)
[2024-04-26 06:11:53.872][22476][][gs_ctl]: waiting for server to switchover...............................................................
[2024-04-26 06:12:54.498][22476][][gs_ctl]:
 switchover timeout after 60 seconds. please manually check the cluster status.
INFO: start to check if switchover successfully, timeout is 60s
+ echo 'INFO: start to check if switchover successfully, timeout is 60s'
+ date '+%s'
+ executedUnix=1714111974
+ true
+ sleep 5
+ '[' '!' -z mogdb-cluster-mogdb-1 ]
+ kubectl get pod mogdb-cluster-mogdb-1 -ojson
+ jq -r '.metadata.labels["kubeblocks.io/role"]'
+ role=secondary
+ '[' secondary '==' Primary ]
+ '[' secondary '==' primary ]
+ '[' secondary '==' leader ]
+ '[' secondary '==' master ]
+ date '+%s'
+ currentUnix=1714111979
+ diff_time=5
+ '[' 5 -ge 60 ]
+ true
+ sleep 5
+ '[' '!' -z mogdb-cluster-mogdb-1 ]
+ kubectl get pod mogdb-cluster-mogdb-1 -ojson
+ jq -r '.metadata.labels["kubeblocks.io/role"]'
+ role=secondary
+ '[' secondary '==' Primary ]
+ '[' secondary '==' primary ]
+ '[' secondary '==' leader ]
+ '[' secondary '==' master ]
+ date '+%s'
+ currentUnix=1714111984
+ diff_time=10
+ '[' 10 -ge 60 ]
+ true
+ sleep 5
+ '[' '!' -z mogdb-cluster-mogdb-1 ]
+ kubectl get pod mogdb-cluster-mogdb-1 -ojson
+ jq -r '.metadata.labels["kubeblocks.io/role"]'
+ role=secondary
+ '[' secondary '==' Primary ]
+ '[' secondary '==' primary ]
+ '[' secondary '==' leader ]
+ '[' secondary '==' master ]
+ date '+%s'
+ currentUnix=1714111989
+ diff_time=15
+ '[' 15 -ge 60 ]
+ true
+ sleep 5
+ '[' '!' -z mogdb-cluster-mogdb-1 ]
+ kubectl get pod mogdb-cluster-mogdb-1 -ojson
+ jq -r '.metadata.labels["kubeblocks.io/role"]'
+ role=secondary
+ '[' secondary '==' Primary ]
+ '[' secondary '==' primary ]
+ '[' secondary '==' leader ]
+ '[' secondary '==' master ]
+ date '+%s'
+ currentUnix=1714111995
+ diff_time=21
+ '[' 21 -ge 60 ]
+ true
+ sleep 5
+ '[' '!' -z mogdb-cluster-mogdb-1 ]
+ kubectl get pod mogdb-cluster-mogdb-1 -ojson
+ jq -r '.metadata.labels["kubeblocks.io/role"]'
+ role=secondary
+ '[' secondary '==' Primary ]
+ '[' secondary '==' primary ]
+ '[' secondary '==' leader ]
+ '[' secondary '==' master ]
+ date '+%s'
+ currentUnix=1714112000
+ diff_time=26
+ '[' 26 -ge 60 ]
+ true
+ sleep 5
+ '[' '!' -z mogdb-cluster-mogdb-1 ]
+ kubectl get pod mogdb-cluster-mogdb-1 -ojson
+ jq -r '.metadata.labels["kubeblocks.io/role"]'
+ role=secondary
+ '[' secondary '==' Primary ]
+ '[' secondary '==' primary ]
+ '[' secondary '==' leader ]
+ '[' secondary '==' master ]
+ date '+%s'
+ currentUnix=1714112005
+ diff_time=31
+ '[' 31 -ge 60 ]
+ true
+ sleep 5
+ '[' '!' -z mogdb-cluster-mogdb-1 ]
+ kubectl get pod mogdb-cluster-mogdb-1 -ojson
+ jq -r '.metadata.labels["kubeblocks.io/role"]'
+ role=secondary
+ '[' secondary '==' Primary ]
+ '[' secondary '==' primary ]
+ '[' secondary '==' leader ]
+ '[' secondary '==' master ]
+ date '+%s'
+ currentUnix=1714112010
+ diff_time=36
+ '[' 36 -ge 60 ]
+ true
+ sleep 5
+ '[' '!' -z mogdb-cluster-mogdb-1 ]
+ kubectl get pod mogdb-cluster-mogdb-1 -ojson
+ jq -r '.metadata.labels["kubeblocks.io/role"]'
+ role=secondary
+ '[' secondary '==' Primary ]
+ '[' secondary '==' primary ]
+ '[' secondary '==' leader ]
+ '[' secondary '==' master ]
+ date '+%s'
+ currentUnix=1714112015
+ diff_time=41
+ '[' 41 -ge 60 ]
+ true
+ sleep 5
+ '[' '!' -z mogdb-cluster-mogdb-1 ]
+ kubectl get pod mogdb-cluster-mogdb-1 -ojson
+ jq -r '.metadata.labels["kubeblocks.io/role"]'
+ role=secondary
+ '[' secondary '==' Primary ]
+ '[' secondary '==' primary ]
+ '[' secondary '==' leader ]
+ '[' secondary '==' master ]
+ date '+%s'
+ currentUnix=1714112020
+ diff_time=46
+ '[' 46 -ge 60 ]
+ true
+ sleep 5
+ '[' '!' -z mogdb-cluster-mogdb-1 ]
+ kubectl get pod mogdb-cluster-mogdb-1 -ojson
+ jq -r '.metadata.labels["kubeblocks.io/role"]'
+ role=secondary
+ '[' secondary '==' Primary ]
+ '[' secondary '==' primary ]
+ '[' secondary '==' leader ]
+ '[' secondary '==' master ]
+ date '+%s'
+ currentUnix=1714112025
+ diff_time=51
+ '[' 51 -ge 60 ]
+ true
+ sleep 5
+ '[' '!' -z mogdb-cluster-mogdb-1 ]
+ kubectl get pod mogdb-cluster-mogdb-1 -ojson
+ jq -r '.metadata.labels["kubeblocks.io/role"]'
+ role=secondary
+ '[' secondary '==' Primary ]
+ '[' secondary '==' primary ]
+ '[' secondary '==' leader ]
+ '[' secondary '==' master ]
+ date '+%s'
+ currentUnix=1714112030
+ diff_time=56
+ '[' 56 -ge 60 ]
+ true
+ sleep 5
+ '[' '!' -z mogdb-cluster-mogdb-1 ]
+ kubectl get pod mogdb-cluster-mogdb-1 -ojson
+ jq -r '.metadata.labels["kubeblocks.io/role"]'
+ role=secondary
+ '[' secondary '==' Primary ]
+ '[' secondary '==' primary ]
+ '[' secondary '==' leader ]
+ '[' secondary '==' master ]
+ date '+%s'
+ currentUnix=1714112035
+ diff_time=61
+ '[' 61 -ge 60 ]
+ echo 'ERROR: switchover failed.'
+ exit 1
ERROR: switchover failed.
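
Reading the trace above: `gs_ctl switchover` itself gave up at 06:12:54 after 60 seconds, and `executedUnix=1714111974` corresponds to 06:12:54 UTC, so the role check only began once `gs_ctl` had already timed out. It then polled the `kubeblocks.io/role` label every 5 seconds and exited when the elapsed time reached 61 seconds with the label still `secondary`. The loop can be sketched roughly as follows. This is a reconstruction from the trace, not the addon's actual script; `get_role` and `wait_for_primary` are hypothetical names, and the timeout and interval are parameters only so the loop can be exercised without a cluster.

```shell
#!/bin/sh
# Sketch reconstructed from the xtrace output; not the addon's real script.
# get_role is a hypothetical hook that reads the pod's role label.
get_role() {
  kubectl get pod "$1" -o json | jq -r '.metadata.labels["kubeblocks.io/role"]'
}

# Usage: wait_for_primary <pod> [timeout_seconds] [interval_seconds]
# Returns 0 once the role label is a primary-like value, 1 on timeout.
wait_for_primary() {
  pod="$1"
  timeout="${2:-60}"
  interval="${3:-5}"
  started=$(date '+%s')
  while true; do
    sleep "$interval"
    role=$(get_role "$pod")
    # Same four values the trace compares against.
    case "$role" in
      Primary|primary|leader|master) return 0 ;;
    esac
    now=$(date '+%s')
    if [ $((now - started)) -ge "$timeout" ]; then
      echo 'ERROR: switchover failed.'
      return 1
    fi
  done
}
```

With the defaults, `wait_for_primary mogdb-cluster-mogdb-1` mirrors the 60-second window seen in the log. The check depends only on the pod label, so it fails whenever the label is not updated in time, regardless of what the database itself reports.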

➜ ~ kbcli report cluster --with-logs --all-containers mogdb-cluster
reporting cluster information to report-cluster-mogdb-cluster-2024-04-26-14-15-47.zip
processing manifests OK
processing events OK
process pod logs

➜ ~ kbcli report kubeblocks --with-logs --all-containers --output yaml
reporting KubeBlocks information to report-kubeblocks-2024-04-26-14-16-17.zip
processing manifests OK
processing events OK
process pod logs OK

report-kubeblocks-2024-04-26-14-16-17.zip
report-cluster-mogdb-cluster-2024-04-26-14-15-47.zip

ahjing99, Apr 26 '24 06:04

Duplicate of https://github.com/apecloud/kubeblocks-addons/issues/394

JashBook, Apr 26 '24 06:04

This issue has been marked as stale because it has been open for 30 days with no activity.

github-actions[bot], May 27 '24 00:05