[BUG] backup restore cluster delete hang

Open JashBook opened this issue 1 month ago • 3 comments

Describe the bug

Deleting a cluster that was restored from a backup hangs: the Cluster stays in the Deleting phase and its InstanceSet is never removed.

kbcli version
Kubernetes: v1.30.4-vke.10
KubeBlocks: 1.1.0-alpha.3
kbcli: 1.0.1

wait for the workloads to be deleted: map[{workloads.kubeblocks.io/v1, Kind=InstanceSet default/mysql-bk-mysql}:0xc003928708]

To Reproduce

Steps to reproduce the behavior:

  1. create cluster
apiVersion: apps.kubeblocks.io/v1
kind: Cluster
metadata:
  name: mysql-ccxdow
  namespace: default
spec:
  clusterDef: mysql
  topology: semisync
  terminationPolicy: WipeOut
  componentSpecs:
    - name: mysql
      serviceVersion: 8.0.30
      
      disableExporter: true
      replicas: 2
      resources:
        limits:
          cpu: 100m
          memory: 0.5Gi
        requests:
          cpu: 100m
          memory: 0.5Gi
      volumeClaimTemplates:
        - name: data
          spec:
            storageClassName: 
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 20Gi
kubectl get cluster mysql-ccxdow -w
NAME           CLUSTER-DEFINITION   TERMINATION-POLICY   STATUS     AGE
mysql-ccxdow   mysql                WipeOut              Running    4m20s
  2. backup
kbcli cluster backup mysql-ccxdow  --method xtrabackup 
Backup backup-default-mysql-ccxdow-20251104152833 created successfully, you can view the progress:
	kbcli cluster list-backups --names=backup-default-mysql-ccxdow-20251104152833 -n default
  3. restore
kbcli cluster restore mysql-bk --backup backup-default-mysql-ccxdow-20251104152833
Cluster mysql-bk created

kubectl get cluster mysql-bk
NAME       CLUSTER-DEFINITION   TERMINATION-POLICY   STATUS    AGE
mysql-bk   mysql                WipeOut              Running   4m55s
  4. delete the restored cluster
kbcli cluster delete mysql-bk --auto-approve 
Cluster mysql-bk deleted
  5. see error
kubectl get cluster mysql-bk 
NAME       CLUSTER-DEFINITION   TERMINATION-POLICY   STATUS     AGE
mysql-bk   mysql                WipeOut              Deleting   10m
➜  ~ 
➜  ~ kubectl get pod -l app.kubernetes.io/instance=mysql-bk                                 
No resources found in default namespace.
➜  ~ 
➜  ~ kubectl get cmp -l app.kubernetes.io/instance=mysql-bk           
NAME             DEFINITION                SERVICE-VERSION   STATUS     AGE
mysql-bk-mysql   mysql-8.0-1.1.0-alpha.0   8.0.30            Deleting   11m
➜  ~ 
➜  ~ kubectl get its -l app.kubernetes.io/instance=mysql-bk                                
NAME             DESIRED   UP-TO-DATE   READY   AVAILABLE   AGE
mysql-bk-mysql   2                      2       2           10m
➜  ~ 
➜  ~ kubectl get svc -l app.kubernetes.io/instance=mysql-bk
NAME             TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
mysql-bk-mysql   ClusterIP   10.225.218.96   <none>        3306/TCP   11m
➜  ~ 
➜  ~ kubectl get cm -l app.kubernetes.io/instance=mysql-bk
NAME                                           DATA   AGE
mysql-bk-mysql-env                             5      11m
mysql-bk-mysql-haconfig                        0      9m1s
mysql-bk-mysql-hahistory                       1      6m27s
mysql-bk-mysql-leader                          0      9m1s
mysql-bk-mysql-mysql-scripts                   6      11m
sidecar-mysql-bk-mysql-config-manager-config   1      11m
➜  ~ 
➜  ~ kubectl get secret -l app.kubernetes.io/instance=mysql-bk
NAME                                      TYPE     DATA   AGE
mysql-bk-mysql-account-kbadmin            Opaque   2      11m
mysql-bk-mysql-account-kbdataprotection   Opaque   2      11m
mysql-bk-mysql-account-kbmonitoring       Opaque   2      11m
mysql-bk-mysql-account-kbprobe            Opaque   2      11m
mysql-bk-mysql-account-kbreplicator       Opaque   2      11m
mysql-bk-mysql-account-proxysql           Opaque   2      11m
mysql-bk-mysql-account-root               Opaque   2      11m

describe cluster

kubectl describe cluster mysql-bk 
Name:         mysql-bk
Namespace:    default
Labels:       clusterdefinition.kubeblocks.io/name=mysql
Annotations:  kubeblocks.io/crd-api-version: apps.kubeblocks.io/v1
API Version:  apps.kubeblocks.io/v1
Kind:         Cluster
Metadata:
  Creation Timestamp:             2025-11-04T07:30:39Z
  Deletion Grace Period Seconds:  0
  Deletion Timestamp:             2025-11-04T07:35:41Z
  Finalizers:
    cluster.kubeblocks.io/finalizer
  Generation:        2
  Resource Version:  86704
  UID:               740ad715-4abf-4a84-96dc-ed8bddedaeec
Spec:
  Cluster Def:  mysql
  Component Specs:
    Component Def:          mysql-8.0-1.1.0-alpha.0
    Disable Exporter:       true
    Flat Instance Ordinal:  false
    Name:                   mysql
    Pod Update Policy:      PreferInPlace
    Replicas:               2
    Resources:
      Limits:
        Cpu:     100m
        Memory:  512Mi
      Requests:
        Cpu:          100m
        Memory:       512Mi
    Service Version:  8.0.30
    Volume Claim Templates:
      Name:  data
      Spec:
        Access Modes:
          ReadWriteOnce
        Resources:
          Requests:
            Storage:   20Gi
  Termination Policy:  WipeOut
  Topology:            semisync
Status:
  Components:
    Mysql:
      Observed Generation:  1
      Phase:                Running
      Up To Date:           true
  Conditions:
    Last Transition Time:  2025-11-04T07:30:39Z
    Message:               The operator has started the provisioning of Cluster: mysql-bk
    Observed Generation:   1
    Reason:                PreCheckSucceed
    Status:                True
    Type:                  ProvisioningStarted
    Last Transition Time:  2025-11-04T07:30:39Z
    Message:               Successfully applied for resources
    Observed Generation:   1
    Reason:                ApplyResourcesSucceed
    Status:                True
    Type:                  ApplyResources
    Last Transition Time:  2025-11-04T07:35:30Z
    Message:               cluster mysql-bk is ready
    Reason:                ClusterReady
    Status:                True
    Type:                  Ready
  Observed Generation:     1
  Phase:                   Deleting
Events:
  Type    Reason                           Age                    From                Message
  ----    ------                           ----                   ----                -------
  Normal  PreCheckSucceed                  11m                    cluster-controller  The operator has started the provisioning of Cluster: mysql-bk
  Normal  ApplyResourcesSucceed            11m                    cluster-controller  Successfully applied for resources
  Normal  ClusterComponentPhaseTransition  11m (x2 over 11m)      cluster-controller  cluster component mysql is Creating
  Normal  ClusterReady                     7m5s                   cluster-controller  cluster mysql-bk is ready
  Normal  Running                          7m5s                   cluster-controller  Cluster: mysql-bk is ready, current phase is Running
  Normal  ClusterComponentPhaseTransition  6m58s (x6 over 7m5s)   cluster-controller  cluster component mysql is Running
  Normal  DeletingCR                       6m54s (x3 over 6m54s)  cluster-controller  Deleting : mysql-bk
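
The describe output above shows why the object is still present: `Deletion Timestamp` is set, but `cluster.kubeblocks.io/finalizer` is still listed, and Kubernetes only removes an object once its finalizers list is empty. A minimal sketch of that check, assuming the metadata has been loaded as a plain dict (for example from `kubectl get cluster mysql-bk -o json`). The helper name `pending_deletion` and the observation time are hypothetical and not part of KubeBlocks or kbcli:

```python
from datetime import datetime, timezone


def pending_deletion(metadata, now):
    """Describe a deletion that is blocked by finalizers, or return None."""
    ts = metadata.get("deletionTimestamp")
    finalizers = metadata.get("finalizers") or []
    if ts is None or not finalizers:
        # Either nobody asked for deletion, or nothing is holding it back.
        return None
    deleted_at = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return {
        "blocked_by": finalizers,
        "waiting_seconds": (now - deleted_at).total_seconds(),
    }


# Values copied from the describe output above.
metadata = {
    "name": "mysql-bk",
    "deletionTimestamp": "2025-11-04T07:35:41Z",
    "finalizers": ["cluster.kubeblocks.io/finalizer"],
}
# Hypothetical observation time, chosen for illustration only.
now = datetime(2025, 11, 4, 7, 42, 35, tzinfo=timezone.utc)
result = pending_deletion(metadata, now)
print(result)
```

A non-empty `blocked_by` only says which finalizer is outstanding. Finding out why the controller does not remove it still requires the controller logs.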

logs kubeblocks

➜  ~ kubectl logs -n kb-system kubeblocks-85864d9c7-cql5g|grep "wait for the workloads"|grep mysql-bk-mysql
Defaulted container "manager" out of: manager, tools (init)
2025-11-04T07:35:41.702Z	INFO	wait for the workloads to be deleted: map[{workloads.kubeblocks.io/v1, Kind=InstanceSet default/mysql-bk-mysql}:0xc003928708]	{"controller": "component", "controllerGroup": "apps.kubeblocks.io", "controllerKind": "Component", "Component": {"name":"mysql-bk-mysql","namespace":"default"}, "namespace": "default", "name": "mysql-bk-mysql", "reconcileID": "beac1f2e-9b30-4d4d-9ce9-8256dd31b099", "component": {"name":"mysql-bk-mysql","namespace":"default"}}
2025-11-04T07:35:41.787Z	INFO	wait for the workloads to be deleted: map[{workloads.kubeblocks.io/v1, Kind=InstanceSet default/mysql-bk-mysql}:0xc003c95108]	{"controller": "component", "controllerGroup": "apps.kubeblocks.io", "controllerKind": "Component", "Component": {"name":"mysql-bk-mysql","namespace":"default"}, "namespace": "default", "name": "mysql-bk-mysql", "reconcileID": "d5227335-f03a-4a2f-9a39-1b638f9fff39", "component": {"name":"mysql-bk-mysql","namespace":"default"}}
2025-11-04T07:35:41.860Z	INFO	wait for the workloads to be deleted: map[{workloads.kubeblocks.io/v1, Kind=InstanceSet default/mysql-bk-mysql}:0xc003fe8708]	{"controller": "component", "controllerGroup": "apps.kubeblocks.io", "controllerKind": "Component", "Component": {"name":"mysql-bk-mysql","namespace":"default"}, "namespace": "default", "name": "mysql-bk-mysql", "reconcileID": "5cb551b2-6f6f-4fdb-98c1-796666f41ca4", "component": {"name":"mysql-bk-mysql","namespace":"default"}}
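
The stuck workload can be read off the `wait for the workloads to be deleted` message. A small sketch that extracts the kind, namespace, and name from such a line. The regular expression is inferred only from the three log lines above and the helper name `pending_workloads` is hypothetical, so other KubeBlocks versions may print a different shape:

```python
import re

# One entry of the Go map printed in the message, e.g.
# {workloads.kubeblocks.io/v1, Kind=InstanceSet default/mysql-bk-mysql}:0xc003928708
ENTRY = re.compile(
    r"\{(?P<api_version>[^,{}]+), Kind=(?P<kind>\S+) "
    r"(?P<namespace>[^/\s]+)/(?P<name>[^}\s]+)\}:0x[0-9a-f]+"
)


def pending_workloads(log_line):
    """Return (api_version, kind, namespace, name) tuples found in a log line."""
    return [
        (m["api_version"], m["kind"], m["namespace"], m["name"])
        for m in ENTRY.finditer(log_line)
    ]


# Message part of the first log line above (the JSON tail is omitted here).
line = (
    "2025-11-04T07:35:41.702Z\tINFO\twait for the workloads to be deleted: "
    "map[{workloads.kubeblocks.io/v1, Kind=InstanceSet default/mysql-bk-mysql}:0xc003928708]"
)
print(pending_workloads(line))
```

The pointer value after the colon changes on every reconcile, so only the key part is useful for grouping repeated messages.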

JashBook avatar Nov 04 '25 07:11 JashBook

starrocks-ce cluster delete hang

  1. create cluster
apiVersion: apps.kubeblocks.io/v1
kind: Cluster
metadata:
  name: strce-shiopn
  namespace: default
spec:
  clusterDef: starrocks-ce
  topology: shared-nothing
  terminationPolicy: WipeOut
  componentSpecs:
    - name: fe
      serviceVersion: 3.2.2
      disableExporter: true
      replicas: 1
      resources:
        requests:
          cpu: 1000m
          memory: 2Gi
        limits:
          cpu: 1000m
          memory: 2Gi
      volumeClaimTemplates:
        - name: data
          spec:
            storageClassName: 
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 20Gi
    - name: be
      serviceVersion: 3.2.2
      replicas: 2
      resources:
        requests:
          cpu: 1000m
          memory: 2Gi
        limits:
          cpu: 1000m
          memory: 2Gi
      volumeClaimTemplates:
        - name: data
          spec:
            storageClassName: 
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 20Gi
  2. delete cluster
kbcli cluster delete strce-shiopn --auto-approve --namespace default
  3. see error
kubectl get cluster,pod,its,cmp,cm,secret,svc -l app.kubernetes.io/instance=strce-shiopn
NAME                                      CLUSTER-DEFINITION   TERMINATION-POLICY   STATUS     AGE
cluster.apps.kubeblocks.io/strce-shiopn   starrocks-ce         WipeOut              Deleting   36m

NAME                                                  DESIRED   UP-TO-DATE   READY   AVAILABLE   AGE
instanceset.workloads.kubeblocks.io/strce-shiopn-fe   1                      1       1           36m

NAME                                           DEFINITION                      SERVICE-VERSION   STATUS     AGE
component.apps.kubeblocks.io/strce-shiopn-fe   starrocks-ce-fe-1.1.0-alpha.0   3.2.2             Deleting   36m

NAME                                DATA   AGE
configmap/strce-shiopn-fe-env       1      36m
configmap/strce-shiopn-fe-fe-cm     1      36m
configmap/strce-shiopn-fe-scripts   1      36m

NAME                                  TYPE     DATA   AGE
secret/strce-shiopn-fe-account-root   Opaque   2      36m

NAME                         TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
service/strce-shiopn-fe-fe   ClusterIP   10.225.212.252   <none>        8030/TCP,9030/TCP   36m

describe cluster

kubectl describe cluster strce-shiopn
Name:         strce-shiopn
Namespace:    default
Labels:       app.kubernetes.io/instance=strce-shiopn
              clusterdefinition.kubeblocks.io/name=starrocks-ce
Annotations:  kubeblocks.io/crd-api-version: apps.kubeblocks.io/v1
API Version:  apps.kubeblocks.io/v1
Kind:         Cluster
Metadata:
  Creation Timestamp:             2025-11-04T07:17:01Z
  Deletion Grace Period Seconds:  0
  Deletion Timestamp:             2025-11-04T07:43:42Z
  Finalizers:
    cluster.kubeblocks.io/finalizer
  Generation:        13
  Resource Version:  92532
  UID:               66b89407-57f2-4c15-b6f5-409b454f44bc
Spec:
  Cluster Def:  starrocks-ce
  Component Specs:
    Annotations:
      kubeblocks.io/restart:  2025-11-04T07:40:58Z
    Component Def:            starrocks-ce-fe-1.1.0-alpha.0
    Disable Exporter:         true
    Flat Instance Ordinal:    false
    Name:                     fe
    Pod Update Policy:        PreferInPlace
    Replicas:                 1
    Resources:
      Limits:
        Cpu:     1100m
        Memory:  2254857830400m
      Requests:
        Cpu:          1100m
        Memory:       2254857830400m
    Service Version:  3.2.2
    Volume Claim Templates:
      Name:  data
      Spec:
        Access Modes:
          ReadWriteOnce
        Resources:
          Requests:
            Storage:  20Gi
    Annotations:
      kubeblocks.io/restart:  2025-11-04T07:40:58Z
    Component Def:            starrocks-ce-be-1.1.0-alpha.0
    Flat Instance Ordinal:    false
    Name:                     be
    Pod Update Policy:        PreferInPlace
    Replicas:                 2
    Resources:
      Limits:
        Cpu:     1100m
        Memory:  2254857830400m
      Requests:
        Cpu:          1100m
        Memory:       2254857830400m
    Service Version:  3.2.2
    Volume Claim Templates:
      Name:  data
      Spec:
        Access Modes:
          ReadWriteOnce
        Resources:
          Requests:
            Storage:   24Gi
  Termination Policy:  WipeOut
  Topology:            shared-nothing
Status:
  Components:
    Be:
      Message:
        InstanceSet/strce-shiopn-be:  ["strce-shiopn-be-0"]
      Observed Generation:            12
      Phase:                          Running
      Up To Date:                     true
    Fe:
      Observed Generation:  12
      Phase:                Running
      Up To Date:           true
  Conditions:
    Last Transition Time:  2025-11-04T07:17:01Z
    Message:               The operator has started the provisioning of Cluster: strce-shiopn
    Observed Generation:   12
    Reason:                PreCheckSucceed
    Status:                True
    Type:                  ProvisioningStarted
    Last Transition Time:  2025-11-04T07:17:01Z
    Message:               Successfully applied for resources
    Observed Generation:   12
    Reason:                ApplyResourcesSucceed
    Status:                True
    Type:                  ApplyResources
    Last Transition Time:  2025-11-04T07:34:00Z
    Message:               cluster strce-shiopn is ready
    Reason:                ClusterReady
    Status:                True
    Type:                  Ready
  Observed Generation:     12
  Phase:                   Deleting
Events:
  Type    Reason                           Age                  From                Message
  ----    ------                           ----                 ----                -------
  Normal  PreCheckSucceed                  37m (x2 over 37m)    cluster-controller  The operator has started the provisioning of Cluster: strce-shiopn
  Normal  ApplyResourcesSucceed            37m (x2 over 37m)    cluster-controller  Successfully applied for resources
  Normal  ClusterComponentPhaseTransition  34m (x10 over 37m)   cluster-controller  cluster component fe is Creating
  Normal  ClusterComponentPhaseTransition  31m (x3 over 33m)    cluster-controller  cluster component be is Creating
  Normal  ClusterComponentPhaseTransition  26m (x12 over 28m)   cluster-controller  cluster component fe is Starting
  Normal  ClusterComponentPhaseTransition  11m (x106 over 33m)  cluster-controller  cluster component fe is Running
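
The `Memory: 2254857830400m` values in the describe output above use the Kubernetes milli suffix (`m`, one thousandth of a unit), which makes them hard to compare with the `2Gi` in the manifest. A minimal sketch of the conversion, handling only the suffixes that appear in this issue. The helper name `parse_quantity` is hypothetical and this is not the Kubernetes quantity parser:

```python
from fractions import Fraction

# Only the suffixes seen in this issue; the real quantity grammar has more.
SUFFIXES = {
    "Gi": Fraction(2**30),
    "Mi": Fraction(2**20),
    "m": Fraction(1, 1000),
}


def parse_quantity(text):
    """Convert a quantity string such as '2Gi' or '1100m' to an exact Fraction."""
    for suffix, factor in SUFFIXES.items():
        if text.endswith(suffix):
            return Fraction(text[: -len(suffix)]) * factor
    return Fraction(text)


memory_bytes = parse_quantity("2254857830400m")
print(float(memory_bytes / 2**30))      # memory in GiB
print(float(parse_quantity("1100m")))   # CPU in cores
```

Under these assumptions the memory value works out to 2.1Gi and the CPU value to 1.1 cores, so the spec at deletion time differs from the manifest in step 1. The issue does not say what changed it.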

logs kubeblocks

2025-11-04T07:44:10.926Z	INFO	reconcile object *v1.InstanceSet with action DELETE OK	{"controller": "component", "controllerGroup": "apps.kubeblocks.io", "controllerKind": "Component", "Component": {"name":"strce-shiopn-fe","namespace":"default"}, "namespace": "default", "name": "strce-shiopn-fe", "reconcileID": "b5dc4a42-312f-4278-aa25-30d8dca7f574", "component": {"name":"strce-shiopn-fe","namespace":"default"}}
2025-11-04T07:44:10.932Z	INFO	reconcile object *v1.Component with action STATUS OK	{"controller": "component", "controllerGroup": "apps.kubeblocks.io", "controllerKind": "Component", "Component": {"name":"strce-shiopn-fe","namespace":"default"}, "namespace": "default", "name": "strce-shiopn-fe", "reconcileID": "b5dc4a42-312f-4278-aa25-30d8dca7f574", "component": {"name":"strce-shiopn-fe","namespace":"default"}}
2025-11-04T07:44:11.840Z	INFO	wait for the workloads to be deleted: map[{workloads.kubeblocks.io/v1, Kind=InstanceSet default/strce-shiopn-fe}:0xc00155f108]	{"controller": "component", "controllerGroup": "apps.kubeblocks.io", "controllerKind": "Component", "Component": {"name":"strce-shiopn-fe","namespace":"default"}, "namespace": "default", "name": "strce-shiopn-fe", "reconcileID": "bc494946-4ed1-4e75-941c-220088eaa7a2", "component": {"name":"strce-shiopn-fe","namespace":"default"}}
2025-11-04T07:44:11.840Z	INFO	reconcile object *v1.InstanceSet with action DELETE OK	{"controller": "component", "controllerGroup": "apps.kubeblocks.io", "controllerKind": "Component", "Component": {"name":"strce-shiopn-fe","namespace":"default"}, "namespace": "default", "name": "strce-shiopn-fe", "reconcileID": "bc494946-4ed1-4e75-941c-220088eaa7a2", "component": {"name":"strce-shiopn-fe","namespace":"default"}}
2025-11-04T07:44:11.846Z	INFO	reconcile object *v1.Component with action STATUS OK	{"controller": "component", "controllerGroup": "apps.kubeblocks.io", "controllerKind": "Component", "Component": {"name":"strce-shiopn-fe","namespace":"default"}, "namespace": "default", "name": "strce-shiopn-fe", "reconcileID": "bc494946-4ed1-4e75-941c-220088eaa7a2", "component": {"name":"strce-shiopn-fe","namespace":"default"}}
2025-11-04T07:44:23.011Z	INFO	wait for the components and shardings to be deleted: map[fe:{}]	{"controller": "cluster", "controllerGroup": "apps.kubeblocks.io", "controllerKind": "Cluster", "Cluster": {"name":"strce-shiopn","namespace":"default"}, "namespace": "default", "name": "strce-shiopn", "reconcileID": "cb9b5491-7091-43e7-a802-ffb96d337aab", "cluster": {"name":"strce-shiopn","namespace":"default"}}

JashBook avatar Nov 04 '25 07:11 JashBook

While deleting the cluster, KubeBlocks reports "wait for the workloads to be deleted", but the ITS is never deleted. After annotating the ITS, it was deleted. It seems that deleting the workload resources failed to notify the ITS. Comparing the PVCs of the original cluster and the restored cluster, we found the key difference: the owner reference. The original cluster's PVC is owned by its InstanceSet, while the restored cluster's PVC has no ownerReferences.

  1. PVC of original cluster
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    pv.kubernetes.io/bind-completed: "yes"
    pv.kubernetes.io/bound-by-controller: "yes"
    volume.beta.kubernetes.io/storage-provisioner: rancher.io/local-path
    volume.kubernetes.io/selected-node: kbv110-control-plane
    volume.kubernetes.io/storage-provisioner: rancher.io/local-path
  creationTimestamp: "2025-11-05T07:13:27Z"
  finalizers:
  - kubernetes.io/pvc-protection
  labels:
    app.kubernetes.io/component: mysql-8.0-1.1.0-alpha.0
    app.kubernetes.io/instance: mysql-ccxdow
    app.kubernetes.io/managed-by: kubeblocks
    apps.kubeblocks.io/component-name: mysql
    apps.kubeblocks.io/pod-name: mysql-ccxdow-mysql-0
    apps.kubeblocks.io/release-phase: stable
    apps.kubeblocks.io/service-version: 8.0.30
    apps.kubeblocks.io/vct-name: data
    workloads.kubeblocks.io/instance: mysql-ccxdow-mysql
    workloads.kubeblocks.io/managed-by: InstanceSet
  name: data-mysql-ccxdow-mysql-0
  namespace: default
  ownerReferences:
  - apiVersion: workloads.kubeblocks.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: InstanceSet
    name: mysql-ccxdow-mysql
    uid: d9843450-670c-4170-8eec-34270cd26aee
  resourceVersion: "1789150"
  uid: d207de82-6421-40a6-adc6-3a2467e3486c
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
  storageClassName: standard
  volumeMode: Filesystem
  volumeName: pvc-d207de82-6421-40a6-adc6-3a2467e3486c
status:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 20Gi
  phase: Bound
  2. PVC of restored cluster
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    pv.kubernetes.io/bind-completed: "yes"
    pv.kubernetes.io/bound-by-controller: "yes"
    volume.beta.kubernetes.io/storage-provisioner: rancher.io/local-path
    volume.kubernetes.io/selected-node: kbv110-control-plane
    volume.kubernetes.io/storage-provisioner: rancher.io/local-path
  creationTimestamp: "2025-11-05T07:11:56Z"
  finalizers:
  - kubernetes.io/pvc-protection
  labels:
    app.kubernetes.io/component: mysql-8.0-1.1.0-alpha.0
    app.kubernetes.io/instance: mysql-bk2
    app.kubernetes.io/managed-by: kubeblocks
    apps.kubeblocks.io/component-name: mysql
    apps.kubeblocks.io/pod-name: mysql-bk2-mysql-0
    apps.kubeblocks.io/release-phase: stable
    apps.kubeblocks.io/service-version: 8.0.30
    apps.kubeblocks.io/vct-name: data
    componentdefinition.kubeblocks.io/name: mysql-8.0-1.1.0-alpha.0
    workloads.kubeblocks.io/instance: mysql-bk2-mysql
    workloads.kubeblocks.io/managed-by: InstanceSet
  name: data-mysql-bk2-mysql-0
  namespace: default
  resourceVersion: "1788354"
  uid: 339fbbea-36cd-4233-825f-8045e0e6f8c5
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
  storageClassName: standard
  volumeMode: Filesystem
  volumeName: pvc-339fbbea-36cd-4233-825f-8045e0e6f8c5
status:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 20Gi
  phase: Bound
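
The comparison of the two PVCs above can be sketched as a small check for a controller owner reference. The manifests are abbreviated to the fields that matter and are assumed to be loaded as plain dicts (for example from `kubectl get pvc <name> -o json`). The helper name `controller_owner` is hypothetical:

```python
def controller_owner(manifest):
    """Return the ownerReference marked controller: true, or None."""
    for ref in manifest.get("metadata", {}).get("ownerReferences") or []:
        if ref.get("controller"):
            return ref
    return None


# Abbreviated from the PVC of the original cluster.
original_pvc = {
    "metadata": {
        "name": "data-mysql-ccxdow-mysql-0",
        "ownerReferences": [{
            "apiVersion": "workloads.kubeblocks.io/v1",
            "blockOwnerDeletion": True,
            "controller": True,
            "kind": "InstanceSet",
            "name": "mysql-ccxdow-mysql",
            "uid": "d9843450-670c-4170-8eec-34270cd26aee",
        }],
    }
}
# Abbreviated from the PVC of the restored cluster: no ownerReferences at all.
restored_pvc = {"metadata": {"name": "data-mysql-bk2-mysql-0"}}

for pvc in (original_pvc, restored_pvc):
    owner = controller_owner(pvc)
    label = f'{owner["kind"]}/{owner["name"]}' if owner else "no controller owner"
    print(pvc["metadata"]["name"], "->", label)
```

Running the same check over every PVC of a restored cluster would show whether the missing owner reference is specific to this PVC or applies to all of them.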

shanshanying avatar Nov 05 '25 07:11 shanshanying

@leon-inf AFAIK, we solved an issue similar to this one previously. Do you have any ideas on this?

shanshanying avatar Nov 05 '25 07:11 shanshanying