[BUG] starrocks ent shared-nothing cluster rebuild be instance: Backend node not found

Open JashBook opened this issue 1 year ago • 3 comments

Describe the bug

kbcli version
Kubernetes: v1.29.6-gke.1038001
KubeBlocks: 0.9.1-beta.6
kbcli: 0.9.0

ERROR 1064 (HY000): Backend node not found. Check if any backend node is down.backend: [strsent-lxdilk-be-0.strsent-lxdilk-be-headless.default.svc.cluster.local alive: false inBlacklist: false] [strsent-lxdilk-be-1.strsent-lxdilk-be-headless.default.svc.cluster.local alive: true inBlacklist: false] [strsent-lxdilk-be-2.strsent-lxdilk-be-headless.default.svc.cluster.local alive: true inBlacklist: false]

To Reproduce Steps to reproduce the behavior:

  1. Create a StarRocks ent shared-nothing cluster:
apiVersion: apps.kubeblocks.io/v1alpha1
kind: Cluster
metadata:
  name: strsent-lxdilk
  namespace: default
spec:
  terminationPolicy: Delete
  componentSpecs:
    - name: be
      componentDef: starrocks-be
      serviceAccountName: kb-strsent-lxdilk
      replicas: 2
      resources:
        requests:
          cpu: 3000m
          memory: 8Gi
        limits:
          cpu: 3000m
          memory: 8Gi
      volumeClaimTemplates:
        - name: data
          spec:
            storageClassName:
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 20Gi
    - name: fe
      componentDef: starrocks-fe-sn
      serviceAccountName: kb-strsent-lxdilk
      replicas: 2
      resources:
        requests:
          cpu: 3000m
          memory: 8Gi
        limits:
          cpu: 3000m
          memory: 8Gi
      volumeClaimTemplates:
        - name: data
          spec:
            storageClassName:
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 20Gi
kbcli cluster list-instances strsent-lxdilk --namespace default 
    
NAME                  NAMESPACE   CLUSTER          COMPONENT   STATUS    ROLE     ACCESSMODE   AZ              CPU(REQUEST/LIMIT)   MEMORY(REQUEST/LIMIT)   STORAGE     NODE                                                             CREATED-TIME                 
strsent-lxdilk-be-0   default     strsent-lxdilk   be          Running   <none>   <none>       us-central1-f   3 / 3                8Gi / 8Gi               data:20Gi   gke-infracreate-gke-kbdata-e2-standar-25c8fd47-whii/10.10.0.31   Jul 23,2024 12:05 UTC+0800   
strsent-lxdilk-be-1   default     strsent-lxdilk   be          Running   <none>   <none>       us-central1-f   3 / 3                8Gi / 8Gi               data:20Gi   gke-infracreate-gke-kbdata-e2-standar-25c8fd47-whii/10.10.0.31   Jul 23,2024 12:05 UTC+0800   
strsent-lxdilk-fe-0   default     strsent-lxdilk   fe          Running   <none>   <none>       us-central1-f   3 / 3                8Gi / 8Gi               data:20Gi   gke-infracreate-gke-kbdata-4c8g-a94cd103-kjll/10.10.0.119        Jul 23,2024 12:05 UTC+0800   
strsent-lxdilk-fe-1   default     strsent-lxdilk   fe          Running   <none>   <none>       us-central1-a   3 / 3                8Gi / 8Gi               data:20Gi   gke-infracreate-gke-kbdata-e2-standar-765d90c7-r9z4/10.10.0.79   Jul 23,2024 12:05 UTC+0800   
  2. Insert data:
kubectl exec -it strsent-lxdilk-fe-0 -c fe --namespace default -- bash

mysql -P9030 -hstrsent-lxdilk-fe-fe.default.svc -uroot -p'8ml3Sg3m97' 

CREATE DATABASE IF NOT EXISTS mydb; 
use mydb; 
DROP TABLE IF EXISTS tmp_table; 
CREATE TABLE IF NOT EXISTS tmp_table (id INT, value STRING) PROPERTIES  ( 'replication_num' = '1' ); 
INSERT INTO tmp_table (id, value) VALUES (1,'ljledwjjae'); 
  3. Rebuild the be instance strsent-lxdilk-be-0:
apiVersion: apps.kubeblocks.io/v1alpha1
kind: OpsRequest
metadata:
  generateName: strsent-lxdilk-rebuildinstance-
  namespace: default
spec:
  type: RebuildInstance
  clusterRef: strsent-lxdilk
  force: true
  rebuildFrom:
    - componentName: be
      instances:
        - name: strsent-lxdilk-be-0
kbcli cluster list-ops strsent-lxdilk --status all  --namespace default
NAME                                   NAMESPACE   TYPE              CLUSTER          COMPONENT   STATUS    PROGRESS   CREATED-TIME                 
ops_status:strsent-lxdilk-rebuildinstance-57q92   default     RebuildInstance   strsent-lxdilk   be          Succeed   1/1        Jul 23,2024 12:10 UTC+0800   

  4. See error:
kubectl get cluster  strsent-lxdilk
NAME             CLUSTER-DEFINITION   VERSION   TERMINATION-POLICY   STATUS    AGE
strsent-lxdilk                                  Delete               Running   16m

kubectl get pod -l app.kubernetes.io/instance=strsent-lxdilk
NAME                  READY   STATUS    RESTARTS      AGE
strsent-lxdilk-be-1   3/3     Running   2 (14m ago)   17m
strsent-lxdilk-be-2   3/3     Running   0             12m
strsent-lxdilk-fe-0   3/3     Running   0             17m
strsent-lxdilk-fe-1   3/3     Running   1 (14m ago)   17m
kubectl exec -it strsent-lxdilk-fe-0 -c fe --namespace default -- bash

mysql -P9030 -hstrsent-lxdilk-fe-fe.default.svc -uroot -p'8ml3Sg3m97' 

use mydb; 

SELECT value FROM tmp_table WHERE id = 1;
ERROR 1064 (HY000): Backend node not found. Check if any backend node is down.backend: [strsent-lxdilk-be-0.strsent-lxdilk-be-headless.default.svc.cluster.local alive: false inBlacklist: false] [strsent-lxdilk-be-1.strsent-lxdilk-be-headless.default.svc.cluster.local alive: true inBlacklist: false] [strsent-lxdilk-be-2.strsent-lxdilk-be-headless.default.svc.cluster.local alive: true inBlacklist: false] 
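
The outputs above suggest that the FE still lists strsent-lxdilk-be-0 as a backend even though no pod with that name exists after the rebuild (the pod list shows be-1 and be-2 only). The following is a minimal sketch, not part of the original report, that parses the error text and cross-checks it against the pod names. The helper names `parse_backends` and `stale_backends` are hypothetical, and the sketch assumes the first DNS label of each backend host equals the pod name.

```python
import re

# Error text as returned by the FE in the report (hostnames copied verbatim).
SUFFIX = ".strsent-lxdilk-be-headless.default.svc.cluster.local"
ERROR = (
    "Backend node not found. Check if any backend node is down.backend: "
    f"[strsent-lxdilk-be-0{SUFFIX} alive: false inBlacklist: false] "
    f"[strsent-lxdilk-be-1{SUFFIX} alive: true inBlacklist: false] "
    f"[strsent-lxdilk-be-2{SUFFIX} alive: true inBlacklist: false]"
)

# Pod names taken from the `kubectl get pod` output after the rebuild.
RUNNING_PODS = {
    "strsent-lxdilk-be-1",
    "strsent-lxdilk-be-2",
    "strsent-lxdilk-fe-0",
    "strsent-lxdilk-fe-1",
}

# One bracketed entry per backend: "[<host> alive: <bool> inBlacklist: <bool>]".
BACKEND_RE = re.compile(r"\[(\S+) alive: (true|false) inBlacklist: (true|false)\]")


def parse_backends(message):
    """Return {host: alive} for every backend entry found in the error text."""
    return {host: alive == "true" for host, alive, _ in BACKEND_RE.findall(message)}


def stale_backends(message, running_pods):
    """Return hosts the FE still lists whose pod (first DNS label) is not running."""
    return sorted(
        host
        for host in parse_backends(message)
        if host.split(".", 1)[0] not in running_pods
    )


if __name__ == "__main__":
    for host, alive in parse_backends(ERROR).items():
        print(host, "alive" if alive else "not alive")
    print("stale:", stale_backends(ERROR, RUNNING_PODS))
```

On the data in this report, the only backend flagged as stale is the be-0 host, which matches the entry reported with `alive: false`.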

JashBook avatar Jul 23 '24 04:07 JashBook