kubeblocks

[BUG] Starting a StarRocks cluster fails after it has been stopped

Open · tianyue86 opened this issue 9 months ago · 1 comment

Describe the bug

Kubernetes: v1.31.1-aliyun.1
KubeBlocks: 1.0.0-beta.32
kbcli: 1.0.0-beta.15

To Reproduce

Steps to reproduce the behavior:

  1. Create a StarRocks cluster with the YAML below; it reaches Running status.
apiVersion: apps.kubeblocks.io/v1
kind: Cluster
metadata:
  name: strsce-yioztk
  namespace: default
spec:
  clusterDef: starrocks-ce
  topology: shared-nothing
  terminationPolicy: DoNotTerminate
  componentSpecs:
    - name: fe
      serviceVersion: 3.2.2
      disableExporter: true
      replicas: 2
      resources:
        requests:
          cpu: 1000m
          memory: 1Gi
        limits:
          cpu: 1000m
          memory: 1Gi
      volumeClaimTemplates:
        - name: data
          spec:
            storageClassName:
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 20Gi
    - name: be
      serviceVersion: 3.2.2
      replicas: 2
      resources:
        requests:
          cpu: 1000m
          memory: 1Gi
        limits:
          cpu: 1000m
          memory: 1Gi
      volumeClaimTemplates:
        - name: data
          spec:
            storageClassName:
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 20Gi
  1. Stop it
kbcli cluster list-instances strsce-yioztk --namespace default
NAME                 NAMESPACE   CLUSTER         COMPONENT   STATUS    ROLE     ACCESSMODE   AZ                 CPU(REQUEST/LIMIT)   MEMORY(REQUEST/LIMIT)   STORAGE     NODE                                   CREATED-TIME                 
strsce-yioztk-be-0   default     strsce-yioztk   be          Running   <none>                cn-zhangjiakou-c   1 / 1                1Gi / 1Gi               data:20Gi   cn-zhangjiakou.10.0.0.144/10.0.0.144   Mar 07,2025 15:39 UTC+0800   
strsce-yioztk-be-1   default     strsce-yioztk   be          Running   <none>                cn-zhangjiakou-c   1 / 1                1Gi / 1Gi               data:20Gi   cn-zhangjiakou.10.0.0.145/10.0.0.145   Mar 07,2025 15:40 UTC+0800   
strsce-yioztk-fe-0   default     strsce-yioztk   fe          Running   <none>                cn-zhangjiakou-c   1 / 1                1Gi / 1Gi               data:20Gi   cn-zhangjiakou.10.0.0.144/10.0.0.144   Mar 07,2025 15:37 UTC+0800   
strsce-yioztk-fe-1   default     strsce-yioztk   fe          Running   <none>                cn-zhangjiakou-c   1 / 1                1Gi / 1Gi               data:20Gi   cn-zhangjiakou.10.0.0.144/10.0.0.144   Mar 07,2025 15:38 UTC+0800   
tianyue@apeclouds-MacBook-Pro kubeblocks-addons % kbcli cluster stop strsce-yioztk --auto-approve --force=true  --namespace default
OpsRequest strsce-yioztk-stop-vrxr2 created successfully, you can view the progress:
	kbcli cluster describe-ops strsce-yioztk-stop-vrxr2 -n default
tianyue@apeclouds-MacBook-Pro kubeblocks-addons % kbcli cluster list-ops strsce-yioztk --status all  --namespace default
NAME                       NAMESPACE   TYPE   CLUSTER         COMPONENT   STATUS    PROGRESS   CREATED-TIME                 
strsce-yioztk-stop-vrxr2   default     Stop   strsce-yioztk   be,fe       Running   2/4        Mar 07,2025 16:45 UTC+0800   
tianyue@apeclouds-MacBook-Pro kubeblocks-addons % k get cluster | grep str
strsce-yioztk     starrocks-ce         DoNotTerminate       Stopping   68m
tianyue@apeclouds-MacBook-Pro kubeblocks-addons % k get cluster | grep str
strsce-yioztk     starrocks-ce         DoNotTerminate       Stopped    68m
tianyue@apeclouds-MacBook-Pro kubeblocks-addons % kbcli cluster list-ops strsce-yioztk --status all  --namespace default
NAME                       NAMESPACE   TYPE   CLUSTER         COMPONENT   STATUS    PROGRESS   CREATED-TIME                 
strsce-yioztk-stop-vrxr2   default     Stop   strsce-yioztk   be,fe       Succeed   4/4        Mar 07,2025 16:45 UTC+0800   
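The stop was judged complete once `kbcli cluster list-ops` reported the OpsRequest as `Succeed` with progress `4/4`, while the start request later shows `Running 0/4`. A minimal Python sketch of that check, assuming the whitespace-separated column layout shown above (CREATED-TIME contains spaces, so only the columns before it are read). The helpers `ops_status` and `is_done` are hypothetical and not part of kbcli.

```python
from typing import Dict, Tuple

# Output copied from the list-ops call above (trailing spaces trimmed).
SAMPLE = """
NAME                       NAMESPACE   TYPE   CLUSTER         COMPONENT   STATUS    PROGRESS   CREATED-TIME
strsce-yioztk-stop-vrxr2   default     Stop   strsce-yioztk   be,fe       Succeed   4/4        Mar 07,2025 16:45 UTC+0800
"""


def ops_status(table: str) -> Dict[str, Tuple[str, str]]:
    """Map each OpsRequest NAME to its (STATUS, PROGRESS) pair.

    Hypothetical helper: assumes every column up to PROGRESS is a single
    whitespace-free token, which holds for the output shown above.
    """
    lines = [ln for ln in table.strip().splitlines() if ln.strip()]
    header = lines[0].split()
    status_col = header.index("STATUS")
    progress_col = header.index("PROGRESS")
    result: Dict[str, Tuple[str, str]] = {}
    for row in lines[1:]:
        cols = row.split()
        result[cols[0]] = (cols[status_col], cols[progress_col])
    return result


def is_done(status: str, progress: str) -> bool:
    """True when the request succeeded and all steps finished (e.g. 4/4)."""
    finished, total = progress.split("/")
    return status == "Succeed" and finished == total


status, progress = ops_status(SAMPLE)["strsce-yioztk-stop-vrxr2"]
print(status, progress, is_done(status, progress))  # Succeed 4/4 True
```

Feeding it the later output containing `strsce-yioztk-start-b6h92 ... Running 0/4` would make `is_done` return False for the start request.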
  1. Start it
kbcli cluster start strsce-yioztk --force=true --namespace default
OpsRequest strsce-yioztk-start-b6h92 created successfully, you can view the progress:
	kbcli cluster describe-ops strsce-yioztk-start-b6h92 -n default
tianyue@apeclouds-MacBook-Pro kubeblocks-addons % kbcli cluster list-ops strsce-yioztk --status all  --namespace default
NAME                        NAMESPACE   TYPE    CLUSTER         COMPONENT   STATUS    PROGRESS   CREATED-TIME                 
strsce-yioztk-stop-vrxr2    default     Stop    strsce-yioztk   be,fe       Succeed   4/4        Mar 07,2025 16:45 UTC+0800   
strsce-yioztk-start-b6h92   default     Start   strsce-yioztk   be,fe       Running   0/4        Mar 07,2025 16:46 UTC+0800 
  1. Check the cluster status
k get pod | grep str
strsce-yioztk-be-0                  0/1     CrashLoopBackOff    11 (2m27s ago)   39m
strsce-yioztk-fe-0                  0/1     ContainerCreating   0                39m

k describe pod strsce-yioztk-be-0
Events:
  Type     Reason                  Age                    From                     Message
  ----     ------                  ----                   ----                     -------
  Normal   Scheduled               39m                    default-scheduler        Successfully assigned default/strsce-yioztk-be-0 to cn-zhangjiakou.10.0.0.144
  Normal   SuccessfulAttachVolume  39m                    attachdetach-controller  AttachVolume.Attach succeeded for volume "d-8vb23m26wqssi0fnw5jx"
  Normal   AllocIPSucceed          39m                    terway-daemon            Alloc IP 10.0.0.116/24 took 33.567815ms
  Normal   Pulled                  38m (x2 over 39m)      kubelet                  Container image "apecloud-registry.cn-zhangjiakou.cr.aliyuncs.com/apecloud/be-ubuntu:3.2.2" already present on machine
  Normal   Created                 38m (x2 over 39m)      kubelet                  Created container be
  Normal   Started                 38m (x2 over 39m)      kubelet                  Started container be
  Warning  Unhealthy               9m17s (x127 over 39m)  kubelet                  Startup probe failed: Get "http://10.0.0.116:8040/api/health": dial tcp 10.0.0.116:8040: connect: connection refused
  Warning  BackOff                 4m11s (x125 over 37m)  kubelet                  Back-off restarting failed container be in pod strsce-yioztk-be-0_default(b6583351-eca9-491b-b8f8-eefe9dc04ad7)
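The startup probe in these events is an HTTP GET against `http://<pod IP>:8040/api/health`, and `connection refused` means nothing in the BE container was accepting connections on port 8040. A minimal Python sketch of an equivalent manual check, following the Kubernetes rule that an httpGet probe succeeds for status codes 200 through 399; the helper name `be_health_ok` and the 2-second timeout are illustrative assumptions, not the addon's actual probe settings.

```python
import urllib.error
import urllib.request


def be_health_ok(host: str, port: int = 8040, timeout: float = 2.0) -> bool:
    """Mimic a Kubernetes httpGet probe against the BE health endpoint.

    Hypothetical helper: any HTTP status from 200 to 399 counts as success,
    any connection error (refused, timeout, DNS failure) counts as failure.
    """
    url = f"http://{host}:{port}/api/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except urllib.error.HTTPError as err:
        # Server answered with an error status such as 404 or 500.
        return 200 <= err.code < 400
    except OSError:
        # URLError, ConnectionRefusedError and timeouts all derive from OSError.
        return False
```

Run from inside the cluster network, `be_health_ok("10.0.0.116")` would be expected to return False for as long as the probe above keeps failing.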
  1. See the error in the strsce-yioztk-be-0 log
[Fri Mar  7 17:22:39 CST 2025] /etc/starrocks/be/conf not exist or not a directory, ignore ...
[Fri Mar  7 17:22:39 CST 2025] Add myself (strsce-yioztk-be-0.strsce-yioztk-be-headless.default.svc.cluster.local:9050) into FE ...
ERROR 2003 (HY000): Can't connect to MySQL server on 'strsce-yioztk-fe-fe:9030' (111)
ERROR 2003 (HY000): Can't connect to MySQL server on 'strsce-yioztk-fe-fe:9030' (111)
[Fri Mar  7 17:22:41 CST 2025] Add myself (strsce-yioztk-be-0.strsce-yioztk-be-headless.default.svc.cluster.local:9050) into FE ...
ERROR 2003 (HY000): Can't connect to MySQL server on 'strsce-yioztk-fe-fe:9030' (111)
ERROR 2003 (HY000): Can't connect to MySQL server on 'strsce-yioztk-fe-fe:9030' (111)
[Fri Mar  7 17:22:43 CST 2025] Add myself (strsce-yioztk-be-0.strsce-yioztk-be-headless.default.svc.cluster.local:9050) into FE ...
ERROR 2003 (HY000): Can't connect to MySQL server on 'strsce-yioztk-fe-fe:9030' (111)
ERROR 2003 (HY000): Can't connect to MySQL server on 'strsce-yioztk-fe-fe:9030' (111)
[Fri Mar  7 17:22:45 CST 2025] Add myself (strsce-yioztk-be-0.strsce-yioztk-be-headless.default.svc.cluster.local:9050) into FE ...
ERROR 2003 (HY000): Can't connect to MySQL server on 'strsce-yioztk-fe-fe:9030' (111)
ERROR 2003 (HY000): Can't connect to MySQL server on 'strsce-yioztk-fe-fe:9030' (111)
[Fri Mar  7 17:22:47 CST 2025] Add myself (strsce-yioztk-be-0.strsce-yioztk-be-headless.default.svc.cluster.local:9050) into FE ...
ERROR 2003 (HY000): Can't connect to MySQL server on 'strsce-yioztk-fe-fe:9030' (111)
ERROR 2003 (HY000): Can't connect to MySQL server on 'strsce-yioztk-fe-fe:9030' (111)
[Fri Mar  7 17:22:49 CST 2025] Add myself (strsce-yioztk-be-0.strsce-yioztk-be-headless.default.svc.cluster.local:9050) into FE ...
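In this log the BE retries `Add myself ... into FE` about every 2 seconds, and each attempt to reach the FE MySQL port at `strsce-yioztk-fe-fe:9030` fails with error 2003 and code 111, which is `ECONNREFUSED` on Linux. That is consistent with the FE not accepting connections while `strsce-yioztk-fe-0` is still in ContainerCreating. A minimal Python sketch of such a retry loop using a plain TCP connect; `wait_for_tcp` is a hypothetical helper that only tests reachability and does not run the SQL the BE entrypoint issues.

```python
import errno
import socket
import time


def wait_for_tcp(host: str, port: int, attempts: int = 5, interval: float = 2.0) -> bool:
    """Try to open a TCP connection up to `attempts` times, `interval` seconds apart.

    Hypothetical helper mirroring the BE entrypoint's retry loop; it checks
    reachability only. Returns True on the first successful connect.
    """
    for attempt in range(1, attempts + 1):
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except ConnectionRefusedError as err:
            # ECONNREFUSED (111 on Linux): no listener accepted the connection.
            name = errno.errorcode.get(err.errno, "?")
            print(f"attempt {attempt}: {host}:{port} refused ({name})")
        except OSError as err:
            # DNS failures, timeouts, unreachable networks and similar errors.
            print(f"attempt {attempt}: {host}:{port} failed: {err}")
        if attempt < attempts:
            time.sleep(interval)
    return False


# Usage inside the cluster (service name taken from the log above):
# wait_for_tcp("strsce-yioztk-fe-fe", 9030)
```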

tianyue86 · Mar 07 '25 09:03

This issue has been marked as stale because it has been open for 30 days with no activity.

github-actions[bot] · Apr 07 '25 00:04

Run passed on VKE:

--------------------------------------Starrocks CE (Topology = shared-nothing Replicas 2) Test Result--------------------------------------
[PASSED]|[Create]|[ComponentDefinition=starrocks-ce-be-1.0.0-alpha.0;ComponentVersion=starrocks-ce-be-1.0.0-alpha.0;ServiceVersion=3.2.2;]|[Description=Create a cluster with the specified component definition starrocks-ce-be-1.0.0-alpha.0 and component version starrocks-ce-be-1.0.0-alpha.0 and service version 3.2.2]
[PASSED]|[Connect]|[ComponentName=fe]|[Description=Connect to the cluster]
[PASSED]|[Stop]|[-]|[Description=Stop the cluster]
[PASSED]|[Start]|[-]|[Description=Start the cluster]
[PASSED]|[HorizontalScaling Out]|[ComponentName=be]|[Description=HorizontalScaling Out the cluster specify component be]
[PASSED]|[HorizontalScaling In]|[ComponentName=be]|[Description=HorizontalScaling In the cluster specify component be]
[PASSED]|[Restart]|[-]|[Description=Restart the cluster]
[PASSED]|[VerticalScaling]|[ComponentName=fe]|[Description=VerticalScaling the cluster specify component fe]
[PASSED]|[Restart]|[ComponentName=fe]|[Description=Restart the cluster specify component fe]
[PASSED]|[RebuildInstance]|[ComponentName=fe]|[Description=Rebuild the cluster instance specify component fe]
[PASSED]|[Restart]|[ComponentName=be]|[Description=Restart the cluster specify component be]
[PASSED]|[VolumeExpansion]|[ComponentName=be]|[Description=VolumeExpansion the cluster specify component be]
[PASSED]|[VerticalScaling]|[ComponentName=be]|[Description=VerticalScaling the cluster specify component be]
[PASSED]|[Failover]|[HA=Connection Stress;ComponentName=fe]|[Description=Simulates conditions where pods experience connection stress either due to expected/undesired processes thereby testing the application's resilience to potential slowness/unavailability of some replicas due to high Connection load.]
[PASSED]|[Update]|[TerminationPolicy=WipeOut]|[Description=Update the cluster TerminationPolicy WipeOut]
[PASSED]|[Delete]|[-]|[Description=Delete the cluster]
[END]
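Each result line in this report follows a pipe-separated `[RESULT]|[Operation]|[Parameters]|[Description=...]` layout, including the Stop and Start operations exercised by the original report. A minimal Python sketch for pulling the result and operation out of such lines, assuming every result line uses that layout; `parse_report` is a hypothetical helper, not part of the test tooling.

```python
import re
from collections import Counter
from typing import List, Tuple

# Matches "[PASSED]|[Start]|..." and captures the result and the operation.
RESULT_LINE = re.compile(r"^\[(?P<result>[A-Z]+)\]\|\[(?P<op>[^\]]*)\]\|")


def parse_report(text: str) -> List[Tuple[str, str]]:
    """Return (result, operation) pairs; banner lines and [END] are skipped."""
    rows = []
    for line in text.splitlines():
        match = RESULT_LINE.match(line.strip())
        if match:
            rows.append((match.group("result"), match.group("op")))
    return rows


REPORT = """
[PASSED]|[Stop]|[-]|[Description=Stop the cluster]
[PASSED]|[Start]|[-]|[Description=Start the cluster]
[PASSED]|[HorizontalScaling Out]|[ComponentName=be]|[Description=HorizontalScaling Out the cluster specify component be]
[END]
"""

rows = parse_report(REPORT)
print(Counter(result for result, _ in rows))  # Counter({'PASSED': 3})
```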

JashBook · May 27 '25 07:05