kubeblocks
[BUG] Starting a StarRocks cluster fails after stopping it
Describe the bug
Kubernetes: v1.31.1-aliyun.1
KubeBlocks: 1.0.0-beta.32
kbcli: 1.0.0-beta.15
To Reproduce
Steps to reproduce the behavior:
- Create a StarRocks cluster with the YAML below and wait until it is Running
apiVersion: apps.kubeblocks.io/v1
kind: Cluster
metadata:
  name: strsce-yioztk
  namespace: default
spec:
  clusterDef: starrocks-ce
  topology: shared-nothing
  terminationPolicy: DoNotTerminate
  componentSpecs:
    - name: fe
      serviceVersion: 3.2.2
      disableExporter: true
      replicas: 2
      resources:
        requests:
          cpu: 1000m
          memory: 1Gi
        limits:
          cpu: 1000m
          memory: 1Gi
      volumeClaimTemplates:
        - name: data
          spec:
            storageClassName:
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 20Gi
    - name: be
      serviceVersion: 3.2.2
      replicas: 2
      resources:
        requests:
          cpu: 1000m
          memory: 1Gi
        limits:
          cpu: 1000m
          memory: 1Gi
      volumeClaimTemplates:
        - name: data
          spec:
            storageClassName:
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 20Gi
- Stop it
kbcli cluster list-instances strsce-yioztk --namespace default
NAME NAMESPACE CLUSTER COMPONENT STATUS ROLE ACCESSMODE AZ CPU(REQUEST/LIMIT) MEMORY(REQUEST/LIMIT) STORAGE NODE CREATED-TIME
strsce-yioztk-be-0 default strsce-yioztk be Running <none> cn-zhangjiakou-c 1 / 1 1Gi / 1Gi data:20Gi cn-zhangjiakou.10.0.0.144/10.0.0.144 Mar 07,2025 15:39 UTC+0800
strsce-yioztk-be-1 default strsce-yioztk be Running <none> cn-zhangjiakou-c 1 / 1 1Gi / 1Gi data:20Gi cn-zhangjiakou.10.0.0.145/10.0.0.145 Mar 07,2025 15:40 UTC+0800
strsce-yioztk-fe-0 default strsce-yioztk fe Running <none> cn-zhangjiakou-c 1 / 1 1Gi / 1Gi data:20Gi cn-zhangjiakou.10.0.0.144/10.0.0.144 Mar 07,2025 15:37 UTC+0800
strsce-yioztk-fe-1 default strsce-yioztk fe Running <none> cn-zhangjiakou-c 1 / 1 1Gi / 1Gi data:20Gi cn-zhangjiakou.10.0.0.144/10.0.0.144 Mar 07,2025 15:38 UTC+0800
tianyue@apeclouds-MacBook-Pro kubeblocks-addons % kbcli cluster stop strsce-yioztk --auto-approve --force=true --namespace default
OpsRequest strsce-yioztk-stop-vrxr2 created successfully, you can view the progress:
kbcli cluster describe-ops strsce-yioztk-stop-vrxr2 -n default
tianyue@apeclouds-MacBook-Pro kubeblocks-addons % kbcli cluster list-ops strsce-yioztk --status all --namespace default
NAME NAMESPACE TYPE CLUSTER COMPONENT STATUS PROGRESS CREATED-TIME
strsce-yioztk-stop-vrxr2 default Stop strsce-yioztk be,fe Running 2/4 Mar 07,2025 16:45 UTC+0800
tianyue@apeclouds-MacBook-Pro kubeblocks-addons % k get cluster | grep str
strsce-yioztk starrocks-ce DoNotTerminate Stopping 68m
tianyue@apeclouds-MacBook-Pro kubeblocks-addons % k get cluster | grep str
strsce-yioztk starrocks-ce DoNotTerminate Stopped 68m
tianyue@apeclouds-MacBook-Pro kubeblocks-addons % kbcli cluster list-ops strsce-yioztk --status all --namespace default
NAME NAMESPACE TYPE CLUSTER COMPONENT STATUS PROGRESS CREATED-TIME
strsce-yioztk-stop-vrxr2 default Stop strsce-yioztk be,fe Succeed 4/4 Mar 07,2025 16:45 UTC+0800
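The manual polling above (repeating `k get cluster | grep str` until the status reads Stopped) could be scripted when reproducing this. The following is only a sketch: `wait_for_phase` is a hypothetical helper, and it assumes that `.status.phase` on the Cluster resource is the field shown in the status column of `k get cluster`.

```shell
# Hypothetical helper: poll the Cluster phase until it matches, or give up.
# Assumption: `.status.phase` backs the status column printed by `k get cluster`.
KUBECTL="${KUBECTL:-kubectl}"

wait_for_phase() {
  cluster="$1"; namespace="$2"; want="$3"
  attempts="${4:-60}"; delay="${5:-5}"
  i=0
  phase=""
  while [ "$i" -lt "$attempts" ]; do
    # Read the current phase; an error from kubectl is treated as "unknown".
    phase="$($KUBECTL get cluster "$cluster" -n "$namespace" \
      -o jsonpath='{.status.phase}' 2>/dev/null || true)"
    if [ "$phase" = "$want" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "cluster $cluster did not reach phase $want (last seen: ${phase:-unknown})" >&2
  return 1
}
```

For example, `wait_for_phase strsce-yioztk default Stopped && kbcli cluster start strsce-yioztk --force=true --namespace default` would issue the start only once the stop has fully completed, which matches the order of operations in this report.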
- Start it
kbcli cluster start strsce-yioztk --force=true --namespace default
OpsRequest strsce-yioztk-start-b6h92 created successfully, you can view the progress:
kbcli cluster describe-ops strsce-yioztk-start-b6h92 -n default
tianyue@apeclouds-MacBook-Pro kubeblocks-addons % kbcli cluster list-ops strsce-yioztk --status all --namespace default
NAME NAMESPACE TYPE CLUSTER COMPONENT STATUS PROGRESS CREATED-TIME
strsce-yioztk-stop-vrxr2 default Stop strsce-yioztk be,fe Succeed 4/4 Mar 07,2025 16:45 UTC+0800
strsce-yioztk-start-b6h92 default Start strsce-yioztk be,fe Running 0/4 Mar 07,2025 16:46 UTC+0800
- Check the cluster status
k get pod | grep str
strsce-yioztk-be-0 0/1 CrashLoopBackOff 11 (2m27s ago) 39m
strsce-yioztk-fe-0 0/1 ContainerCreating 0 39m
k describe pod strsce-yioztk-be-0
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 39m default-scheduler Successfully assigned default/strsce-yioztk-be-0 to cn-zhangjiakou.10.0.0.144
Normal SuccessfulAttachVolume 39m attachdetach-controller AttachVolume.Attach succeeded for volume "d-8vb23m26wqssi0fnw5jx"
Normal AllocIPSucceed 39m terway-daemon Alloc IP 10.0.0.116/24 took 33.567815ms
Normal Pulled 38m (x2 over 39m) kubelet Container image "apecloud-registry.cn-zhangjiakou.cr.aliyuncs.com/apecloud/be-ubuntu:3.2.2" already present on machine
Normal Created 38m (x2 over 39m) kubelet Created container be
Normal Started 38m (x2 over 39m) kubelet Started container be
Warning Unhealthy 9m17s (x127 over 39m) kubelet Startup probe failed: Get "http://10.0.0.116:8040/api/health": dial tcp 10.0.0.116:8040: connect: connection refused
Warning BackOff 4m11s (x125 over 37m) kubelet Back-off restarting failed container be in pod strsce-yioztk-be-0_default(b6583351-eca9-491b-b8f8-eefe9dc04ad7)
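The pod listing above shows two different failure states at once: `be-0` is crash-looping while `fe-0` never leaves ContainerCreating. A small filter can make that stand out in a longer listing. This is a sketch only; `list_unhealthy_pods` is a hypothetical name, and it assumes the default `kubectl get pod` column order (NAME, READY, STATUS, ...).

```shell
# Hypothetical helper: read `kubectl get pod` rows on stdin and print
# "<name> <status>" for every pod whose STATUS column is not Running.
# Note: a pod that is Running but not Ready (e.g. 0/1) is not flagged.
list_unhealthy_pods() {
  awk 'NF >= 3 && $1 != "NAME" && $3 != "Running" { print $1, $3 }'
}
```

Used as `kubectl get pod -n default | grep str | list_unhealthy_pods`, it would print both pods from the listing above. Since the BE events only show the startup probe being refused, the events of `strsce-yioztk-fe-0` (the pod stuck in ContainerCreating) are probably the more informative ones to inspect next.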
- See the error in the BE container log
[Fri Mar 7 17:22:39 CST 2025] /etc/starrocks/be/conf not exist or not a directory, ignore ...
[Fri Mar 7 17:22:39 CST 2025] Add myself (strsce-yioztk-be-0.strsce-yioztk-be-headless.default.svc.cluster.local:9050) into FE ...
ERROR 2003 (HY000): Can't connect to MySQL server on 'strsce-yioztk-fe-fe:9030' (111)
ERROR 2003 (HY000): Can't connect to MySQL server on 'strsce-yioztk-fe-fe:9030' (111)
[Fri Mar 7 17:22:41 CST 2025] Add myself (strsce-yioztk-be-0.strsce-yioztk-be-headless.default.svc.cluster.local:9050) into FE ...
ERROR 2003 (HY000): Can't connect to MySQL server on 'strsce-yioztk-fe-fe:9030' (111)
ERROR 2003 (HY000): Can't connect to MySQL server on 'strsce-yioztk-fe-fe:9030' (111)
[Fri Mar 7 17:22:43 CST 2025] Add myself (strsce-yioztk-be-0.strsce-yioztk-be-headless.default.svc.cluster.local:9050) into FE ...
ERROR 2003 (HY000): Can't connect to MySQL server on 'strsce-yioztk-fe-fe:9030' (111)
ERROR 2003 (HY000): Can't connect to MySQL server on 'strsce-yioztk-fe-fe:9030' (111)
[Fri Mar 7 17:22:45 CST 2025] Add myself (strsce-yioztk-be-0.strsce-yioztk-be-headless.default.svc.cluster.local:9050) into FE ...
ERROR 2003 (HY000): Can't connect to MySQL server on 'strsce-yioztk-fe-fe:9030' (111)
ERROR 2003 (HY000): Can't connect to MySQL server on 'strsce-yioztk-fe-fe:9030' (111)
[Fri Mar 7 17:22:47 CST 2025] Add myself (strsce-yioztk-be-0.strsce-yioztk-be-headless.default.svc.cluster.local:9050) into FE ...
ERROR 2003 (HY000): Can't connect to MySQL server on 'strsce-yioztk-fe-fe:9030' (111)
ERROR 2003 (HY000): Can't connect to MySQL server on 'strsce-yioztk-fe-fe:9030' (111)
[Fri Mar 7 17:22:49 CST 2025] Add myself (strsce-yioztk-be-0.strsce-yioztk-be-headless.default.svc.cluster.local:9050) into FE ...
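In the log above, the BE retries its registration every two seconds and each attempt ends in `ERROR 2003 ... (111)`. On Linux, errno 111 is ECONNREFUSED, so nothing is listening behind `strsce-yioztk-fe-fe:9030`, which is consistent with the FE pod not having started. To condense a long log like this into the endpoints that were refused, something along these lines may help (a sketch; `summarize_fe_errors` is a hypothetical name):

```shell
# Hypothetical helper: read BE log text on stdin and print "<count> <host:port>"
# for each endpoint that appears in an "ERROR 2003 ... Can't connect" line.
summarize_fe_errors() {
  sed -n "s/^ERROR 2003 (HY000): Can't connect to MySQL server on '\([^']*\)'.*/\1/p" \
    | sort | uniq -c | awk '{ print $1, $2 }'
}
```

For instance, `kubectl logs strsce-yioztk-be-0 -n default | summarize_fe_errors` would report a single endpoint with its failure count for the excerpt shown here.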
This issue has been marked as stale because it has been open for 30 days with no activity
The test run passed on VKE:
--------------------------------------Starrocks CE (Topology = shared-nothing Replicas 2) Test Result--------------------------------------
[PASSED]|[Create]|[ComponentDefinition=starrocks-ce-be-1.0.0-alpha.0;ComponentVersion=starrocks-ce-be-1.0.0-alpha.0;ServiceVersion=3.2.2;]|[Description=Create a cluster with the specified component definition starrocks-ce-be-1.0.0-alpha.0 and component version starrocks-ce-be-1.0.0-alpha.0 and service version 3.2.2]
[PASSED]|[Connect]|[ComponentName=fe]|[Description=Connect to the cluster]
[PASSED]|[Stop]|[-]|[Description=Stop the cluster]
[PASSED]|[Start]|[-]|[Description=Start the cluster]
[PASSED]|[HorizontalScaling Out]|[ComponentName=be]|[Description=HorizontalScaling Out the cluster specify component be]
[PASSED]|[HorizontalScaling In]|[ComponentName=be]|[Description=HorizontalScaling In the cluster specify component be]
[PASSED]|[Restart]|[-]|[Description=Restart the cluster]
[PASSED]|[VerticalScaling]|[ComponentName=fe]|[Description=VerticalScaling the cluster specify component fe]
[PASSED]|[Restart]|[ComponentName=fe]|[Description=Restart the cluster specify component fe]
[PASSED]|[RebuildInstance]|[ComponentName=fe]|[Description=Rebuild the cluster instance specify component fe]
[PASSED]|[Restart]|[ComponentName=be]|[Description=Restart the cluster specify component be]
[PASSED]|[VolumeExpansion]|[ComponentName=be]|[Description=VolumeExpansion the cluster specify component be]
[PASSED]|[VerticalScaling]|[ComponentName=be]|[Description=VerticalScaling the cluster specify component be]
[PASSED]|[Failover]|[HA=Connection Stress;ComponentName=fe]|[Description=Simulates conditions where pods experience connection stress either due to expected/undesired processes thereby testing the application's resilience to potential slowness/unavailability of some replicas due to high Connection load.]
[PASSED]|[Update]|[TerminationPolicy=WipeOut]|[Description=Update the cluster TerminationPolicy WipeOut]
[PASSED]|[Delete]|[-]|[Description=Delete the cluster]
[END]