
[BUG] starrocks cluster restart fe pod status always ContainerCreating

Open JashBook opened this issue 10 months ago • 2 comments

Describe the bug After restarting the fe component of a StarRocks cluster, the fe-0 pod stays in ContainerCreating and the cluster remains in Updating.

To Reproduce Steps to reproduce the behavior:

  1. create cluster
apiVersion: apps.kubeblocks.io/v1alpha1
kind: Cluster
metadata:
  name: strsent-blmdvx
  namespace: default
spec:
  terminationPolicy: WipeOut
  componentSpecs:
    - name: cn
      componentDef: starrocks-cn
      replicas: 2
      resources:
        requests:
          cpu: 200m
          memory: 1Gi
        limits:
          cpu: 200m
          memory: 1Gi
    - name: fe
      componentDef: starrocks-fe-sd
      replicas: 2
      resources:
        requests:
          cpu: 200m
          memory: 1Gi
        limits:
          cpu: 200m
          memory: 1Gi
      volumeClaimTemplates:
        - name: data
          spec:
            storageClassName:
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 20Gi
  2. restart fe
kbcli cluster restart strsent-blmdvx --auto-approve --components fe
  3. See error
kubectl get cluster 
NAME             CLUSTER-DEFINITION   VERSION   TERMINATION-POLICY   STATUS     AGE
strsent-blmdvx                                  WipeOut              Updating   14m

kubectl get pod 
NAME                                              READY   STATUS              RESTARTS      AGE
strsent-blmdvx-cn-0                               1/1     Running             2 (12m ago)   14m
strsent-blmdvx-cn-1                               1/1     Running             2 (12m ago)   14m
strsent-blmdvx-fe-0                               0/1     ContainerCreating   0             5m37s
strsent-blmdvx-fe-1                               1/1     Running             1 (12m ago)   14m

➜  ~ kubectl get ops 
NAME                                     TYPE                CLUSTER          STATUS    PROGRESS   AGE
strsent-blmdvx-restart-4vjjv             Restart             strsent-blmdvx   Running   0/2        5m45s

describe cluster

kubectl describe cluster strsent-blmdvx
Name:         strsent-blmdvx
Namespace:    default
Labels:       app.kubernetes.io/instance=strsent-blmdvx
Annotations:  kubeblocks.io/ops-request: [{"name":"strsent-blmdvx-restart-4vjjv","type":"Restart"}]
              kubeblocks.io/reconcile: 2024-04-08T11:32:33.513823465Z
API Version:  apps.kubeblocks.io/v1alpha1
Kind:         Cluster
Metadata:
  Creation Timestamp:  2024-04-08T11:30:29Z
  Finalizers:
    cluster.kubeblocks.io/finalizer
  Generation:  6
  Managed Fields:
    API Version:  apps.kubeblocks.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:kubectl.kubernetes.io/last-applied-configuration:
      f:spec:
    Manager:      kubectl-client-side-apply
    Operation:    Update
    Time:         2024-04-08T11:30:29Z
    API Version:  apps.kubeblocks.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          f:kubeblocks.io/ops-request:
          f:kubeblocks.io/reconcile:
        f:finalizers:
          .:
          v:"cluster.kubeblocks.io/finalizer":
      f:spec:
        f:componentSpecs:
        f:monitor:
        f:resources:
          .:
          f:cpu:
          f:memory:
        f:services:
        f:storage:
          .:
          f:size:
    Manager:      manager
    Operation:    Update
    Time:         2024-04-08T11:39:40Z
    API Version:  apps.kubeblocks.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:labels:
          .:
          f:app.kubernetes.io/instance:
      f:spec:
        f:terminationPolicy:
    Manager:      kbcli
    Operation:    Update
    Time:         2024-04-08T11:44:27Z
    API Version:  apps.kubeblocks.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        .:
        f:components:
          .:
          f:cn:
            .:
            f:message:
              .:
              f:Pod/strsent-blmdvx-cn-0:
              f:Pod/strsent-blmdvx-cn-1:
            f:phase:
            f:podsReady:
            f:podsReadyTime:
          f:fe:
            .:
            f:message:
              .:
              f:Pod/strsent-blmdvx-fe-1:
            f:phase:
            f:podsReady:
            f:podsReadyTime:
        f:conditions:
        f:observedGeneration:
        f:phase:
    Manager:         manager
    Operation:       Update
    Subresource:     status
    Time:            2024-04-08T11:44:27Z
  Resource Version:  316951856
  UID:               a859091c-ef2b-4cfd-9fd7-7d44d6b222a2
Spec:
  Component Specs:
    Component Def:  starrocks-cn
    Monitor:        false
    Name:           cn
    Replicas:       2
    Resources:
      Limits:
        Cpu:     200m
        Memory:  1Gi
      Requests:
        Cpu:          200m
        Memory:       1Gi
    Service Version:  3.2.2
    Component Def:    starrocks-fe-sd
    Monitor:          false
    Name:             fe
    Replicas:         2
    Resources:
      Limits:
        Cpu:     200m
        Memory:  1Gi
      Requests:
        Cpu:          200m
        Memory:       1Gi
    Service Version:  3.2.2
    Volume Claim Templates:
      Name:  data
      Spec:
        Access Modes:
          ReadWriteOnce
        Resources:
          Requests:
            Storage:  20Gi
  Monitor:
  Resources:
    Cpu:     0
    Memory:  0
  Services:
    Annotations:
      networking.gke.io/load-balancer-type:  Internal
    Component Selector:                      fe
    Name:                                    fe-vpc
    Service Name:                            fe-vpc
    Spec:
      Ports:
        Name:         fe-http
        Node Port:    30217
        Port:         8030
        Protocol:     TCP
        Target Port:  http-port
        Name:         fe-mysql
        Node Port:    32740
        Port:         9030
        Protocol:     TCP
        Target Port:  query-port
      Type:           LoadBalancer
  Storage:
    Size:              0
  Termination Policy:  WipeOut
Status:
  Components:
    Cn:
      Message:
        Pod/strsent-blmdvx-cn-0:  
        Pod/strsent-blmdvx-cn-1:  
      Phase:                      Running
      Pods Ready:                 true
      Pods Ready Time:            2024-04-08T11:44:27Z
    Fe:
      Message:
        Pod/strsent-blmdvx-fe-1:  
      Phase:                      Updating
      Pods Ready:                 false
      Pods Ready Time:            2024-04-08T11:38:10Z
  Conditions:
    Last Transition Time:  2024-04-08T11:30:29Z
    Message:               The operator has started the provisioning of Cluster: strsent-blmdvx
    Observed Generation:   6
    Reason:                PreCheckSucceed
    Status:                True
    Type:                  ProvisioningStarted
    Last Transition Time:  2024-04-08T11:30:29Z
    Message:               Successfully applied for resources
    Observed Generation:   6
    Reason:                ApplyResourcesSucceed
    Status:                True
    Type:                  ApplyResources
    Last Transition Time:  2024-04-08T11:39:40Z
    Message:               pods are not ready in Components: [fe], refer to related component message in Cluster.status.components
    Reason:                ReplicasNotReady
    Status:                False
    Type:                  ReplicasReady
    Last Transition Time:  2024-04-08T11:39:40Z
    Message:               pods are unavailable in Components: [fe], refer to related component message in Cluster.status.components
    Reason:                ComponentsNotReady
    Status:                False
    Type:                  Ready
  Observed Generation:     6
  Phase:                   Updating
Events:
  Type     Reason                    Age                From                Message
  ----     ------                    ----               ----                -------
  Normal   ComponentPhaseTransition  15m (x2 over 15m)  cluster-controller  component is Creating
  Warning  Unhealthy                 14m (x9 over 15m)  event-controller    Pod strsent-blmdvx-cn-0: Startup probe failed:   %!T(MISSING)otal    %!R(MISSING)eceived %!X(MISSING)ferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
curl: (7) Failed to connect to 10.128.1.105 port 8040 after 0 ms: Connection refused
  Warning  Unhealthy  14m (x6 over 14m)  event-controller  Pod strsent-blmdvx-fe-1: Startup probe failed:   %!T(MISSING)otal    %!R(MISSING)eceived %!X(MISSING)ferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
curl: (7) Failed to connect to 10.128.1.106 port 8030 after 0 ms: Connection refused
  Warning  Unhealthy  14m (x10 over 15m)  event-controller  Pod strsent-blmdvx-cn-1: Startup probe failed:   %!T(MISSING)otal    %!R(MISSING)eceived %!X(MISSING)ferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
curl: (7) Failed to connect to 10.128.1.104 port 8040 after 0 ms: Connection refused
  Normal   ComponentPhaseTransition  13m (x3 over 14m)    cluster-controller         component is Failed
  Warning  Failed                    13m                  cluster-controller         Cluster: strsent-blmdvx is Failed, check according to the components message
  Warning  Abnormal                  13m (x3 over 14m)    cluster-controller         Cluster: strsent-blmdvx is Abnormal, check according to the components message
  Normal   ComponentPhaseTransition  13m (x3 over 14m)    cluster-controller         component is Updating
  Warning  ReplicasNotReady          12m                  cluster-controller         pods are not ready in Components: [fe], refer to related component message in Cluster.status.components
  Warning  ComponentsNotReady        12m                  cluster-controller         pods are unavailable in Components: [fe], refer to related component message in Cluster.status.components
  Normal   ClusterReady              11m                  cluster-controller         Cluster: strsent-blmdvx is ready, current phase is Running
  Normal   ComponentPhaseTransition  11m (x2 over 12m)    cluster-controller         component is Running
  Normal   Running                   11m                  cluster-controller         Cluster: strsent-blmdvx is ready, current phase is Running
  Normal   AllReplicasReady          11m                  cluster-controller         all pods of components are ready, waiting for the probe detection successful
  Warning  NotFound                  8m7s (x25 over 11m)  system-account-controller  ClusterDefinition.apps.kubeblocks.io "" not found
  Normal   ApplyResourcesSucceed     8m7s (x3 over 15m)   cluster-controller         Successfully applied for resources
  Normal   PreCheckSucceed           109s (x6 over 15m)   cluster-controller         The operator has started the provisioning of Cluster: strsent-blmdvx

describe pod

kubectl describe pod strsent-blmdvx-fe-0
Name:           strsent-blmdvx-fe-0
Namespace:      default
Priority:       0
Node:           gke-infracreate-gke-kbdata-e2-standar-25c8fd47-ovfq/10.10.0.36
Start Time:     Mon, 08 Apr 2024 19:39:43 +0800
Labels:         app.kubernetes.io/component=starrocks-fe-sd
                app.kubernetes.io/instance=strsent-blmdvx
                app.kubernetes.io/managed-by=kubeblocks
                app.kubernetes.io/name=starrocks-fe-sd
                app.kubernetes.io/version=starrocks-fe-sd
                apps.kubeblocks.io/cluster-uid=a859091c-ef2b-4cfd-9fd7-7d44d6b222a2
                apps.kubeblocks.io/component-name=fe
                apps.kubeblocks.io/service-version=3.2.2
                componentdefinition.kubeblocks.io/name=starrocks-fe-sd
                controller-revision-hash=8d4dfcf7b
Annotations:    apps.kubeblocks.io/component-replicas: 2
                kubeblocks.io/restart: 2024-04-08T11:39:40Z
Status:         Pending
IP:             
IPs:            <none>
Controlled By:  ReplicatedStateMachine/strsent-blmdvx-fe
Containers:
  fe:
    Container ID:  
    Image:         docker.io/starrocks/fe-ubuntu:3.2.2
    Image ID:      
    Ports:         8030/TCP, 9020/TCP, 9030/TCP, 9010/TCP
    Host Ports:    0/TCP, 0/TCP, 0/TCP, 0/TCP
    Command:
      bash
      -c
      # FIXME temporary workaround that will be removed in the future when the FE supports the IPv6
              POD_IP_V4=
      ips=$(echo $KB_POD_IPS | tr "," "\n")
      for ip in $ips; do
          if [[ $ip == *":"* ]]; then
              continue
          fi
          POD_IP_V4=$ip
          break
      done
      if [[ -z $POD_IP_V4 ]]; then
          echo "Failed to get IPv4 POD_IP from KB_POD_IPS"
          exit 1
      fi
      HOST_TYPE=IP POD_IP=${POD_IP_V4} /opt/starrocks/fe_entrypoint.sh ${FE_DISCOVERY_SERVICE_NAME}
      
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:     200m
      memory:  1Gi
    Requests:
      cpu:     200m
      memory:  1Gi
    Liveness:  exec [/bin/bash -c POD_IP_V4=
ips=$(echo $KB_POD_IPS | tr "," "\n")
for ip in $ips; do
    if [[ $ip == *":"* ]]; then
        continue
    fi
    POD_IP_V4=$ip
    break
done
if [[ -z $POD_IP_V4 ]]; then
    echo "Failed to get IPv4 POD_IP from KB_POD_IPS"
    exit 1
fi
curl --fail http://$POD_IP_V4:8030/api/health
] delay=0s timeout=1s period=5s #success=1 #failure=3
    Readiness:  exec [/bin/bash -c POD_IP_V4=
ips=$(echo $KB_POD_IPS | tr "," "\n")
for ip in $ips; do
    if [[ $ip == *":"* ]]; then
        continue
    fi
    POD_IP_V4=$ip
    break
done
if [[ -z $POD_IP_V4 ]]; then
    echo "Failed to get IPv4 POD_IP from KB_POD_IPS"
    exit 1
fi
curl --fail http://$POD_IP_V4:8030/api/health
] delay=0s timeout=1s period=5s #success=1 #failure=3
    Startup:  exec [/bin/bash -c POD_IP_V4=
ips=$(echo $KB_POD_IPS | tr "," "\n")
for ip in $ips; do
    if [[ $ip == *":"* ]]; then
        continue
    fi
    POD_IP_V4=$ip
    break
done
if [[ -z $POD_IP_V4 ]]; then
    echo "Failed to get IPv4 POD_IP from KB_POD_IPS"
    exit 1
fi
curl --fail http://$POD_IP_V4:8030/api/health
] delay=0s timeout=1s period=5s #success=1 #failure=60
    Environment Variables from:
      strsent-blmdvx-fe-env      ConfigMap  Optional: false
      strsent-blmdvx-fe-rsm-env  ConfigMap  Optional: false
    Environment:
      STARROCKS_USER:        <set to the key 'username' in secret 'strsent-blmdvx-fe-account-root'>  Optional: false
      STARROCKS_PASSWORD:    <set to the key 'password' in secret 'strsent-blmdvx-fe-account-root'>  Optional: false
      MYSQL_PWD:             <set to the key 'password' in secret 'strsent-blmdvx-fe-account-root'>  Optional: false
      KB_POD_NAME:           strsent-blmdvx-fe-0 (v1:metadata.name)
      KB_POD_UID:             (v1:metadata.uid)
      KB_NAMESPACE:          default (v1:metadata.namespace)
      KB_SA_NAME:             (v1:spec.serviceAccountName)
      KB_NODENAME:            (v1:spec.nodeName)
      KB_HOST_IP:             (v1:status.hostIP)
      KB_POD_IP:              (v1:status.podIP)
      KB_POD_IPS:             (v1:status.podIPs)
      KB_HOSTIP:              (v1:status.hostIP)
      KB_PODIP:               (v1:status.podIP)
      KB_PODIPS:              (v1:status.podIPs)
      KB_POD_FQDN:           $(KB_POD_NAME).strsent-blmdvx-fe-headless.$(KB_NAMESPACE).svc
      TZ:                    Asia/Shanghai
      POD_NAME:              strsent-blmdvx-fe-0 (v1:metadata.name)
      POD_IP:                 (v1:status.podIP)
      HOST_IP:                (v1:status.hostIP)
      POD_NAMESPACE:         default (v1:metadata.namespace)
      HOST_TYPE:             FQDN
      COMPONENT_NAME:        fe
      CONFIGMAP_MOUNT_PATH:  /etc/starrocks/fe/conf
      SERVICE_PORT:          8030
    Mounts:
      /opt/starrocks/fe/conf from fe-cm (rw)
      /opt/starrocks/fe/log from log (rw)
      /opt/starrocks/fe/meta from data (rw)
      /scripts from scripts (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-9r8gj (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  log:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  fe-cm:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      strsent-blmdvx-fe-fe-cm
    Optional:  false
  scripts:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      strsent-blmdvx-fe-scripts
    Optional:  false
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-strsent-blmdvx-fe-0
    ReadOnly:   false
  kube-api-access-9r8gj:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Guaranteed
Node-Selectors:              <none>
Tolerations:                 kb-data=true:NoSchedule
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason     Age    From               Message
  ----    ------     ----   ----               -------
  Normal  Scheduled  7m42s  default-scheduler  Successfully assigned default/strsent-blmdvx-fe-0 to gke-infracreate-gke-kbdata-e2-standar-25c8fd47-ovfq
  Normal  Pulled     7m38s  kubelet            Container image "docker.io/starrocks/fe-ubuntu:3.2.2" already present on machine
  Normal  Created    7m38s  kubelet            Created container fe
  Normal  Started    7m38s  kubelet            Started container fe
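
The container command and all three probes in the output above share the same loop that picks the first IPv4 address from KB_POD_IPS. For debugging outside the pod, that loop can be reproduced as a small standalone function. This is a minimal sketch, not the chart's actual script: the function name first_ipv4 is made up for illustration, and it uses POSIX sh constructs rather than the bash [[ ]] tests shown above.

```shell
# Hypothetical helper: print the first IPv4 address from a comma-separated
# IP list (the format of KB_POD_IPS). Entries containing ":" are treated as
# IPv6 and skipped, as in the probe script. The body runs in a subshell, so
# the IFS and globbing changes do not leak into the caller.
first_ipv4() (
  set -f          # do not glob-expand list entries
  IFS=','         # split the argument on commas
  for ip in ${1-}; do
    case "$ip" in
      ''|*:*) continue ;;   # skip empty entries and IPv6 addresses
    esac
    printf '%s\n' "$ip"
    return 0
  done
  echo "Failed to get IPv4 POD_IP from KB_POD_IPS" >&2
  return 1
)

# Example: a dual-stack list with the IPv6 address first.
first_ipv4 "fd00::1,10.128.1.106"
```

With an empty list, as on a pod that has no IP assigned yet, the function prints the same error message as the probe and returns 1.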

JashBook, Apr 08 '24 11:04

The CPU resource is insufficient, please increase it to at least 1 core and retry.
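
One way to apply this suggestion is a vertical scaling operation on the fe component. The sketch below only assembles and prints the command. It assumes that the installed kbcli version provides `kbcli cluster vscale` with the `--components`, `--cpu`, and `--memory` flags (check `kbcli cluster vscale --help`). The helper name vscale_cmd is hypothetical.

```shell
# Hypothetical helper: print (not run) a kbcli vertical-scaling command.
# Arguments: cluster name, component name, CPU, memory.
vscale_cmd() {
  printf 'kbcli cluster vscale %s --components %s --cpu %s --memory %s\n' \
    "$1" "$2" "$3" "$4"
}

# Values from this issue: cluster strsent-blmdvx, component fe, memory 1Gi.
# CPU is raised from 200m to 1 core, as suggested in the comment.
vscale_cmd strsent-blmdvx fe 1 1Gi
```

Review the printed command before running it, then repeat the restart from the reproduction steps.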

iziang, Apr 27 '24 15:04

kubectl get pod

After increasing the CPU to 1 core and restarting the fe component, the fe-0 pod goes into Error status.

➜  ~ kubectl get pod
NAME                  READY   STATUS    RESTARTS   AGE
strsent-blmdvx-cn-0   2/2     Running   0          14m
strsent-blmdvx-cn-1   2/2     Running   0          14m
strsent-blmdvx-fe-0   1/2     Error     0          14m
strsent-blmdvx-fe-1   1/2     Running   0          4m11s
➜  ~ kbcli cluster list-instances strsent-blmdvx                    
NAME                  NAMESPACE   CLUSTER          COMPONENT   STATUS    ROLE     ACCESSMODE   AZ       CPU(REQUEST/LIMIT)   MEMORY(REQUEST/LIMIT)   STORAGE     NODE                    CREATED-TIME                 
strsent-blmdvx-cn-0   default     strsent-blmdvx   cn          Running   <none>   <none>       <none>   1 / 1                1Gi / 1Gi               <none>      minikube/192.168.49.2   May 07,2024 14:41 UTC+0800   
strsent-blmdvx-cn-1   default     strsent-blmdvx   cn          Running   <none>   <none>       <none>   1 / 1                1Gi / 1Gi               <none>      minikube/192.168.49.2   May 07,2024 14:41 UTC+0800   
strsent-blmdvx-fe-0   default     strsent-blmdvx   fe          Running   <none>   <none>       <none>   1 / 1                1Gi / 1Gi               data:20Gi   minikube/192.168.49.2   May 07,2024 14:41 UTC+0800   
strsent-blmdvx-fe-1   default     strsent-blmdvx   fe          Running   <none>   <none>       <none>   1 / 1                1Gi / 1Gi               data:20Gi   minikube/192.168.49.2   May 07,2024 14:52 UTC+0800 

logs pod

kubectl logs strsent-blmdvx-fe-0 fe
[Tue May  7 14:41:59 CST 2024] /etc/starrocks/fe/conf not exist or not a directory, ignore ...
[Tue May  7 14:41:59 CST 2024] first start fe with meta not exist.
ERROR 2003 (HY000): Can't connect to MySQL server on 'strsent-blmdvx-fe-fe:9030' (111)
[Tue May  7 14:42:00 CST 2024] No leader yet, has_member: false ...
ERROR 2003 (HY000): Can't connect to MySQL server on 'strsent-blmdvx-fe-fe:9030' (111)
[Tue May  7 14:42:02 CST 2024] No leader yet, has_member: false ...
ERROR 2003 (HY000): Can't connect to MySQL server on 'strsent-blmdvx-fe-fe:9030' (111)
[Tue May  7 14:42:04 CST 2024] No leader yet, has_member: false ...
ERROR 2003 (HY000): Can't connect to MySQL server on 'strsent-blmdvx-fe-fe:9030' (111)
[Tue May  7 14:42:06 CST 2024] No leader yet, has_member: false ...
ERROR 2003 (HY000): Can't connect to MySQL server on 'strsent-blmdvx-fe-fe:9030' (111)
[Tue May  7 14:42:08 CST 2024] No leader yet, has_member: false ...
ERROR 2003 (HY000): Can't connect to MySQL server on 'strsent-blmdvx-fe-fe:9030' (111)
[Tue May  7 14:42:10 CST 2024] No leader yet, has_member: false ...
ERROR 2003 (HY000): Can't connect to MySQL server on 'strsent-blmdvx-fe-fe:9030' (111)
[Tue May  7 14:42:12 CST 2024] No leader yet, has_member: false ...
ERROR 2003 (HY000): Can't connect to MySQL server on 'strsent-blmdvx-fe-fe:9030' (111)
[Tue May  7 14:42:14 CST 2024] No leader yet, has_member: false ...
ERROR 2003 (HY000): Can't connect to MySQL server on 'strsent-blmdvx-fe-fe:9030' (111)
[Tue May  7 14:42:16 CST 2024] No leader yet, has_member: false ...
ERROR 2003 (HY000): Can't connect to MySQL server on 'strsent-blmdvx-fe-fe:9030' (111)
[Tue May  7 14:42:18 CST 2024] No leader yet, has_member: false ...
ERROR 2003 (HY000): Can't connect to MySQL server on 'strsent-blmdvx-fe-fe:9030' (111)
[Tue May  7 14:42:20 CST 2024] No leader yet, has_member: false ...
ERROR 2003 (HY000): Can't connect to MySQL server on 'strsent-blmdvx-fe-fe:9030' (111)
[Tue May  7 14:42:22 CST 2024] No leader yet, has_member: false ...
ERROR 2003 (HY000): Can't connect to MySQL server on 'strsent-blmdvx-fe-fe:9030' (111)
[Tue May  7 14:42:24 CST 2024] No leader yet, has_member: false ...
ERROR 2003 (HY000): Can't connect to MySQL server on 'strsent-blmdvx-fe-fe:9030' (111)
[Tue May  7 14:42:26 CST 2024] No leader yet, has_member: false ...
ERROR 2003 (HY000): Can't connect to MySQL server on 'strsent-blmdvx-fe-fe:9030' (111)
[Tue May  7 14:42:28 CST 2024] No leader yet, has_member: false ...
ERROR 2003 (HY000): Can't connect to MySQL server on 'strsent-blmdvx-fe-fe:9030' (111)
[Tue May  7 14:42:30 CST 2024] No leader yet, has_member: false ...
[Tue May  7 14:42:30 CST 2024] Timed out, no members detected ever, assume myself is the first node ..
[Tue May  7 14:42:30 CST 2024] first start with no meta run start_fe.sh with additional options: ' --host_type IP'
kubectl logs strsent-blmdvx-fe-1 fe
[Tue May  7 14:57:44 CST 2024] /etc/starrocks/fe/conf not exist or not a directory, ignore ...
[Tue May  7 14:57:44 CST 2024] start fe with exist meta.
[Tue May  7 14:57:44 CST 2024] start with meta run start_fe.sh with additional options: ' --host_type IP'
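
When reading these timestamps, it helps to know how long the startup probe tolerates a slow FE. The `describe pod` output earlier in this issue shows `delay=0s period=5s #failure=60` for the startup probe. In Kubernetes the startup window is roughly the initial delay plus the period multiplied by the failure threshold. A minimal sketch of that arithmetic follows; the function name is made up for illustration.

```shell
# Hypothetical helper: approximate startup-probe window in seconds.
# Arguments: initial delay (s), period (s), failure threshold.
probe_budget_seconds() {
  echo $(( $1 + $2 * $3 ))
}

# Startup probe values from the describe output: delay=0s period=5s #failure=60
probe_budget_seconds 0 5 60   # prints 300
```

A FE container that has not answered /api/health within about five minutes is therefore restarted by the kubelet.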

describe pod

kubectl describe pod strsent-blmdvx-fe-1
Name:         strsent-blmdvx-fe-1
Namespace:    default
Priority:     0
Node:         minikube/192.168.49.2
Start Time:   Tue, 07 May 2024 14:52:36 +0800
Labels:       app.kubernetes.io/component=starrocks-fe-sd
              app.kubernetes.io/instance=strsent-blmdvx
              app.kubernetes.io/managed-by=kubeblocks
              app.kubernetes.io/name=starrocks-fe-sd
              app.kubernetes.io/version=starrocks-fe-sd
              apps.kubeblocks.io/cluster-uid=231d9573-3125-49a0-82eb-21b031492685
              apps.kubeblocks.io/component-name=fe
              apps.kubeblocks.io/pod-name=strsent-blmdvx-fe-1
              componentdefinition.kubeblocks.io/name=starrocks-fe-sd
              controller-revision-hash=7987578947
              workloads.kubeblocks.io/instance=strsent-blmdvx-fe
              workloads.kubeblocks.io/managed-by=InstanceSet
Annotations:  apps.kubeblocks.io/component-replicas: 2
              kubeblocks.io/restart: 2024-05-07T06:52:34Z
Status:       Running
IP:           10.244.4.152
IPs:
  IP:           10.244.4.152
Controlled By:  InstanceSet/strsent-blmdvx-fe
Init Containers:
  starrocks-tools:
    Container ID:  docker://f25ee7cf366fe04e37571b3de22d7a06e526edc8f5825e858c6f660982217873
    Image:         docker.io/apecloud/starrocks-tools:3.2.2
    Image ID:      docker-pullable://apecloud/starrocks-tools@sha256:fd9b4e989932b172368cdd1de986845ea96c0d5c19efd4c7fe3bea11bd7aa0f5
    Port:          <none>
    Host Port:     <none>
    Command:
      cp
      /bin/mysql
      /kb_tools/mysql
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Tue, 07 May 2024 14:52:44 +0800
      Finished:     Tue, 07 May 2024 14:52:44 +0800
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     0
      memory:  0
    Requests:
      cpu:     0
      memory:  0
    Environment Variables from:
      strsent-blmdvx-fe-env  ConfigMap  Optional: false
    Environment:
      STARROCKS_USER:      <set to the key 'username' in secret 'strsent-blmdvx-fe-account-root'>  Optional: false
      STARROCKS_PASSWORD:  <set to the key 'password' in secret 'strsent-blmdvx-fe-account-root'>  Optional: false
      MYSQL_PWD:           <set to the key 'password' in secret 'strsent-blmdvx-fe-account-root'>  Optional: false
      KB_POD_NAME:         strsent-blmdvx-fe-1 (v1:metadata.name)
      KB_POD_UID:           (v1:metadata.uid)
      KB_NAMESPACE:        default (v1:metadata.namespace)
      KB_SA_NAME:           (v1:spec.serviceAccountName)
      KB_NODENAME:          (v1:spec.nodeName)
      KB_HOST_IP:           (v1:status.hostIP)
      KB_POD_IP:            (v1:status.podIP)
      KB_POD_IPS:           (v1:status.podIPs)
      KB_HOSTIP:            (v1:status.hostIP)
      KB_PODIP:             (v1:status.podIP)
      KB_PODIPS:            (v1:status.podIPs)
      KB_POD_FQDN:         $(KB_POD_NAME).strsent-blmdvx-fe-headless.$(KB_NAMESPACE).svc
      TOOLS_SCRIPTS_PATH:  /opt/kb-tools/reload/fe-cm
    Mounts:
      /kb_tools from kb-tools (rw)
      /opt/config-manager from config-manager-config (rw)
      /opt/kb-tools/reload/fe-cm from cm-script-fe-cm (rw)
      /opt/starrocks/fe/conf from fe-cm (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-v5qp4 (ro)
Containers:
  fe:
    Container ID:  docker://311820a1d144f843bfd7f0579b3919c2c079e50efe1c9f990f32702a9d1ee5dc
    Image:         docker.io/starrocks/fe-ubuntu:3.2.2
    Image ID:      docker-pullable://starrocks/fe-ubuntu@sha256:6446acb1a16ce103476b17c3844e9f7e12cd09ac188cfe8ff01aad56ca87e612
    Ports:         8030/TCP, 9020/TCP, 9030/TCP, 9010/TCP
    Host Ports:    0/TCP, 0/TCP, 0/TCP, 0/TCP
    Command:
      bash
      -c
      # FIXME temporary workaround that will be removed in the future when the FE supports the IPv6
              POD_IP_V4=
      ips=$(echo $KB_POD_IPS | tr "," "\n")
      for ip in $ips; do
          if [[ $ip == *":"* ]]; then
              continue
          fi
          POD_IP_V4=$ip
          break
      done
      if [[ -z $POD_IP_V4 ]]; then
          echo "Failed to get IPv4 POD_IP from KB_POD_IPS"
          exit 1
      fi
      HOST_TYPE=IP POD_IP=${POD_IP_V4} /opt/starrocks/fe_entrypoint.sh ${FE_DISCOVERY_SERVICE_NAME}
      
    State:          Running
      Started:      Tue, 07 May 2024 14:57:44 +0800
    Last State:     Terminated
      Reason:       Error
      Exit Code:    143
      Started:      Tue, 07 May 2024 14:52:44 +0800
      Finished:     Tue, 07 May 2024 14:57:44 +0800
    Ready:          False
    Restart Count:  1
    Limits:
      cpu:     1
      memory:  1Gi
    Requests:
      cpu:     1
      memory:  1Gi
    Liveness:  exec [/bin/bash -c POD_IP_V4=
ips=$(echo $KB_POD_IPS | tr "," "\n")
for ip in $ips; do
    if [[ $ip == *":"* ]]; then
        continue
    fi
    POD_IP_V4=$ip
    break
done
if [[ -z $POD_IP_V4 ]]; then
    echo "Failed to get IPv4 POD_IP from KB_POD_IPS"
    exit 1
fi
curl --fail http://$POD_IP_V4:8030/api/health
] delay=0s timeout=1s period=5s #success=1 #failure=3
    Readiness:  exec [/bin/bash -c POD_IP_V4=
ips=$(echo $KB_POD_IPS | tr "," "\n")
for ip in $ips; do
    if [[ $ip == *":"* ]]; then
        continue
    fi
    POD_IP_V4=$ip
    break
done
if [[ -z $POD_IP_V4 ]]; then
    echo "Failed to get IPv4 POD_IP from KB_POD_IPS"
    exit 1
fi
curl --fail http://$POD_IP_V4:8030/api/health
] delay=0s timeout=1s period=5s #success=1 #failure=3
    Startup:  exec [/bin/bash -c POD_IP_V4=
ips=$(echo $KB_POD_IPS | tr "," "\n")
for ip in $ips; do
    if [[ $ip == *":"* ]]; then
        continue
    fi
    POD_IP_V4=$ip
    break
done
if [[ -z $POD_IP_V4 ]]; then
    echo "Failed to get IPv4 POD_IP from KB_POD_IPS"
    exit 1
fi
curl --fail http://$POD_IP_V4:8030/api/health
] delay=0s timeout=1s period=5s #success=1 #failure=60
    Environment Variables from:
      strsent-blmdvx-fe-env      ConfigMap  Optional: false
      strsent-blmdvx-fe-its-env  ConfigMap  Optional: false
    Environment:
      STARROCKS_USER:        <set to the key 'username' in secret 'strsent-blmdvx-fe-account-root'>  Optional: false
      STARROCKS_PASSWORD:    <set to the key 'password' in secret 'strsent-blmdvx-fe-account-root'>  Optional: false
      MYSQL_PWD:             <set to the key 'password' in secret 'strsent-blmdvx-fe-account-root'>  Optional: false
      KB_POD_NAME:           strsent-blmdvx-fe-1 (v1:metadata.name)
      KB_POD_UID:             (v1:metadata.uid)
      KB_NAMESPACE:          default (v1:metadata.namespace)
      KB_SA_NAME:             (v1:spec.serviceAccountName)
      KB_NODENAME:            (v1:spec.nodeName)
      KB_HOST_IP:             (v1:status.hostIP)
      KB_POD_IP:              (v1:status.podIP)
      KB_POD_IPS:             (v1:status.podIPs)
      KB_HOSTIP:              (v1:status.hostIP)
      KB_PODIP:               (v1:status.podIP)
      KB_PODIPS:              (v1:status.podIPs)
      KB_POD_FQDN:           $(KB_POD_NAME).strsent-blmdvx-fe-headless.$(KB_NAMESPACE).svc
      TZ:                    Asia/Shanghai
      POD_NAME:              strsent-blmdvx-fe-1 (v1:metadata.name)
      POD_IP:                 (v1:status.podIP)
      HOST_IP:                (v1:status.hostIP)
      POD_NAMESPACE:         default (v1:metadata.namespace)
      HOST_TYPE:             FQDN
      COMPONENT_NAME:        fe
      CONFIGMAP_MOUNT_PATH:  /etc/starrocks/fe/conf
      SERVICE_PORT:          8030
    Mounts:
      /kb_tools from kb-tools (rw)
      /opt/starrocks/fe/conf from fe-cm (rw)
      /opt/starrocks/fe/log from log (rw)
      /opt/starrocks/fe/meta from data (rw)
      /scripts from scripts (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-v5qp4 (ro)
  config-manager:
    Container ID:  docker://2e9631debd2c56a85d57dda0109c31264178ce90687d6d124f60575e5f19be00
    Image:         docker.io/apecloud/kubeblocks-tools:0.9.0-beta.18
    Image ID:      docker-pullable://apecloud/kubeblocks-tools@sha256:24b7a15e6391c331b506a04c7653cd75ed5c2423e4d7a5dbefc7b52b67210d2a
    Port:          <none>
    Host Port:     <none>
    Command:
      env
    Args:
      PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:$(TOOLS_PATH)
      /bin/reloader
      --log-level
      info
      --operator-update-enable
      --tcp
      9901
      --config
      /opt/config-manager/config-manager.yaml
    State:          Running
      Started:      Tue, 07 May 2024 14:52:44 +0800
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     0
      memory:  0
    Requests:
      cpu:     0
      memory:  0
    Environment Variables from:
      strsent-blmdvx-fe-env      ConfigMap  Optional: false
      strsent-blmdvx-fe-its-env  ConfigMap  Optional: false
    Environment:
      STARROCKS_USER:         <set to the key 'username' in secret 'strsent-blmdvx-fe-account-root'>  Optional: false
      STARROCKS_PASSWORD:     <set to the key 'password' in secret 'strsent-blmdvx-fe-account-root'>  Optional: false
      MYSQL_PWD:              <set to the key 'password' in secret 'strsent-blmdvx-fe-account-root'>  Optional: false
      KB_POD_NAME:            strsent-blmdvx-fe-1 (v1:metadata.name)
      KB_POD_UID:              (v1:metadata.uid)
      KB_NAMESPACE:           default (v1:metadata.namespace)
      KB_SA_NAME:              (v1:spec.serviceAccountName)
      KB_NODENAME:             (v1:spec.nodeName)
      KB_HOST_IP:              (v1:status.hostIP)
      KB_POD_IP:               (v1:status.podIP)
      KB_POD_IPS:              (v1:status.podIPs)
      KB_HOSTIP:               (v1:status.hostIP)
      KB_PODIP:                (v1:status.podIP)
      KB_PODIPS:               (v1:status.podIPs)
      KB_POD_FQDN:            $(KB_POD_NAME).strsent-blmdvx-fe-headless.$(KB_NAMESPACE).svc
      CONFIG_MANAGER_POD_IP:   (v1:status.podIP)
      TOOLS_PATH:             /opt/kb-tools/reload/fe-cm:/opt/config-manager:/kb_tools
    Mounts:
      /kb_tools from kb-tools (rw)
      /opt/config-manager from config-manager-config (rw)
      /opt/kb-tools/reload/fe-cm from cm-script-fe-cm (rw)
      /opt/starrocks/fe/conf from fe-cm (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-v5qp4 (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  log:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  fe-cm:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      strsent-blmdvx-fe-fe-cm
    Optional:  false
  scripts:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      strsent-blmdvx-fe-scripts
    Optional:  false
  cm-script-fe-cm:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      sidecar-starrocks-scripts-strsent-blmdvx
    Optional:  false
  config-manager-config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      sidecar-strsent-blmdvx-fe-config-manager-config
    Optional:  false
  kb-tools:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-strsent-blmdvx-fe-1
    ReadOnly:   false
  kube-api-access-v5qp4:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 kb-data=true:NoSchedule
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Scheduled  7m36s                  default-scheduler  Successfully assigned default/strsent-blmdvx-fe-1 to minikube
  Normal   Pulled     7m29s                  kubelet            Container image "docker.io/apecloud/starrocks-tools:3.2.2" already present on machine
  Normal   Created    7m29s                  kubelet            Created container starrocks-tools
  Normal   Started    7m28s                  kubelet            Started container starrocks-tools
  Normal   Pulled     7m28s                  kubelet            Container image "docker.io/starrocks/fe-ubuntu:3.2.2" already present on machine
  Normal   Created    7m28s                  kubelet            Created container fe
  Normal   Started    7m28s                  kubelet            Started container fe
  Normal   Pulled     7m28s                  kubelet            Container image "docker.io/apecloud/kubeblocks-tools:0.9.0-beta.18" already present on machine
  Normal   Created    7m28s                  kubelet            Created container config-manager
  Normal   Started    7m28s                  kubelet            Started container config-manager
  Warning  Unhealthy  6m9s (x16 over 7m24s)  kubelet            Startup probe failed:   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
curl: (7) Failed to connect to 10.244.4.152 port 8030 after 0 ms: Connection refused
  Warning  FailedPreStopHook  2m28s  kubelet  PreStopHook failed

JashBook avatar May 07 '24 07:05 JashBook

To work in IPv6 environments, StarRocks FE was switched to using IP addresses as unique node identifiers. However, when a Pod is rebuilt it gets a new IP, and the old IP recorded for that FE node is no longer reachable. As a result, the FE nodes cannot reach consensus and the cluster fails to start.

The solution is to add an ipFamily option to the values.yaml file. Its value is either IPv4 or IPv6 and indicates the primary protocol stack of the environment. With IPv4, the Pod's headless service domain name is used as the unique identifier, which stays stable across Pod rebuilds. With IPv6, the StarRocks kernel does not yet support IPv6, so we need to adapt the readinessProbe and livenessProbe to use IPv4 so that the cluster can start properly. A known issue remains in the IPv6 case: the IP still changes after a Pod is rebuilt, so the old nodes must be removed and the new ones added manually.
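
As a rough illustration of the option described above, the values.yaml entry might look like the sketch below. Only the key name `ipFamily` and its two values come from this comment; placing it at the top level of the file and defaulting it to IPv4 are assumptions, so check the addon's actual values.yaml.

```yaml
# values.yaml (illustrative sketch; top-level placement and the default are assumptions)
# Primary protocol stack of the environment: IPv4 or IPv6.
#   IPv4: FE nodes are identified by the Pod's headless service domain name,
#         which stays stable when a Pod is rebuilt.
#   IPv6: FE nodes are identified by IP, and the probes are adapted to use IPv4.
#         After a Pod is rebuilt, old nodes must be removed and new ones added manually.
ipFamily: IPv4
```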

iziang avatar Jun 02 '24 15:06 iziang