
[BUG] Halo cluster creation fails because checkrole fails

Open · ahjing99 opened this issue 1 year ago · 1 comment

➜ ~ kbcli version
Kubernetes: v1.27.8-gke.1067004
KubeBlocks: 0.9.0-beta.4
kbcli: 0.9.0-beta.1

  1. Create halo
      `helm repo add kubeblocks-addons  https://jihulab.com/api/v4/projects/150246/packages/helm/stable`

"kubeblocks-addons" has been added to your repositories

      `helm repo update kubeblocks-addons `

Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "kubeblocks-addons" chart repository
Update Complete. ⎈Happy Helming!⎈

      `helm upgrade --install halo kubeblocks-addons/halo --version 0.2.0 `

Release "halo" does not exist. Installing it now.
W0409 18:33:17.365955   34904 warnings.go:70] The ClusterVersion CRD has been deprecated since 0.9.0
W0409 18:33:21.418646   34904 warnings.go:70] The ClusterVersion CRD has been deprecated since 0.9.0
NAME: halo
LAST DEPLOYED: Tue Apr  9 18:33:12 2024
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Halo with HA on KubeBlocks
How to start Halo:
First of all, git clone kubeblocks-addons and cd into the `addons` directory.
1) helm install etcd using the official etcd addon of KubeBlocks
```helm install etcd ./etcd```
and then helm install an etcd-cluster for the halo cluster
```helm install etcd-cluster ./etcd-cluster```
2) helm install halo from the halo directory
If your etcd service is named as in the first step (i.e. etcd-cluster), you can helm install halo directly. If not, edit clusterdefinition.yaml, which has an env named "PATRONI_ETCD3_HOST", and change it to point to your own etcd service.
If your etcd service also runs in Kubernetes in the default namespace, clusterdefinition.yaml works as shipped. For example, PATRONI_ETCD3_HOST is etcd-cluster-etcd:2379 there, because installing etcd-cluster via helm makes KubeBlocks automatically create the Service etcd-cluster-etcd, and etcd-cluster and halo-cluster are created in the same namespace, default. (A consolidated install sketch follows step 3.)

```helm install halo ./halo```
3) helm install halo-cluster with file halo-cluster
```helm install halo-cluster ./halo-cluster```
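
For reference, a minimal end-to-end sketch of the sequence from the NOTES above, assuming the default chart paths and namespace (the clone URL and the Service check are illustrative additions, not from the NOTES):

```bash
# Sketch of the install order described in the NOTES (default namespace).
git clone https://github.com/apecloud/kubeblocks-addons.git
cd kubeblocks-addons/addons

# 1) etcd addon first, then an etcd cluster to serve as the Patroni DCS.
helm install etcd ./etcd
helm install etcd-cluster ./etcd-cluster

# Patroni expects the DCS at PATRONI_ETCD3_HOST (etcd-cluster-etcd:2379),
# so confirm the Service exists before installing halo.
kubectl get svc etcd-cluster-etcd

# 2) and 3) halo addon, then the halo cluster.
helm install halo ./halo
helm install halo-cluster ./halo-cluster
```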

apiVersion: apps.kubeblocks.io/v1alpha1
kind: Cluster
metadata:
  name: halo-ojtjck
  namespace: default
spec:
  clusterDefinitionRef: halo
  clusterVersionRef: halo-14.10
  terminationPolicy: WipeOut
  componentSpecs:
    - name: halo
      componentDefRef: halo
      replicas: 2
      resources:
        requests:
          cpu: 100m
          memory: 0.5Gi
        limits:
          cpu: 100m
          memory: 0.5Gi
      switchPolicy:
        type: Noop
      volumeClaimTemplates:
        - name: data
          spec:
            storageClassName:
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 1Gi

      `kubectl apply -f test_create_halo-ojtjck.yaml`
  2. Pods keep crashing
➜  ~ k get pod
NAME                            READY   STATUS    RESTARTS      AGE
halo-ojtjck-halo-0              2/2     Running   6 (85s ago)   12m
halo-ojtjck-halo-1              2/2     Running   6 (85s ago)   12m

➜  ~ k logs halo-ojtjck-halo-0
Defaulted container "halo" out of: halo, lorry, init-lorry (init)
➜  ~ k logs halo-ojtjck-halo-0 -c lorry
2024-04-09T10:34:54Z	INFO	Initialize DB manager
2024-04-09T10:34:54Z	INFO	KB_WORKLOAD_TYPE ENV not set
2024-04-09T10:34:54Z	INFO	Volume-Protection	succeed to init volume protection	{"pod": "halo-ojtjck-halo-0", "spec": {"highWatermark":"0","volumes":[]}}
2024-04-09T10:34:54Z	INFO	HTTPServer	Starting HTTP Server
2024-04-09T10:34:54Z	INFO	HTTPServer	API route path	{"method": "GET", "path": ["/v1.0/query", "/v1.0/listusers", "/v1.0/listsystemaccounts", "/v1.0/checkrole", "/v1.0/getrole", "/v1.0/healthycheck", "/v1.0/describeuser"]}
2024-04-09T10:34:54Z	INFO	HTTPServer	API route path	{"method": "POST", "path": ["/v1.0/createuser", "/v1.0/revokeuserrole", "/v1.0/unlockinstance", "/v1.0/postprovision", "/v1.0/checkrunning", "/v1.0/datadump", "/v1.0/leavemember", "/v1.0/switchover", "/v1.0/preterminate", "/v1.0/dataload", "/v1.0/joinmember", "/v1.0/getlag", "/v1.0/rebuild", "/v1.0/deleteuser", "/v1.0/grantuserrole", "/v1.0/volumeprotection", "/v1.0/exec", "/v1.0/lockinstance"]}
2024-04-09T10:34:54Z	INFO	cronjobs	env is not set	{"env": "KB_CRON_JOBS"}
2024-04-09T10:34:56Z	INFO	DCS-K8S	pod selector: app.kubernetes.io/instance=halo-ojtjck,app.kubernetes.io/managed-by=kubeblocks,apps.kubeblocks.io/component-name=halo
2024-04-09T10:34:56Z	INFO	DCS-K8S	podlist: 2
2024-04-09T10:34:56Z	INFO	DCS-K8S	Leader configmap is not found	{"configmap": "halo-ojtjck-halo-leader"}
2024-04-09T10:34:56Z	INFO	checkrole	executing checkRole error	{"error": ""}
2024-04-09T10:34:56Z	INFO	checkrole	role checks failed continuously	{"times": 0}
2024-04-09T10:34:56Z	INFO	event	send event: map[operation:checkRole originalRole:]
2024-04-09T10:34:56Z	INFO	event	send event success	{"message": "{\"operation\":\"checkRole\",\"originalRole\":\"\"}"}
2024-04-09T10:35:06Z	INFO	checkrole	executing checkRole error	{"error": ""}
2024-04-09T10:35:16Z	INFO	checkrole	executing checkRole error	{"error": ""}
2024-04-09T10:35:26Z	INFO	checkrole	executing checkRole error	{"error": ""}
2024-04-09T10:35:36Z	INFO	checkrole	executing checkRole error	{"error": ""}
2024-04-09T10:35:46Z	INFO	checkrole	executing checkRole error	{"error": ""}
2024-04-09T10:35:56Z	INFO	checkrole	executing checkRole error	{"error": ""}
2024-04-09T10:36:06Z	INFO	checkrole	executing checkRole error	{"error": ""}
2024-04-09T10:36:16Z	INFO	checkrole	executing checkRole error	{"error": ""}
2024-04-09T10:36:26Z	INFO	checkrole	executing checkRole error	{"error": ""}
2024-04-09T10:36:36Z	INFO	checkrole	executing checkRole error	{"error": ""}
2024-04-09T10:36:46Z	INFO	checkrole	executing checkRole error	{"error": ""}
2024-04-09T10:36:56Z	INFO	checkrole	executing checkRole error	{"error": ""}
2024-04-09T10:37:06Z	INFO	checkrole	executing checkRole error	{"error": ""}
2024-04-09T10:37:16Z	INFO	checkrole	executing checkRole error	{"error": ""}
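
Since lorry serves the role check over HTTP (note the GET route /v1.0/checkrole above), the failing probe can be reproduced by hand. A hedged sketch, assuming lorry's HTTP port 3501 (visible in the pod spec later in this thread) and that curl is present in the container:

```bash
# Call lorry's checkrole endpoint directly inside the sidecar.
kubectl exec halo-ojtjck-halo-0 -c lorry -- \
  curl -s http://localhost:3501/v1.0/checkrole
```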

➜ ~ kbcli report cluster --with-logs --all-containers halo-ojtjck
reporting cluster information to report-cluster-halo-ojtjck-2024-04-09-18-47-39.zip
processing manifests OK
processing events OK
process pod logs OK

➜ ~ kbcli report kubeblocks --with-logs --all-containers --output yaml
reporting KubeBlocks information to report-kubeblocks-2024-04-09-18-48-06.zip
processing manifests OK
processing events OK
process pod logs OK

report-kubeblocks-2024-04-09-18-48-06.zip
report-cluster-halo-ojtjck-2024-04-09-18-47-39.zip

ahjing99 · Apr 09 '24 10:04

This issue has been marked as stale because it has been open for 30 days with no activity

github-actions[bot] · May 13 '24 00:05

Manual to follow: https://github.com/apecloud/kubeblocks-addons/pull/375

  1. Make sure the etcd addon is enabled
  2. Create an etcd cluster
  3. Create a halo cluster (a minimal command sketch follows this list)
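
A minimal sketch of those three steps, assuming the kbcli addon workflow and the chart paths from the NOTES earlier in this thread:

```bash
# 1. Make sure the etcd addon is enabled.
kbcli addon enable etcd

# 2. Create an etcd cluster to act as the Patroni DCS.
helm install etcd-cluster ./etcd-cluster

# 3. Create the halo cluster only after the etcd Service resolves.
kubectl get svc etcd-cluster-etcd    # expect port 2379
helm install halo-cluster ./halo-cluster
```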

shanshanying · May 20 '24 05:05

kbcli version
Kubernetes: v1.27.13-gke.1000000
KubeBlocks: 0.9.0-beta.30
  1. Create cluster
apiVersion: apps.kubeblocks.io/v1alpha1
kind: Cluster
metadata:
  name: halo-cluster
  namespace: default
  labels:
    helm.sh/chart: halo-cluster-0.2.0
    app.kubernetes.io/version: "0.2.0"
    app.kubernetes.io/instance: halo-cluster
spec:
  clusterVersionRef: halo-14.10
  terminationPolicy: Delete
  affinity:
    podAntiAffinity: Preferred
    topologyKeys:
      - kubernetes.io/hostname
    tenancy: SharedNode
  clusterDefinitionRef: halo
  componentSpecs:
    - name: halo
      componentDefRef: halo
      monitor: false
      replicas: 3
      serviceAccountName:
      switchPolicy:
        type: Noop
      resources:
        limits:
          cpu: "1"
          memory: "2Gi"
        requests:
          cpu: "1"
          memory: "2Gi"
      volumeClaimTemplates:
        - name: data
          spec:
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 50Gi
  2. See error
kubectl get pod
NAME                                              READY   STATUS    RESTARTS      AGE
halo-cluster-halo-0                               2/2     Running   1 (79s ago)   3m28s
halo-cluster-halo-1                               2/2     Running   1 (24s ago)   3m28s
halo-cluster-halo-2                               2/2     Running   1 (78s ago)   3m28s

kubectl get cluster 
NAME           CLUSTER-DEFINITION   VERSION      TERMINATION-POLICY   STATUS     AGE
halo-cluster   halo                 halo-14.10   Delete               Creating   3m44s

Pod logs:

➜  ~ kubectl logs halo-cluster-halo-0 halo
➜  ~ 
➜  ~ kubectl logs halo-cluster-halo-0 lorry
2024-06-06T06:16:53Z	INFO	Initialize DB manager
2024-06-06T06:16:53Z	INFO	KB_WORKLOAD_TYPE ENV not set
2024-06-06T06:16:53Z	INFO	Volume-Protection	succeed to init volume protection	{"pod": "halo-cluster-halo-0", "spec": {"highWatermark":"0","volumes":[]}}
2024-06-06T06:16:53Z	INFO	HTTPServer	Starting HTTP Server
2024-06-06T06:16:53Z	INFO	HTTPServer	API route path	{"method": "GET", "path": ["/v1.0/checkrole", "/v1.0/listusers", "/v1.0/query", "/v1.0/describeuser", "/v1.0/listsystemaccounts", "/v1.0/healthycheck", "/v1.0/getrole"]}
2024-06-06T06:16:53Z	INFO	HTTPServer	API route path	{"method": "POST", "path": ["/v1.0/createuser", "/v1.0/preterminate", "/v1.0/checkrunning", "/v1.0/getlag", "/v1.0/volumeprotection", "/v1.0/exec", "/v1.0/grantuserrole", "/v1.0/revokeuserrole", "/v1.0/unlockinstance", "/v1.0/postprovision", "/v1.0/dataload", "/v1.0/joinmember", "/v1.0/switchover", "/v1.0/deleteuser", "/v1.0/datadump", "/v1.0/leavemember", "/v1.0/rebuild", "/v1.0/lockinstance"]}
2024-06-06T06:16:53Z	INFO	cronjobs	env is not set	{"env": "KB_CRON_JOBS"}
2024-06-06T06:17:03Z	INFO	DCS-K8S	pod selector: app.kubernetes.io/instance=halo-cluster,app.kubernetes.io/managed-by=kubeblocks,apps.kubeblocks.io/component-name=halo
2024-06-06T06:17:03Z	INFO	DCS-K8S	podlist: 3
2024-06-06T06:17:04Z	INFO	DCS-K8S	Leader configmap is not found	{"configmap": "halo-cluster-halo-leader"}
2024-06-06T06:17:04Z	INFO	checkrole	executing checkRole error	{"error": ""}
2024-06-06T06:17:04Z	INFO	checkrole	role checks failed continuously	{"times": 0}
2024-06-06T06:17:04Z	INFO	event	send event: map[operation:checkRole originalRole:waitForStart]
2024-06-06T06:17:04Z	INFO	event	send event success	{"message": "{\"operation\":\"checkRole\",\"originalRole\":\"waitForStart\"}"}
2024-06-06T06:17:13Z	INFO	checkrole	executing checkRole error	{"error": ""}
2024-06-06T06:17:23Z	INFO	checkrole	executing checkRole error	{"error": ""}
2024-06-06T06:17:33Z	INFO	checkrole	executing checkRole error	{"error": ""}
2024-06-06T06:17:43Z	INFO	checkrole	executing checkRole error	{"error": ""}
2024-06-06T06:17:53Z	INFO	checkrole	executing checkRole error	{"error": ""}
2024-06-06T06:18:03Z	INFO	checkrole	executing checkRole error	{"error": ""}
2024-06-06T06:18:13Z	INFO	checkrole	executing checkRole error	{"error": ""}
2024-06-06T06:18:23Z	INFO	checkrole	executing checkRole error	{"error": ""}
2024-06-06T06:18:33Z	INFO	checkrole	executing checkRole error	{"error": ""}
2024-06-06T06:18:43Z	INFO	checkrole	executing checkRole error	{"error": ""}
2024-06-06T06:18:53Z	INFO	checkrole	executing checkRole error	{"error": ""}
2024-06-06T06:19:03Z	INFO	checkrole	executing checkRole error	{"error": ""}
2024-06-06T06:19:13Z	INFO	checkrole	executing checkRole error	{"error": ""}
2024-06-06T06:19:23Z	INFO	checkrole	executing checkRole error	{"error": ""}
2024-06-06T06:19:33Z	INFO	checkrole	executing checkRole error	{"error": ""}
2024-06-06T06:19:43Z	INFO	checkrole	executing checkRole error	{"error": ""}
2024-06-06T06:19:53Z	INFO	checkrole	executing checkRole error	{"error": ""}
2024-06-06T06:20:03Z	INFO	checkrole	executing checkRole error	{"error": ""}
2024-06-06T06:20:13Z	INFO	checkrole	executing checkRole error	{"error": ""}
2024-06-06T06:20:23Z	INFO	checkrole	executing checkRole error	{"error": ""}
2024-06-06T06:20:33Z	INFO	checkrole	executing checkRole error	{"error": ""}

Describe pod:

 kubectl describe pod halo-cluster-halo-0     

Name:         halo-cluster-halo-0
Namespace:    default
Priority:     0
Node:         gke-infracreate-gke-kbdata-e2-standar-25c8fd47-ens6/10.10.0.33
Start Time:   Thu, 06 Jun 2024 14:16:29 +0800
Labels:       app.kubernetes.io/component=halo
              app.kubernetes.io/instance=halo-cluster
              app.kubernetes.io/managed-by=kubeblocks
              app.kubernetes.io/name=halo
              app.kubernetes.io/version=
              apps.kubeblocks.halo.patroni/scope=halo-cluster-halo-patroni9efd316a
              apps.kubeblocks.io/cluster-uid=29d2f85d-df39-4b67-9768-54139efd316a
              apps.kubeblocks.io/component-name=halo
              apps.kubeblocks.io/pod-name=halo-cluster-halo-0
              clusterdefinition.kubeblocks.io/name=halo
              clusterversion.kubeblocks.io/name=halo-14.10
              controller-revision-hash=94555c49c
              helm.sh/chart=halo-cluster-0.2.0
              workloads.kubeblocks.io/instance=halo-cluster-halo
              workloads.kubeblocks.io/managed-by=InstanceSet
Annotations:  apps.kubeblocks.io/component-replicas: 3
Status:       Running
IP:           10.128.4.38
IPs:
  IP:           10.128.4.38
Controlled By:  InstanceSet/halo-cluster-halo
Init Containers:
  init-lorry:
    Container ID:  containerd://4a4332fd5a898de2f3dd1a005138f0175cd32798d6e768a878853a0bae132c04
    Image:         docker.io/apecloud/kubeblocks-tools:0.9.0-beta.30
    Image ID:      docker.io/apecloud/kubeblocks-tools@sha256:d7812341b1d44fcafd1bbdc800d7153edc0ec2faa8c46bcc6a3caa2f419a9f85
    Port:          <none>
    Host Port:     <none>
    Command:
      cp
      -r
      /bin/lorry
      /config
      /kubeblocks/
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Thu, 06 Jun 2024 14:16:51 +0800
      Finished:     Thu, 06 Jun 2024 14:16:51 +0800
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     0
      memory:  0
    Requests:
      cpu:     0
      memory:  0
    Environment Variables from:
      halo-cluster-halo-env  ConfigMap  Optional: false
    Environment:
      KB_POD_NAME:   halo-cluster-halo-0 (v1:metadata.name)
      KB_POD_UID:     (v1:metadata.uid)
      KB_NAMESPACE:  default (v1:metadata.namespace)
      KB_SA_NAME:     (v1:spec.serviceAccountName)
      KB_NODENAME:    (v1:spec.nodeName)
      KB_HOST_IP:     (v1:status.hostIP)
      KB_POD_IP:      (v1:status.podIP)
      KB_POD_IPS:     (v1:status.podIPs)
      KB_HOSTIP:      (v1:status.hostIP)
      KB_PODIP:       (v1:status.podIP)
      KB_PODIPS:      (v1:status.podIPs)
      KB_POD_FQDN:   $(KB_POD_NAME).halo-cluster-halo-headless.$(KB_NAMESPACE).svc
    Mounts:
      /kubeblocks from kubeblocks (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-4jdlm (ro)
Containers:
  halo:
    Container ID:  containerd://fefdd530a95ef75f6c6cfaae36063d6135dd9dae82e821cafe7f5bc2c864d2ab
    Image:         registry.cn-hangzhou.aliyuncs.com/halocloud/halovector:14.10.231127-amd64
    Image ID:      registry.cn-hangzhou.aliyuncs.com/halocloud/halovector@sha256:fb632a1ad19bd1506e3fbaec537f205e24c540841cf02370ca73668e65e58ff7
    Port:          1921/TCP
    Host Port:     0/TCP
    Command:
      /halo-scripts/setup.sh
    State:          Running
      Started:      Thu, 06 Jun 2024 14:20:16 +0800
    Last State:     Terminated
      Reason:       Error
      Exit Code:    137
      Started:      Thu, 06 Jun 2024 14:18:34 +0800
      Finished:     Thu, 06 Jun 2024 14:20:16 +0800
    Ready:          True
    Restart Count:  2
    Limits:
      cpu:     1
      memory:  2Gi
    Requests:
      cpu:     1
      memory:  2Gi
    Liveness:  exec [/bin/bash -c gosu halo psql -U halo -d halo0root -c  "select 1"
] delay=65s timeout=5s period=3s #success=1 #failure=3
    Environment Variables from:
      halo-cluster-halo-env      ConfigMap  Optional: false
      halo-cluster-halo-rsm-env  ConfigMap  Optional: false
    Environment:
      KB_POD_NAME:                         halo-cluster-halo-0 (v1:metadata.name)
      KB_POD_UID:                           (v1:metadata.uid)
      KB_NAMESPACE:                        default (v1:metadata.namespace)
      KB_SA_NAME:                           (v1:spec.serviceAccountName)
      KB_NODENAME:                          (v1:spec.nodeName)
      KB_HOST_IP:                           (v1:status.hostIP)
      KB_POD_IP:                            (v1:status.podIP)
      KB_POD_IPS:                           (v1:status.podIPs)
      KB_HOSTIP:                            (v1:status.hostIP)
      KB_PODIP:                             (v1:status.podIP)
      KB_PODIPS:                            (v1:status.podIPs)
      KB_POD_FQDN:                         $(KB_POD_NAME).halo-cluster-halo-headless.$(KB_NAMESPACE).svc
      PGDATA:                              /data/halo
      HALOPORT:                            1921
      ALLOW_NOSSL:                         true
      POD_NAME:                            halo-cluster-halo-0 (v1:metadata.name)
      PATRONI_KUBERNETES_POD_IP:            (v1:status.podIP)
      PATRONI_POSTGRESQL_CONNECT_ADDRESS:  $(KB_PODIP):1921
      POD_NAMESPACE:                       default (v1:metadata.namespace)
      PATRONI_SUPERUSER_USERNAME:          <set to the key 'username' in secret 'halo-cluster-conn-credential'>  Optional: false
      PATRONI_SUPERUSER_PASSWORD:          <set to the key 'password' in secret 'halo-cluster-conn-credential'>  Optional: false
      PATRONI_REPLICATION_USERNAME:        replica
      PATRONI_REPLICATION_PASSWORD:        <set to the key 'password' in secret 'halo-cluster-conn-credential'>  Optional: false
      PATRONI_USER_REWIND:                 patroni
      PATRONI_PASSWORD_REWIND:             patroni
      PATRONI_RESTAPI_CONNECT_ADDRESS:     $(KB_PODIP):8008
      PATRONI_RESTAPI_LISTEN:              0.0.0.0:8008
      PATRONI_ETCD3_HOST:                  etcd-cluster-etcd:2379
      PATRONI_NAME:                        halo-cluster-halo-0 (v1:metadata.name)
      PATRONI_POSTGRESQL_LISTEN:           0.0.0.0:1921
      PATRONI_SCOPE:                       $(KB_CLUSTER_NAME)
      KUBERNETES_ROLE_LABEL:               apps.kubeblocks.halo.patroni/role
      KB_POD_FQDN:                         $KB_POD_FQDN
    Mounts:
      /data/halo from data (rw)
      /dev/shm from dshm (rw)
      /halo-scripts from scripts (rw)
      /var/lib/halo/conf from halo-config (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-4jdlm (ro)
  lorry:
    Container ID:  containerd://d39ec24fa3ea170d451eaf8029ef219f1a723805253c4dfc0c08a431388dba63
    Image:         registry.cn-hangzhou.aliyuncs.com/halocloud/halovector:14.10.231127-amd64
    Image ID:      registry.cn-hangzhou.aliyuncs.com/halocloud/halovector@sha256:fb632a1ad19bd1506e3fbaec537f205e24c540841cf02370ca73668e65e58ff7
    Ports:         3501/TCP, 50001/TCP
    Host Ports:    0/TCP, 0/TCP
    Command:
      /kubeblocks/lorry
      --port
      3501
      --grpcport
      50001
      --config-path
      /kubeblocks/config/lorry/components/
    State:          Running
      Started:      Thu, 06 Jun 2024 14:16:53 +0800
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     0
      memory:  0
    Requests:
      cpu:      0
      memory:   0
    Readiness:  http-get http://:3501/v1.0/checkrole delay=0s timeout=1s period=10s #success=1 #failure=3
    Startup:    tcp-socket :3501 delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment Variables from:
      halo-cluster-halo-env      ConfigMap  Optional: false
      halo-cluster-halo-rsm-env  ConfigMap  Optional: false
    Environment:
      KB_POD_NAME:                         halo-cluster-halo-0 (v1:metadata.name)
      KB_POD_UID:                           (v1:metadata.uid)
      KB_NAMESPACE:                        default (v1:metadata.namespace)
      KB_SA_NAME:                           (v1:spec.serviceAccountName)
      KB_NODENAME:                          (v1:spec.nodeName)
      KB_HOST_IP:                           (v1:status.hostIP)
      KB_POD_IP:                            (v1:status.podIP)
      KB_POD_IPS:                           (v1:status.podIPs)
      KB_HOSTIP:                            (v1:status.hostIP)
      KB_PODIP:                             (v1:status.podIP)
      KB_PODIPS:                            (v1:status.podIPs)
      KB_POD_FQDN:                         $(KB_POD_NAME).halo-cluster-halo-headless.$(KB_NAMESPACE).svc
      KB_RSM_ROLE_PROBE_PERIOD:            0
      KB_BUILTIN_HANDLER:                  custom
      KB_SERVICE_USER:                     <set to the key 'username' in secret 'halo-cluster-conn-credential'>  Optional: false
      KB_SERVICE_PASSWORD:                 <set to the key 'password' in secret 'halo-cluster-conn-credential'>  Optional: false
      KB_SERVICE_PORT:                     1921
      KB_DATA_PATH:                        /data/halo
      KB_ACTION_COMMANDS:                  {"roleProbe":["Status=$(curl -s http://localhost:8008) \u0026\u0026 \nrole=$(echo $Status | jq .role  | tr -d '\"') \u0026\u0026\nif [ \"$role\" = \"master\" ]; then echo -n  \"primary\"; else echo -n  \"secondary\"; fi\n"]}
      PGDATA:                              /data/halo
      HALOPORT:                            1921
      ALLOW_NOSSL:                         true
      POD_NAME:                            halo-cluster-halo-0 (v1:metadata.name)
      PATRONI_KUBERNETES_POD_IP:            (v1:status.podIP)
      PATRONI_POSTGRESQL_CONNECT_ADDRESS:  $(KB_PODIP):1921
      POD_NAMESPACE:                       default (v1:metadata.namespace)
      PATRONI_SUPERUSER_USERNAME:          <set to the key 'username' in secret 'halo-cluster-conn-credential'>  Optional: false
      PATRONI_SUPERUSER_PASSWORD:          <set to the key 'password' in secret 'halo-cluster-conn-credential'>  Optional: false
      PATRONI_REPLICATION_USERNAME:        replica
      PATRONI_REPLICATION_PASSWORD:        <set to the key 'password' in secret 'halo-cluster-conn-credential'>  Optional: false
      PATRONI_USER_REWIND:                 patroni
      PATRONI_PASSWORD_REWIND:             patroni
      PATRONI_RESTAPI_CONNECT_ADDRESS:     $(KB_PODIP):8008
      PATRONI_RESTAPI_LISTEN:              0.0.0.0:8008
      PATRONI_ETCD3_HOST:                  etcd-cluster-etcd:2379
      PATRONI_NAME:                        halo-cluster-halo-0 (v1:metadata.name)
      PATRONI_POSTGRESQL_LISTEN:           0.0.0.0:1921
      PATRONI_SCOPE:                       $(KB_CLUSTER_NAME)
      KUBERNETES_ROLE_LABEL:               apps.kubeblocks.halo.patroni/role
      KB_POD_FQDN:                         $KB_POD_FQDN
      KB_RSM_ACTION_SVC_LIST:              null
      KB_RSM_ROLE_UPDATE_MECHANISM:        DirectAPIServerEventUpdate
      KB_RSM_ROLE_PROBE_TIMEOUT:           1
      KB_CLUSTER_NAME:                      (v1:metadata.labels['app.kubernetes.io/instance'])
      KB_COMP_NAME:                         (v1:metadata.labels['apps.kubeblocks.io/component-name'])
      KB_SERVICE_CHARACTER_TYPE:           unknown
    Mounts:
      /data/halo from data (rw)
      /kubeblocks from kubeblocks (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-4jdlm (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  dshm:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     Memory
    SizeLimit:  2Gi
  halo-config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      halo-cluster-halo-halo-configuration
    Optional:  false
  scripts:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      halo-cluster-halo-halo-scripts
    Optional:  false
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-halo-cluster-halo-0
    ReadOnly:   false
  kubeblocks:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  kube-api-access-4jdlm:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 kb-data=true:NoSchedule
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason                  Age                 From                     Message
  ----     ------                  ----                ----                     -------
  Normal   Scheduled               4m33s               default-scheduler        Successfully assigned default/halo-cluster-halo-0 to gke-infracreate-gke-kbdata-e2-standar-25c8fd47-ens6
  Normal   SuccessfulAttachVolume  4m22s               attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-85ff5a70-1f05-4655-b52b-588876829202"
  Normal   Pulled                  4m14s               kubelet                  Container image "docker.io/apecloud/kubeblocks-tools:0.9.0-beta.30" already present on machine
  Normal   Created                 4m11s               kubelet                  Created container init-lorry
  Normal   Started                 4m11s               kubelet                  Started container init-lorry
  Normal   Pulled                  4m9s                kubelet                  Container image "registry.cn-hangzhou.aliyuncs.com/halocloud/halovector:14.10.231127-amd64" already present on machine
  Normal   Started                 4m9s                kubelet                  Started container lorry
  Normal   Created                 4m9s                kubelet                  Created container lorry
  Normal   checkRole               3m58s               lorry                    {"operation":"checkRole","originalRole":"waitForStart"}
  Warning  Unhealthy               76s (x6 over 3m4s)  kubelet                  Liveness probe failed: psql: error: connection to server on socket "/var/run/halo/.s.PGSQL.1921" failed: No such file or directory
           Is the server running locally and accepting connections on that socket?
  Normal   Killing  76s (x2 over 2m58s)  kubelet  Container halo failed liveness probe, will be restarted
  Normal   Created  46s (x3 over 4m10s)  kubelet  Created container halo
  Normal   Pulled   46s (x3 over 4m10s)  kubelet  Container image "registry.cn-hangzhou.aliyuncs.com/halocloud/halovector:14.10.231127-amd64" already present on machine
  Normal   Started  46s (x3 over 4m9s)   kubelet  Started container halo
➜  ~ 
➜  ~ kubectl describe pod halo-cluster-halo-1 

Name:         halo-cluster-halo-1
Namespace:    default
Priority:     0
Node:         gke-infracreate-gke-default-pool-61c5a7f3-kan1/10.10.0.15
Start Time:   Thu, 06 Jun 2024 14:16:29 +0800
Labels:       app.kubernetes.io/component=halo
              app.kubernetes.io/instance=halo-cluster
              app.kubernetes.io/managed-by=kubeblocks
              app.kubernetes.io/name=halo
              app.kubernetes.io/version=
              apps.kubeblocks.halo.patroni/scope=halo-cluster-halo-patroni9efd316a
              apps.kubeblocks.io/cluster-uid=29d2f85d-df39-4b67-9768-54139efd316a
              apps.kubeblocks.io/component-name=halo
              apps.kubeblocks.io/pod-name=halo-cluster-halo-1
              clusterdefinition.kubeblocks.io/name=halo
              clusterversion.kubeblocks.io/name=halo-14.10
              controller-revision-hash=94555c49c
              helm.sh/chart=halo-cluster-0.2.0
              workloads.kubeblocks.io/instance=halo-cluster-halo
              workloads.kubeblocks.io/managed-by=InstanceSet
Annotations:  apps.kubeblocks.io/component-replicas: 3
Status:       Running
IP:           10.128.2.125
IPs:
  IP:           10.128.2.125
Controlled By:  InstanceSet/halo-cluster-halo
Init Containers:
  init-lorry:
    Container ID:  containerd://e7a1a863780a7c535c05b987d1667e03119c2437dc2b56003f950b529ba96dae
    Image:         docker.io/apecloud/kubeblocks-tools:0.9.0-beta.30
    Image ID:      docker.io/apecloud/kubeblocks-tools@sha256:d7812341b1d44fcafd1bbdc800d7153edc0ec2faa8c46bcc6a3caa2f419a9f85
    Port:          <none>
    Host Port:     <none>
    Command:
      cp
      -r
      /bin/lorry
      /config
      /kubeblocks/
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Thu, 06 Jun 2024 14:16:44 +0800
      Finished:     Thu, 06 Jun 2024 14:16:45 +0800
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     0
      memory:  0
    Requests:
      cpu:     0
      memory:  0
    Environment Variables from:
      halo-cluster-halo-env  ConfigMap  Optional: false
    Environment:
      KB_POD_NAME:   halo-cluster-halo-1 (v1:metadata.name)
      KB_POD_UID:     (v1:metadata.uid)
      KB_NAMESPACE:  default (v1:metadata.namespace)
      KB_SA_NAME:     (v1:spec.serviceAccountName)
      KB_NODENAME:    (v1:spec.nodeName)
      KB_HOST_IP:     (v1:status.hostIP)
      KB_POD_IP:      (v1:status.podIP)
      KB_POD_IPS:     (v1:status.podIPs)
      KB_HOSTIP:      (v1:status.hostIP)
      KB_PODIP:       (v1:status.podIP)
      KB_PODIPS:      (v1:status.podIPs)
      KB_POD_FQDN:   $(KB_POD_NAME).halo-cluster-halo-headless.$(KB_NAMESPACE).svc
    Mounts:
      /kubeblocks from kubeblocks (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-fnx89 (ro)
Containers:
  halo:
    Container ID:  containerd://1746f01ecffbd8377de658fc8571d87630ea235ecd9b1eb050672b61f2f60bca
    Image:         registry.cn-hangzhou.aliyuncs.com/halocloud/halovector:14.10.231127-amd64
    Image ID:      registry.cn-hangzhou.aliyuncs.com/halocloud/halovector@sha256:fb632a1ad19bd1506e3fbaec537f205e24c540841cf02370ca73668e65e58ff7
    Port:          1921/TCP
    Host Port:     0/TCP
    Command:
      /halo-scripts/setup.sh
    State:          Running
      Started:      Thu, 06 Jun 2024 14:19:29 +0800
    Last State:     Terminated
      Reason:       Error
      Exit Code:    137
      Started:      Thu, 06 Jun 2024 14:17:46 +0800
      Finished:     Thu, 06 Jun 2024 14:19:29 +0800
    Ready:          True
    Restart Count:  1
    Limits:
      cpu:     1
      memory:  2Gi
    Requests:
      cpu:     1
      memory:  2Gi
    Liveness:  exec [/bin/bash -c gosu halo psql -U halo -d halo0root -c  "select 1"
] delay=65s timeout=5s period=3s #success=1 #failure=3
    Environment Variables from:
      halo-cluster-halo-env      ConfigMap  Optional: false
      halo-cluster-halo-rsm-env  ConfigMap  Optional: false
    Environment:
      KB_POD_NAME:                         halo-cluster-halo-1 (v1:metadata.name)
      KB_POD_UID:                           (v1:metadata.uid)
      KB_NAMESPACE:                        default (v1:metadata.namespace)
      KB_SA_NAME:                           (v1:spec.serviceAccountName)
      KB_NODENAME:                          (v1:spec.nodeName)
      KB_HOST_IP:                           (v1:status.hostIP)
      KB_POD_IP:                            (v1:status.podIP)
      KB_POD_IPS:                           (v1:status.podIPs)
      KB_HOSTIP:                            (v1:status.hostIP)
      KB_PODIP:                             (v1:status.podIP)
      KB_PODIPS:                            (v1:status.podIPs)
      KB_POD_FQDN:                         $(KB_POD_NAME).halo-cluster-halo-headless.$(KB_NAMESPACE).svc
      PGDATA:                              /data/halo
      HALOPORT:                            1921
      ALLOW_NOSSL:                         true
      POD_NAME:                            halo-cluster-halo-1 (v1:metadata.name)
      PATRONI_KUBERNETES_POD_IP:            (v1:status.podIP)
      PATRONI_POSTGRESQL_CONNECT_ADDRESS:  $(KB_PODIP):1921
      POD_NAMESPACE:                       default (v1:metadata.namespace)
      PATRONI_SUPERUSER_USERNAME:          <set to the key 'username' in secret 'halo-cluster-conn-credential'>  Optional: false
      PATRONI_SUPERUSER_PASSWORD:          <set to the key 'password' in secret 'halo-cluster-conn-credential'>  Optional: false
      PATRONI_REPLICATION_USERNAME:        replica
      PATRONI_REPLICATION_PASSWORD:        <set to the key 'password' in secret 'halo-cluster-conn-credential'>  Optional: false
      PATRONI_USER_REWIND:                 patroni
      PATRONI_PASSWORD_REWIND:             patroni
      PATRONI_RESTAPI_CONNECT_ADDRESS:     $(KB_PODIP):8008
      PATRONI_RESTAPI_LISTEN:              0.0.0.0:8008
      PATRONI_ETCD3_HOST:                  etcd-cluster-etcd:2379
      PATRONI_NAME:                        halo-cluster-halo-1 (v1:metadata.name)
      PATRONI_POSTGRESQL_LISTEN:           0.0.0.0:1921
      PATRONI_SCOPE:                       $(KB_CLUSTER_NAME)
      KUBERNETES_ROLE_LABEL:               apps.kubeblocks.halo.patroni/role
      KB_POD_FQDN:                         $KB_POD_FQDN
    Mounts:
      /data/halo from data (rw)
      /dev/shm from dshm (rw)
      /halo-scripts from scripts (rw)
      /var/lib/halo/conf from halo-config (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-fnx89 (ro)
  lorry:
    Container ID:  containerd://9acd5efc0d23842241010b957f48ee5ebe49f8da35b46d5612f1c27997643ef9
    Image:         registry.cn-hangzhou.aliyuncs.com/halocloud/halovector:14.10.231127-amd64
    Image ID:      registry.cn-hangzhou.aliyuncs.com/halocloud/halovector@sha256:fb632a1ad19bd1506e3fbaec537f205e24c540841cf02370ca73668e65e58ff7
    Ports:         3501/TCP, 50001/TCP
    Host Ports:    0/TCP, 0/TCP
    Command:
      /kubeblocks/lorry
      --port
      3501
      --grpcport
      50001
      --config-path
      /kubeblocks/config/lorry/components/
    State:          Running
      Started:      Thu, 06 Jun 2024 14:17:46 +0800
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     0
      memory:  0
    Requests:
      cpu:      0
      memory:   0
    Readiness:  http-get http://:3501/v1.0/checkrole delay=0s timeout=1s period=10s #success=1 #failure=3
    Startup:    tcp-socket :3501 delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment Variables from:
      halo-cluster-halo-env      ConfigMap  Optional: false
      halo-cluster-halo-rsm-env  ConfigMap  Optional: false
    Environment:
      KB_POD_NAME:                         halo-cluster-halo-1 (v1:metadata.name)
      KB_POD_UID:                           (v1:metadata.uid)
      KB_NAMESPACE:                        default (v1:metadata.namespace)
      KB_SA_NAME:                           (v1:spec.serviceAccountName)
      KB_NODENAME:                          (v1:spec.nodeName)
      KB_HOST_IP:                           (v1:status.hostIP)
      KB_POD_IP:                            (v1:status.podIP)
      KB_POD_IPS:                           (v1:status.podIPs)
      KB_HOSTIP:                            (v1:status.hostIP)
      KB_PODIP:                             (v1:status.podIP)
      KB_PODIPS:                            (v1:status.podIPs)
      KB_POD_FQDN:                         $(KB_POD_NAME).halo-cluster-halo-headless.$(KB_NAMESPACE).svc
      KB_RSM_ROLE_PROBE_PERIOD:            0
      KB_BUILTIN_HANDLER:                  custom
      KB_SERVICE_USER:                     <set to the key 'username' in secret 'halo-cluster-conn-credential'>  Optional: false
      KB_SERVICE_PASSWORD:                 <set to the key 'password' in secret 'halo-cluster-conn-credential'>  Optional: false
      KB_SERVICE_PORT:                     1921
      KB_DATA_PATH:                        /data/halo
      KB_ACTION_COMMANDS:                  {"roleProbe":["Status=$(curl -s http://localhost:8008) \u0026\u0026 \nrole=$(echo $Status | jq .role  | tr -d '\"') \u0026\u0026\nif [ \"$role\" = \"master\" ]; then echo -n  \"primary\"; else echo -n  \"secondary\"; fi\n"]}
      PGDATA:                              /data/halo
      HALOPORT:                            1921
      ALLOW_NOSSL:                         true
      POD_NAME:                            halo-cluster-halo-1 (v1:metadata.name)
      PATRONI_KUBERNETES_POD_IP:            (v1:status.podIP)
      PATRONI_POSTGRESQL_CONNECT_ADDRESS:  $(KB_PODIP):1921
      POD_NAMESPACE:                       default (v1:metadata.namespace)
      PATRONI_SUPERUSER_USERNAME:          <set to the key 'username' in secret 'halo-cluster-conn-credential'>  Optional: false
      PATRONI_SUPERUSER_PASSWORD:          <set to the key 'password' in secret 'halo-cluster-conn-credential'>  Optional: false
      PATRONI_REPLICATION_USERNAME:        replica
      PATRONI_REPLICATION_PASSWORD:        <set to the key 'password' in secret 'halo-cluster-conn-credential'>  Optional: false
      PATRONI_USER_REWIND:                 patroni
      PATRONI_PASSWORD_REWIND:             patroni
      PATRONI_RESTAPI_CONNECT_ADDRESS:     $(KB_PODIP):8008
      PATRONI_RESTAPI_LISTEN:              0.0.0.0:8008
      PATRONI_ETCD3_HOST:                  etcd-cluster-etcd:2379
      PATRONI_NAME:                        halo-cluster-halo-1 (v1:metadata.name)
      PATRONI_POSTGRESQL_LISTEN:           0.0.0.0:1921
      PATRONI_SCOPE:                       $(KB_CLUSTER_NAME)
      KUBERNETES_ROLE_LABEL:               apps.kubeblocks.halo.patroni/role
      KB_POD_FQDN:                         $KB_POD_FQDN
      KB_RSM_ACTION_SVC_LIST:              null
      KB_RSM_ROLE_UPDATE_MECHANISM:        DirectAPIServerEventUpdate
      KB_RSM_ROLE_PROBE_TIMEOUT:           1
      KB_CLUSTER_NAME:                      (v1:metadata.labels['app.kubernetes.io/instance'])
      KB_COMP_NAME:                         (v1:metadata.labels['apps.kubeblocks.io/component-name'])
      KB_SERVICE_CHARACTER_TYPE:           unknown
    Mounts:
      /data/halo from data (rw)
      /kubeblocks from kubeblocks (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-fnx89 (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  dshm:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     Memory
    SizeLimit:  2Gi
  halo-config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      halo-cluster-halo-halo-configuration
    Optional:  false
  scripts:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      halo-cluster-halo-halo-scripts
    Optional:  false
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-halo-cluster-halo-1
    ReadOnly:   false
  kubeblocks:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  kube-api-access-fnx89:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 kb-data=true:NoSchedule
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason                  Age                   From                     Message
  ----     ------                  ----                  ----                     -------
  Normal   Scheduled               4m40s                 default-scheduler        Successfully assigned default/halo-cluster-halo-1 to gke-infracreate-gke-default-pool-61c5a7f3-kan1
  Normal   SuccessfulAttachVolume  4m33s                 attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-42f74ccd-d66d-41e9-b98f-1574b6c2924a"
  Normal   Pulling                 4m31s                 kubelet                  Pulling image "docker.io/apecloud/kubeblocks-tools:0.9.0-beta.30"
  Normal   Pulled                  4m25s                 kubelet                  Successfully pulled image "docker.io/apecloud/kubeblocks-tools:0.9.0-beta.30" in 6.571459957s (6.57148585s including waiting)
  Normal   Created                 4m25s                 kubelet                  Created container init-lorry
  Normal   Started                 4m25s                 kubelet                  Started container init-lorry
  Normal   Pulling                 4m22s                 kubelet                  Pulling image "registry.cn-hangzhou.aliyuncs.com/halocloud/halovector:14.10.231127-amd64"
  Normal   Created                 3m23s                 kubelet                  Created container lorry
  Normal   Pulled                  3m23s                 kubelet                  Successfully pulled image "registry.cn-hangzhou.aliyuncs.com/halocloud/halovector:14.10.231127-amd64" in 59.437960951s (59.437978261s including waiting)
  Normal   Pulled                  3m23s                 kubelet                  Container image "registry.cn-hangzhou.aliyuncs.com/halocloud/halovector:14.10.231127-amd64" already present on machine
  Normal   Started                 3m22s                 kubelet                  Started container lorry
  Normal   checkRole               3m21s                 lorry                    {"operation":"checkRole","originalRole":"waitForStart"}
  Normal   Started                 100s (x2 over 3m23s)  kubelet                  Started container halo
  Normal   Pulled                  100s                  kubelet                  Container image "registry.cn-hangzhou.aliyuncs.com/halocloud/halovector:14.10.231127-amd64" already present on machine
  Normal   Created                 100s (x2 over 3m23s)  kubelet                  Created container halo
  Warning  Unhealthy               28s (x6 over 2m16s)   kubelet                  Liveness probe failed: psql: error: connection to server on socket "/var/run/halo/.s.PGSQL.1921" failed: No such file or directory
           Is the server running locally and accepting connections on that socket?
  Normal   Killing  28s (x2 over 2m10s)  kubelet  Container halo failed liveness probe, will be restarted
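
The liveness probe failing on the local socket means the database never came up, which fits Patroni being unable to reach its DCS at PATRONI_ETCD3_HOST (etcd-cluster-etcd:2379 in the pod spec above). A hedged sanity check, assuming bash with /dev/tcp support in the halo image:

```bash
# Does the etcd Service that Patroni expects exist at all?
kubectl get svc etcd-cluster-etcd -n default

# Is port 2379 reachable from inside the pod?
kubectl exec halo-cluster-halo-1 -c halo -- \
  bash -c 'timeout 3 bash -c "</dev/tcp/etcd-cluster-etcd/2379" && echo open || echo closed'
```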

JashBook · Jun 06 '24 06:06

Halo can be created successfully when the etcd cluster is created first; closing the issue.
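
For anyone retrying, a minimal verification sketch (resource names follow this thread's examples; the jsonpath wait is illustrative):

```bash
# etcd first, then halo; wait for the etcd cluster to leave Creating.
helm install etcd-cluster ./etcd-cluster
kubectl wait cluster/etcd-cluster \
  --for=jsonpath='{.status.phase}'=Running --timeout=300s
helm install halo-cluster ./halo-cluster
kubectl get cluster halo-cluster -w    # STATUS should reach Running
```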

ahjing99 · Jun 24 '24 08:06