kubeblocks

[BUG] pulsar-proxy crashes and bookies-recovery stays in Init when the Pulsar cluster uses serviceRefs to an external ZooKeeper cluster

Open · JashBook opened this issue on Apr 12 '24 · 1 comment

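Describe the bug
When a Pulsar cluster references an existing ZooKeeper cluster (zookeeperp-cluster) through serviceRefs, the pulsar-proxy pod goes into CrashLoopBackOff and the bookies-recovery pod stays stuck in Init:0/1. Both components are still configured with the default ZooKeeper endpoint pulsar-cluster-zookeeper.default.svc:2181, which cannot be resolved, instead of the referenced cluster.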

To Reproduce
Steps to reproduce the behavior:

  1. Create the ZooKeeper cluster:
apiVersion: apps.kubeblocks.io/v1alpha1
kind: Cluster
metadata:
  name: zookeeperp-cluster
  namespace: default
spec:
  clusterDefinitionRef: pulsar-zookeeper
  clusterVersionRef: pulsar-3.0.2
  terminationPolicy: Delete
  affinity:
    podAntiAffinity: Preferred
    topologyKeys:
      - kubernetes.io/hostname
    tenancy: SharedNode
  tolerations:
    - key: kb-data
      operator: Equal
      value: "true"
      effect: NoSchedule
  componentSpecs:
    - name: zookeeper
      componentDefRef: zookeeper
      monitor: false
      replicas: 3
      resources:
        limits:
          cpu: "0.5"
          memory: "0.5Gi"
        requests:
          cpu: "0.5"
          memory: "0.5Gi"
      volumeClaimTemplates:
        - name: data
          spec:
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 20Gi
  2. Create the Pulsar cluster, with each component referencing the ZooKeeper cluster through serviceRefs:
apiVersion: apps.kubeblocks.io/v1alpha1
kind: Cluster
metadata:
  labels:
    clusterdefinition.kubeblocks.io/name: pulsar
    clusterversion.kubeblocks.io/name: pulsar-3.0.2
  name: pulsar-cluster
  namespace: default
spec:
  clusterDefinitionRef: pulsar
  clusterVersionRef: pulsar-3.0.2
  componentSpecs:
  - componentDefRef: pulsar-broker
    monitor: false
    name: pulsar-broker
    replicas: 3
    resources:
      limits:
        cpu: "0.5"
        memory: 0.5Gi
      requests:
        cpu: "0.5"
        memory: 0.5Gi
    serviceAccountName: kb-pulsar-cluster
    serviceRefs:
    - cluster: zookeeperp-cluster
      name: pulsarZookeeper
      namespace: default
    volumeClaimTemplates:
    - name: data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 20Gi
  - componentDefRef: pulsar-proxy
    monitor: true
    name: pulsar-proxy
    replicas: 1
    resources:
      limits:
        cpu: "0.5"
        memory: 0.5Gi
      requests:
        cpu: "0.5"
        memory: 0.5Gi
    serviceRefs:
    - cluster: zookeeperp-cluster
      name: pulsarZookeeper
      namespace: default
    volumeClaimTemplates:
    - name: data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 20Gi
  - componentDefRef: bookies
    monitor: true
    name: bookies
    replicas: 3
    resources:
      limits:
        cpu: "0.5"
        memory: 0.5Gi
      requests:
        cpu: "0.5"
        memory: 0.5Gi
    serviceRefs:
    - cluster: zookeeperp-cluster
      name: pulsarZookeeper
      namespace: default
    volumeClaimTemplates:
    - name: journal
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 20Gi
    - name: ledgers
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 20Gi
  - componentDefRef: bookies-recovery
    monitor: true
    name: bookies-recovery
    replicas: 1
    resources:
      limits:
        cpu: "0.5"
        memory: 0.5Gi
      requests:
        cpu: "0.5"
        memory: 0.5Gi
    serviceRefs:
    - cluster: zookeeperp-cluster
      name: pulsarZookeeper
      namespace: default
    volumeClaimTemplates:
    - name: data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 20Gi
  services:
  - componentSelector: proxy
    name: proxy
    serviceName: proxy
    spec:
      ports:
      - name: pulsar
        port: 6650
        protocol: TCP
        targetPort: 6650
      - name: http
        port: 80
        protocol: TCP
        targetPort: 8080
      type: ClusterIP
  - componentSelector: broker
    name: broker-bootstrap
    serviceName: broker-bootstrap
    spec:
      ports:
      - name: pulsar
        port: 6650
        protocol: TCP
        targetPort: 6650
      - name: http
        port: 80
        protocol: TCP
        targetPort: 8080
      - name: kafka-client
        port: 9092
        protocol: TCP
        targetPort: 9092
      type: ClusterIP
  terminationPolicy: Delete
  tolerations:
  - effect: NoSchedule
    key: kb-data
    operator: Equal
    value: "true"
  3. Check the pods: pulsar-proxy is in CrashLoopBackOff and bookies-recovery never leaves Init.
kubectl get pod 
NAME                                READY   STATUS             RESTARTS       AGE
pulsar-cluster-bookies-0            2/2     Running            0              9m47s
pulsar-cluster-bookies-1            2/2     Running            0              9m47s
pulsar-cluster-bookies-2            2/2     Running            0              9m47s
pulsar-cluster-bookies-recovery-0   0/2     Init:0/1           0              9m51s
pulsar-cluster-pulsar-broker-0      3/3     Running            0              9m46s
pulsar-cluster-pulsar-broker-1      3/3     Running            0              9m46s
pulsar-cluster-pulsar-broker-2      3/3     Running            0              9m46s
pulsar-cluster-pulsar-proxy-0       1/2     CrashLoopBackOff   5 (108s ago)   9m50s
zookeeperp-cluster-zookeeper-0      2/2     Running            0              9m52s
zookeeperp-cluster-zookeeper-1      2/2     Running            0              9m52s
zookeeperp-cluster-zookeeper-2      2/2     Running            0              9m52s
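To pick the unhealthy components out of a longer listing, a small parser over the `kubectl get pod` text is enough. This is a minimal Python sketch, assuming the default column layout shown above (columns separated by two or more spaces, while RESTARTS may contain single spaces); `unhealthy_pods` is a hypothetical helper, not part of kubectl or KubeBlocks.

```python
import re

# The `kubectl get pod` output above, copied verbatim.
SAMPLE = """\
NAME                                READY   STATUS             RESTARTS       AGE
pulsar-cluster-bookies-0            2/2     Running            0              9m47s
pulsar-cluster-bookies-1            2/2     Running            0              9m47s
pulsar-cluster-bookies-2            2/2     Running            0              9m47s
pulsar-cluster-bookies-recovery-0   0/2     Init:0/1           0              9m51s
pulsar-cluster-pulsar-broker-0      3/3     Running            0              9m46s
pulsar-cluster-pulsar-broker-1      3/3     Running            0              9m46s
pulsar-cluster-pulsar-broker-2      3/3     Running            0              9m46s
pulsar-cluster-pulsar-proxy-0       1/2     CrashLoopBackOff   5 (108s ago)   9m50s
zookeeperp-cluster-zookeeper-0      2/2     Running            0              9m52s
zookeeperp-cluster-zookeeper-1      2/2     Running            0              9m52s
zookeeperp-cluster-zookeeper-2      2/2     Running            0              9m52s
"""


def unhealthy_pods(table: str) -> list[tuple[str, str, str]]:
    """Return (name, ready, status) for pods that are not Running with all containers ready."""
    rows = table.strip().splitlines()[1:]  # skip the header row
    result = []
    for row in rows:
        # Split on runs of 2+ spaces so "5 (108s ago)" stays one field.
        name, ready, status, *_ = re.split(r"\s{2,}", row.strip())
        ready_now, ready_total = ready.split("/")
        if status != "Running" or ready_now != ready_total:
            result.append((name, ready, status))
    return result


for pod in unhealthy_pods(SAMPLE):
    print(pod)
```

On the listing above it reports pulsar-cluster-bookies-recovery-0 (0/2, Init:0/1) and pulsar-cluster-pulsar-proxy-0 (1/2, CrashLoopBackOff). For scripted checks, `kubectl get pod -o json` is sturdier than parsing the table.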

Logs of the crashing pulsar-proxy pod: the serviceRefs setting has no effect, and the proxy is still configured with the default ZooKeeper endpoint "pulsar-cluster-zookeeper.default.svc:2181":

kubectl logs pulsar-cluster-pulsar-proxy-0 proxy --tail 30
[conf/proxy.conf] Updating config statusFilePath=/pulsar/status
[conf/proxy.conf] Adding config: maxMessageSize=5242880
[conf/proxy.conf] Applying config brokerServiceURL = pulsar://pulsar-cluster-pulsar-broker:6650
[conf/proxy.conf] Applying config brokerWebServiceURL = http://pulsar-cluster-pulsar-broker:80
[conf/proxy.conf] Applying config clusterName = default-pulsar-cluster-pulsar-proxy
[conf/proxy.conf] Applying config metadataStoreUrl = pulsar-cluster-zookeeper.default.svc:2181
[conf/proxy.conf] Applying config webServicePort = 8080
VM settings:
    Max. Heap Size (Estimated): 154.00M
    Using VM: OpenJDK 64-Bit Server VM

2024-04-12T07:25:10,862+0000 [main] INFO  org.apache.pulsar.broker.authentication.AuthenticationService - Authentication is disabled
2024-04-12T07:25:11,360+0000 [main] INFO  org.apache.pulsar.proxy.extensions.ProxyExtensionsUtils - Searching for extensions in /pulsar/./proxyextensions
2024-04-12T07:25:11,360+0000 [main] WARN  org.apache.pulsar.proxy.extensions.ProxyExtensionsUtils - extension directory not found
2024-04-12T07:25:11,456+0000 [main] INFO  org.eclipse.jetty.util.log - Logging initialized @4392ms to org.eclipse.jetty.util.log.Slf4jLog
2024-04-12T07:25:11,761+0000 [main] INFO  org.apache.zookeeper.ZooKeeper - Client environment:zookeeper.version=3.8.3-6ad6d364c7c0bcf0de452d54ebefa3058098ab56, built on 2023-10-05 10:34 UTC
2024-04-12T07:25:11,761+0000 [main] INFO  org.apache.zookeeper.ZooKeeper - Client environment:host.name=pulsar-cluster-pulsar-proxy-0.pulsar-cluster-pulsar-proxy-headless.default.svc.cluster.local
2024-04-12T07:25:11,761+0000 [main] INFO  org.apache.zookeeper.ZooKeeper - Client environment:java.version=17.0.7
2024-04-12T07:25:11,761+0000 [main] INFO  org.apache.zookeeper.ZooKeeper - Client environment:java.vendor=Debian
2024-04-12T07:25:11,761+0000 [main] INFO  org.apache.zookeeper.ZooKeeper - Client environment:java.home=/usr/lib/jvm/java-17-openjdk-arm64
...
	at java.net.InetAddress$CachedAddresses.get(InetAddress.java:801) ~[?:?]
	at java.net.InetAddress.getAllByName0(InetAddress.java:1533) ~[?:?]
	at java.net.InetAddress.getAllByName(InetAddress.java:1385) ~[?:?]
	at java.net.InetAddress.getAllByName(InetAddress.java:1306) ~[?:?]
	at org.apache.zookeeper.client.StaticHostProvider$1.getAllByName(StaticHostProvider.java:88) ~[org.apache.zookeeper-zookeeper-3.8.3.jar:3.8.3]
	at org.apache.zookeeper.client.StaticHostProvider.resolve(StaticHostProvider.java:141) ~[org.apache.zookeeper-zookeeper-3.8.3.jar:3.8.3]
	at org.apache.zookeeper.client.StaticHostProvider.next(StaticHostProvider.java:368) ~[org.apache.zookeeper-zookeeper-3.8.3.jar:3.8.3]
	at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1204) ~[org.apache.zookeeper-zookeeper-3.8.3.jar:3.8.3]
2024-04-12T07:22:13,061+0000 [main-SendThread(pulsar-cluster-zookeeper.default.svc:2181)] WARN  org.apache.zookeeper.ClientCnxn - Session 0x0 for server pulsar-cluster-zookeeper.default.svc/<unresolved>:2181, Closing socket connection. Attempting reconnect except it is a SessionExpiredException.
java.lang.IllegalArgumentException: Unable to canonicalize address pulsar-cluster-zookeeper.default.svc/<unresolved>:2181 because it's not resolvable
	at org.apache.zookeeper.SaslServerPrincipal.getServerPrincipal(SaslServerPrincipal.java:78) ~[org.apache.zookeeper-zookeeper-3.8.3.jar:3.8.3]
	at org.apache.zookeeper.SaslServerPrincipal.getServerPrincipal(SaslServerPrincipal.java:41) ~[org.apache.zookeeper-zookeeper-3.8.3.jar:3.8.3]
	at org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:1157) ~[org.apache.zookeeper-zookeeper-3.8.3.jar:3.8.3]
	at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1207) ~[org.apache.zookeeper-zookeeper-3.8.3.jar:3.8.3]
2024-04-12T07:22:14,163+0000 [main-SendThread(pulsar-cluster-zookeeper.default.svc:2181)] ERROR org.apache.zookeeper.client.StaticHostProvider - Unable to resolve address: pulsar-cluster-zookeeper.default.svc/<unresolved>:2181
java.net.UnknownHostException: pulsar-cluster-zookeeper.default.svc
	at java.net.InetAddress$CachedAddresses.get(InetAddress.java:801) ~[?:?]
	at java.net.InetAddress.getAllByName0(InetAddress.java:1533) ~[?:?]
	at java.net.InetAddress.getAllByName(InetAddress.java:1385) ~[?:?]
	at java.net.InetAddress.getAllByName(InetAddress.java:1306) ~[?:?]
	at org.apache.zookeeper.client.StaticHostProvider$1.getAllByName(StaticHostProvider.java:88) ~[org.apache.zookeeper-zookeeper-3.8.3.jar:3.8.3]
	at org.apache.zookeeper.client.StaticHostProvider.resolve(StaticHostProvider.java:141) ~[org.apache.zookeeper-zookeeper-3.8.3.jar:3.8.3]
	at org.apache.zookeeper.client.StaticHostProvider.next(StaticHostProvider.java:368) ~[org.apache.zookeeper-zookeeper-3.8.3.jar:3.8.3]
	at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1204) ~[org.apache.zookeeper-zookeeper-3.8.3.jar:3.8.3]
2024-04-12T07:22:14,163+0000 [main-SendThread(pulsar-cluster-zookeeper.default.svc:2181)] WARN  org.apache.zookeeper.ClientCnxn - Session 0x0 for server pulsar-cluster-zookeeper.default.svc/<unresolved>:2181, Closing socket connection. Attempting reconnect except it is a SessionExpiredException.
java.lang.IllegalArgumentException: Unable to canonicalize address pulsar-cluster-zookeeper.default.svc/<unresolved>:2181 because it's not resolvable
	at org.apache.zookeeper.SaslServerPrincipal.getServerPrincipal(SaslServerPrincipal.java:78) ~[org.apache.zookeeper-zookeeper-3.8.3.jar:3.8.3]
	at org.apache.zookeeper.SaslServerPrincipal.getServerPrincipal(SaslServerPrincipal.java:41) ~[org.apache.zookeeper-zookeeper-3.8.3.jar:3.8.3]
	at org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:1157) ~[org.apache.zookeeper-zookeeper-3.8.3.jar:3.8.3]
	at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1207) ~[org.apache.zookeeper-zookeeper-3.8.3.jar:3.8.3]
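The proxy's effective ZooKeeper setting can be read straight from the `apply-config-from-env.py` lines at the top of its log. A minimal Python sketch, assuming the `Applying config <key> = <value>` line format shown above; `applied_config` is a hypothetical helper. It shows that `metadataStoreUrl` is built from the Pulsar cluster's own name (`pulsar-cluster`) rather than from the serviceRef target `zookeeperp-cluster`.

```python
import re

# A subset of the pulsar-proxy log lines above.
PROXY_LOG = """\
[conf/proxy.conf] Updating config statusFilePath=/pulsar/status
[conf/proxy.conf] Applying config brokerServiceURL = pulsar://pulsar-cluster-pulsar-broker:6650
[conf/proxy.conf] Applying config clusterName = default-pulsar-cluster-pulsar-proxy
[conf/proxy.conf] Applying config metadataStoreUrl = pulsar-cluster-zookeeper.default.svc:2181
[conf/proxy.conf] Applying config webServicePort = 8080
"""

# Matches "Applying config <key> = <value>"; "Updating"/"Adding" lines are ignored.
APPLY_RE = re.compile(r"Applying config (\S+) = (\S+)")


def applied_config(log: str) -> dict[str, str]:
    """Return the key/value pairs the log reports as applied."""
    return dict(APPLY_RE.findall(log))


cfg = applied_config(PROXY_LOG)
zk = cfg["metadataStoreUrl"]
host = zk.rsplit(":", 1)[0]  # drop the :2181 port
print("metadataStoreUrl =", zk)
print("points at serviceRef cluster:", host.startswith("zookeeperp-cluster-"))
```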

Logs of the bookies-recovery pod (container check-bookies), which shows the same default endpoint in zkServers:

kubectl logs pulsar-cluster-bookies-recovery-0 check-bookies --tail 30

+ bin/apply-config-from-env.py conf/bookkeeper.conf
[conf/bookkeeper.conf] Applying config httpServerEnabled = true
[conf/bookkeeper.conf] Applying config httpServerPort = 8000
[conf/bookkeeper.conf] Applying config lostBookieRecoveryDelay = 300
[conf/bookkeeper.conf] Applying config prometheusStatsHttpPort = 8000
[conf/bookkeeper.conf] Applying config useHostNameAsBookieID = true
[conf/bookkeeper.conf] Applying config zkServers = pulsar-cluster-zookeeper.default.svc:2181
+ bin/bookkeeper shell whatisinstanceid
JAVA_HOME not set, using java from PATH. (/usr/bin/java)
[0.004s][trace][gc,heap]   Maximum heap size 7326386176
[0.004s][trace][gc,heap]   Initial heap size 228949568
[0.004s][trace][gc,heap]   Minimum heap size 6815736
[0.005s][debug][gc,heap] Minimum heap 8388608  Initial heap 230686720  Maximum heap 7327449088
[0.005s][info ][gc     ] Using G1
[0.010s][info ][gc,init] Version: 17.0.7+7-Debian-1deb11u1 (release)
[0.010s][info ][gc,init] CPUs: 7 total, 7 available
[0.010s][info ][gc,init] Memory: 13973M
[0.010s][info ][gc,init] Large Page Support: Disabled
[0.010s][info ][gc,init] NUMA Support: Disabled
[0.010s][info ][gc,init] Compressed Oops: Enabled (Zero based)
[0.010s][info ][gc,init] Heap Region Size: 4M
[0.010s][info ][gc,init] Heap Min Capacity: 8M
[0.010s][info ][gc,init] Heap Initial Capacity: 220M
[0.010s][info ][gc,init] Heap Max Capacity: 6988M
[0.010s][info ][gc,init] Pre-touch: Disabled
[0.010s][info ][gc,init] Parallel Workers: 4
[0.010s][info ][gc,init] Concurrent Workers: 4
[0.010s][info ][gc,init] Concurrent Refinement Workers: 4
[0.010s][info ][gc,init] Periodic GC: Disabled
[0.010s][info ][gc,metaspace] CDS archive(s) mapped at: [0x0000000800000000-0x0000000800be2000-0x0000000800be2000), size 12460032, SharedBaseAddress: 0x0000000800000000, ArchiveRelocationMode: 0.
[0.010s][info ][gc,metaspace] Compressed class space mapped at: 0x0000000801000000-0x0000000841000000, reserved size: 1073741824
[0.010s][info ][gc,metaspace] Narrow klass base: 0x0000000800000000, Narrow klass shift: 0, Narrow klass range: 0x100000000
[0.296s][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 277307959 ns, Reaching safepoint: 500625 ns, At safepoint: 16000 ns, Total: 516625 ns
[0.554s][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 257106792 ns, Reaching safepoint: 305875 ns, At safepoint: 18291 ns, Total: 324166 ns
[0.823s][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 268665625 ns, Reaching safepoint: 221542 ns, At safepoint: 4375 ns, Total: 225917 ns
[0.978s][info ][safepoint   ] Safepoint "ICBufferFull", Time since last: 155170958 ns, Reaching safepoint: 212792 ns, At safepoint: 8792 ns, Total: 221584 ns
2024-04-12T07:14:39,428+0000 [main] INFO  org.apache.bookkeeper.meta.MetadataDrivers - BookKeeper metadata driver manager initialized
2024-04-12T07:14:39,457+0000 [main] INFO  org.apache.bookkeeper.meta.zk.ZKMetadataDriverBase - Initialize zookeeper metadata driver at metadata service uri zk+null://pulsar-cluster-zookeeper.default.svc:2181/ledgers : zkServers = pulsar-cluster-zookeeper.default.svc:2181, ledgersRootPath = /ledgers.
2024-04-12T07:14:39,476+0000 [main] INFO  org.apache.zookeeper.ZooKeeper - Client environment:zookeeper.version=3.8.3-6ad6d364c7c0bcf0de452d54ebefa3058098ab56, built on 2023-10-05 10:34 UTC
2024-04-12T07:14:39,477+0000 [main] INFO  org.apache.zookeeper.ZooKeeper - Client environment:host.name=pulsar-cluster-bookies-recovery-0.pulsar-cluster-bookies-recovery-headless.default.svc.cluster.local
2024-04-12T07:14:39,478+0000 [main] INFO  org.apache.zookeeper.ZooKeeper - Client environment:java.version=17.0.7
2024-04-12T07:14:39,478+0000 [main] INFO  org.apache.zookeeper.ZooKeeper - Client environment:java.vendor=Debian
2024-04-12T07:14:39,478+0000 [main] INFO  org.apache.zookeeper.ZooKeeper - Client environment:java.home=/usr/lib/jvm/java-17-openjdk-arm64
...

	at java.net.InetAddress$CachedAddresses.get(InetAddress.java:801) ~[?:?]
	at java.net.InetAddress.getAllByName0(InetAddress.java:1533) ~[?:?]
	at java.net.InetAddress.getAllByName(InetAddress.java:1385) ~[?:?]
	at java.net.InetAddress.getAllByName(InetAddress.java:1306) ~[?:?]
	at org.apache.zookeeper.client.StaticHostProvider$1.getAllByName(StaticHostProvider.java:88) ~[org.apache.zookeeper-zookeeper-3.8.3.jar:3.8.3]
	at org.apache.zookeeper.client.StaticHostProvider.resolve(StaticHostProvider.java:141) ~[org.apache.zookeeper-zookeeper-3.8.3.jar:3.8.3]
	at org.apache.zookeeper.client.StaticHostProvider.next(StaticHostProvider.java:368) ~[org.apache.zookeeper-zookeeper-3.8.3.jar:3.8.3]
	at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1204) ~[org.apache.zookeeper-zookeeper-3.8.3.jar:3.8.3]
2024-04-12T07:26:34,013+0000 [main-SendThread(pulsar-cluster-zookeeper.default.svc:2181)] WARN  org.apache.zookeeper.ClientCnxn - Session 0x0 for server pulsar-cluster-zookeeper.default.svc/<unresolved>:2181, Closing socket connection. Attempting reconnect except it is a SessionExpiredException.
java.lang.IllegalArgumentException: Unable to canonicalize address pulsar-cluster-zookeeper.default.svc/<unresolved>:2181 because it's not resolvable
	at org.apache.zookeeper.SaslServerPrincipal.getServerPrincipal(SaslServerPrincipal.java:78) ~[org.apache.zookeeper-zookeeper-3.8.3.jar:3.8.3]
	at org.apache.zookeeper.SaslServerPrincipal.getServerPrincipal(SaslServerPrincipal.java:41) ~[org.apache.zookeeper-zookeeper-3.8.3.jar:3.8.3]
	at org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:1157) ~[org.apache.zookeeper-zookeeper-3.8.3.jar:3.8.3]
	at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1207) ~[org.apache.zookeeper-zookeeper-3.8.3.jar:3.8.3]
2024-04-12T07:26:35,114+0000 [main-SendThread(pulsar-cluster-zookeeper.default.svc:2181)] ERROR org.apache.zookeeper.client.StaticHostProvider - Unable to resolve address: pulsar-cluster-zookeeper.default.svc/<unresolved>:2181
java.net.UnknownHostException: pulsar-cluster-zookeeper.default.svc
	at java.net.InetAddress$CachedAddresses.get(InetAddress.java:801) ~[?:?]
	at java.net.InetAddress.getAllByName0(InetAddress.java:1533) ~[?:?]
	at java.net.InetAddress.getAllByName(InetAddress.java:1385) ~[?:?]
	at java.net.InetAddress.getAllByName(InetAddress.java:1306) ~[?:?]
	at org.apache.zookeeper.client.StaticHostProvider$1.getAllByName(StaticHostProvider.java:88) ~[org.apache.zookeeper-zookeeper-3.8.3.jar:3.8.3]
	at org.apache.zookeeper.client.StaticHostProvider.resolve(StaticHostProvider.java:141) ~[org.apache.zookeeper-zookeeper-3.8.3.jar:3.8.3]
	at org.apache.zookeeper.client.StaticHostProvider.next(StaticHostProvider.java:368) ~[org.apache.zookeeper-zookeeper-3.8.3.jar:3.8.3]
	at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1204) ~[org.apache.zookeeper-zookeeper-3.8.3.jar:3.8.3]
2024-04-12T07:26:35,114+0000 [main-SendThread(pulsar-cluster-zookeeper.default.svc:2181)] WARN  org.apache.zookeeper.ClientCnxn - Session 0x0 for server pulsar-cluster-zookeeper.default.svc/<unresolved>:2181, Closing socket connection. Attempting reconnect except it is a SessionExpiredException.
java.lang.IllegalArgumentException: Unable to canonicalize address pulsar-cluster-zookeeper.default.svc/<unresolved>:2181 because it's not resolvable
	at org.apache.zookeeper.SaslServerPrincipal.getServerPrincipal(SaslServerPrincipal.java:78) ~[org.apache.zookeeper-zookeeper-3.8.3.jar:3.8.3]
	at org.apache.zookeeper.SaslServerPrincipal.getServerPrincipal(SaslServerPrincipal.java:41) ~[org.apache.zookeeper-zookeeper-3.8.3.jar:3.8.3]
	at org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:1157) ~[org.apache.zookeeper-zookeeper-3.8.3.jar:3.8.3]
	at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1207) ~[org.apache.zookeeper-zookeeper-3.8.3.jar:3.8.3]
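The bookies-recovery log prints the metadata service URI it derived from `zkServers`. Splitting it shows the host the ZooKeeper client tries to resolve. Since the Pulsar cluster spec defines no zookeeper component of its own, a `pulsar-cluster-zookeeper` Service would not be expected to exist, which is consistent with the `UnknownHostException`. A minimal Python sketch using only the standard library; `zk_target` and `is_resolvable` are hypothetical helpers, and `is_resolvable` only gives a meaningful answer when run inside the Kubernetes cluster.

```python
import socket
from urllib.parse import urlsplit

# Metadata service URI printed by ZKMetadataDriverBase in the bookies-recovery log above.
URI = "zk+null://pulsar-cluster-zookeeper.default.svc:2181/ledgers"


def zk_target(uri: str) -> tuple[str, int, str]:
    """Split a zk+... metadata service URI into (host, port, ledgers root path)."""
    parts = urlsplit(uri)
    return parts.hostname, parts.port, parts.path


def is_resolvable(host: str) -> bool:
    """Plain DNS lookup, roughly what InetAddress.getAllByName does in the stack trace."""
    try:
        socket.getaddrinfo(host, None)
        return True
    except socket.gaierror:
        return False


host, port, root = zk_target(URI)
print(f"host={host} port={port} ledgersRootPath={root}")
```

Calling `is_resolvable(host)` from a pod inside the cluster is a quick way to confirm the lookup failure independently of the Java client.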

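Expected behavior
pulsar-proxy and bookies-recovery should use the ZooKeeper endpoint of the cluster referenced in serviceRefs (zookeeperp-cluster), and all Pulsar pods should reach Running, like pulsar-broker and bookies.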
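The expected behavior amounts to a simple rule: a component that declares the `pulsarZookeeper` serviceRef should get an endpoint derived from the referenced cluster, and only components without it should fall back to the built-in default. The Python sketch below models that rule over a dict mirroring the Pulsar cluster spec above. It is a hypothetical model of the reporter's expectation, not KubeBlocks' actual resolution logic, and the `<cluster>-zookeeper.<namespace>.svc:2181` naming is an assumption inferred from the default endpoint in the logs and the `zookeeperp-cluster-zookeeper-N` pod names.

```python
# Relevant parts of the pulsar-cluster spec above, as a plain dict.
ZK_REF = {"name": "pulsarZookeeper", "cluster": "zookeeperp-cluster", "namespace": "default"}
SPEC = {
    "metadata": {"name": "pulsar-cluster", "namespace": "default"},
    "componentSpecs": [
        {"name": name, "serviceRefs": [ZK_REF]}
        for name in ("pulsar-broker", "pulsar-proxy", "bookies", "bookies-recovery")
    ],
}


def zk_endpoint(spec: dict, component: str, ref_name: str = "pulsarZookeeper", port: int = 2181) -> str:
    """Hypothetical rule: prefer the named serviceRef, else the cluster's own ZooKeeper."""
    comp = next(c for c in spec["componentSpecs"] if c["name"] == component)
    for ref in comp.get("serviceRefs", []):
        if ref["name"] == ref_name:
            # ASSUMPTION: the referenced Service is named <cluster>-zookeeper.<namespace>.svc
            return f"{ref['cluster']}-zookeeper.{ref['namespace']}.svc:{port}"
    meta = spec["metadata"]
    # Default pattern seen in the logs: <own cluster>-zookeeper.<namespace>.svc:2181
    return f"{meta['name']}-zookeeper.{meta['namespace']}.svc:{port}"


for comp in SPEC["componentSpecs"]:
    print(comp["name"], "->", zk_endpoint(SPEC, comp["name"]))
```

Under this model every component resolves to the referenced cluster, whereas the logs show pulsar-proxy and bookies-recovery taking the fallback branch.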

JashBook, Apr 12 '24 07:04

This issue has been marked as stale because it has been open for 30 days with no activity.

github-actions[bot], May 13 '24 00:05