pulsar-helm-chart
pulsar waiting to start: PodInitializing
[root@vm192-168-2-4 ~]# kubectl version
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.1", GitCommit:"5e58841cce77d4bc13713ad2b91fa0d961e69192", GitTreeState:"clean", BuildDate:"2021-05-12T14:18:45Z", GoVersion:"go1.16.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.6", GitCommit:"d32e40e20d167e103faf894261614c5b45c44198", GitTreeState:"clean", BuildDate:"2020-05-20T13:08:34Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
WARNING: version difference between client (1.21) and server (1.17) exceeds the supported minor version skew of +/-1
[root@vm192-168-2-4 ~]# kubectl get pod
NAME                                      READY   STATUS     RESTARTS   AGE
nfs-client-provisioner-6f664979c5-zvkbs   1/1     Running    0          98s
nginx-86c57db685-5h4cd                    1/1     Running    0          61m
nginx-86c57db685-7vc7z                    1/1     Running    0          62m
pulsar-bookie-0                           0/1     Init:0/1   0          5h38m
pulsar-bookie-1                           0/1     Init:0/1   0          5h38m
pulsar-bookie-2                           0/1     Init:0/1   0          5h38m
pulsar-bookie-3                           0/1     Init:0/1   0          5h38m
pulsar-broker-0                           0/1     Init:0/2   0          5h38m
pulsar-broker-1                           0/1     Init:0/2   0          5h38m
pulsar-broker-2                           0/1     Init:0/2   0          5h38m
pulsar-grafana-74fbdb9cc6-6s7dc           1/1     Running    0          5h38m
pulsar-prometheus-556c44f697-2mk2g        1/1     Running    0          5h38m
pulsar-proxy-0                            0/1     Init:0/2   0          5h38m
pulsar-proxy-1                            0/1     Init:0/2   0          5h38m
pulsar-proxy-2                            0/1     Init:0/2   0          5h38m
pulsar-pulsar-manager-767d5f5766-nnpw9    1/1     Running    0          5h38m
pulsar-recovery-0                         0/1     Init:0/1   0          5h38m
pulsar-toolset-0                          1/1     Running    0          5h38m
pulsar-zookeeper-0                        1/1     Running    0          5h38m
pulsar-zookeeper-1                        1/1     Running    0          5h38m
pulsar-zookeeper-2                        1/1     Running    0          5h37m
[root@vm192-168-2-4 ~]# helm list
WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /root/.kube/config
WARNING: Kubernetes configuration file is world-readable. This is insecure. Location: /root/.kube/config
NAME     NAMESPACE   REVISION   UPDATED                                    STATUS     CHART          APP VERSION
pulsar   default     1          2021-05-31 12:08:47.649624912 +0800 CST   deployed   pulsar-2.7.1   2.7.1
[root@vm192-168-2-4 ~]# kubectl logs pulsar-broker-0
Error from server (BadRequest): container "pulsar-broker" in pod "pulsar-broker-0" is waiting to start: PodInitializing
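Since the pod is stuck in PodInitializing, the useful logs are those of the init containers rather than the main container. A minimal diagnostic sketch (init container names vary by chart version, so list them first rather than assuming them):
# show which init container is blocking and why
kubectl describe pod pulsar-broker-0
# list the init container names defined on the pod
kubectl get pod pulsar-broker-0 -o jsonpath='{.spec.initContainers[*].name}'
# then read the log of the init container that is still running, e.g.:
kubectl logs pulsar-broker-0 -c <init-container-name>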
In the zookeeper pod I ran # bin/bookkeeper shell metaformat; the pulsar-bookie pods are now OK and running.
Then I ran # bin/pulsar zookeeper-shell -server pulsar-zookeeper get /admin/clusters/pulsar:
06:41:04.299 [main] INFO org.apache.zookeeper.ZooKeeper - Client environment:java.io.tmpdir=/tmp
06:41:04.299 [main] INFO org.apache.zookeeper.ZooKeeper - Client environment:java.compiler=<NA>
06:41:04.299 [main] INFO org.apache.zookeeper.ZooKeeper - Client environment:os.name=Linux
06:41:04.299 [main] INFO org.apache.zookeeper.ZooKeeper - Client environment:os.arch=amd64
06:41:04.299 [main] INFO org.apache.zookeeper.ZooKeeper - Client environment:os.version=4.18.0-80.1.1.el7.ksyun.x86_64
06:41:04.299 [main] INFO org.apache.zookeeper.ZooKeeper - Client environment:user.name=root
06:41:04.299 [main] INFO org.apache.zookeeper.ZooKeeper - Client environment:user.home=/root
06:41:04.299 [main] INFO org.apache.zookeeper.ZooKeeper - Client environment:user.dir=/pulsar
06:41:04.299 [main] INFO org.apache.zookeeper.ZooKeeper - Client environment:os.memory.free=65MB
06:41:04.299 [main] INFO org.apache.zookeeper.ZooKeeper - Client environment:os.memory.max=128MB
06:41:04.299 [main] INFO org.apache.zookeeper.ZooKeeper - Client environment:os.memory.total=77MB
06:41:04.302 [main] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=pulsar-zookeeper sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@4ebff610
06:41:04.306 [main] INFO org.apache.zookeeper.common.X509Util - Setting -D jdk.tls.rejectClientInitiatedRenegotiation=true to disable client-initiated TLS renegotiation
06:41:04.311 [main] INFO org.apache.zookeeper.ClientCnxnSocket - jute.maxbuffer value is 10485760 Bytes
06:41:04.317 [main] INFO org.apache.zookeeper.ClientCnxn - zookeeper.request.timeout value is 0. feature enabled=
06:41:04.325 [main-SendThread(pulsar-zookeeper:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server pulsar-zookeeper/10.0.5.11:2181. Will not attempt to authenticate using SASL (unknown error)
06:41:04.329 [main-SendThread(pulsar-zookeeper:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established, initiating session, client: /10.0.5.11:50066, server: pulsar-zookeeper/10.0.5.11:2181
06:41:04.340 [main-SendThread(pulsar-zookeeper:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server pulsar-zookeeper/10.0.5.11:2181, sessionid = 0x100176ed33aabd3, negotiated timeout = 30000
WATCHER::
WatchedEvent state:SyncConnected type:None path:null
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /admin/clusters/pulsar
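The NoNode error means the cluster metadata was never written to ZooKeeper. A quick way to confirm, from the same toolset/zookeeper pod, is to list the parent znodes instead of reading a specific one:
bin/pulsar zookeeper-shell -server pulsar-zookeeper ls /
bin/pulsar zookeeper-shell -server pulsar-zookeeper ls /admin/clusters
If /admin/clusters is missing or empty, the initialize-cluster-metadata step never completed.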
I have the same issue.
First, in values.yaml make sure that
initialize: true
is set to true.
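If you install from the command line instead of editing values.yaml, the same flag can be passed on the first install, e.g. (release name and namespace here are just placeholders):
helm install pulsar apache/pulsar --set initialize=true --namespace pulsar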
Also, this might or might not help:
Add this line to proxy-configmap.yaml under the "data" section:
data:
zookeeperServers: "{{ template "pulsar.zookeeper.connect" . }}{{ .Values.metadataPrefix }}"
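To confirm the change actually ends up in the rendered manifest, you can render just that template locally (a sketch; the chart path and release name are placeholders for a local checkout of the chart):
helm template pulsar ./charts/pulsar --show-only templates/proxy-configmap.yaml | grep zookeeperServers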
Kind of a similar problem for me. Reinstalling a pulsar cluster chart=2.7.7 with initialize=true results in brokers / proxies getting stuck waiting for zookeeper.
I tracked it down to incorrect arguments being passed to the pulsar-init job. Here are some logs:
Exception in thread "main" com.beust.jcommander.ParameterException: The following options are required: [-cs | --configuration-store], [-uw | --web-service-url], [-zk | --zookeeper], [-c | --cluster]
at com.beust.jcommander.JCommander.validateOptions(JCommander.java:388)
at com.beust.jcommander.JCommander.parse(JCommander.java:357)
at com.beust.jcommander.JCommander.parse(JCommander.java:335)
at org.apache.pulsar.PulsarClusterMetadataSetup.main(PulsarClusterMetadataSetup.java:146)
Usage: <main class> [options]
Options:
-ub, --broker-service-url
Broker-service URL for new cluster
-tb, --broker-service-url-tls
Broker-service URL for new cluster with TLS encryption
* -c, --cluster
Cluster name
* -cs, --configuration-store
Configuration Store connection string
--existing-bk-metadata-service-uri
The metadata service URI of the existing BookKeeper cluster that you
want to use
-h, --help
Show this help message
Default: false
--initial-num-stream-storage-containers
Num storage containers of BookKeeper stream storage
Default: 16
--initial-num-transaction-coordinators
Num transaction coordinators will assigned in cluster
Default: 16
* -uw, --web-service-url
Web-service URL for new cluster
-tw, --web-service-url-tls
Web-service URL for new cluster with TLS encryption
* -zk, --zookeeper
Local ZooKeeper quorum connection string
--zookeeper-session-timeout-ms
Local zookeeper session timeout ms
Default: 30000
sh: 4: --cluster: not found
Note that the init job still exited with a success status, because there is an || true that swallows all errors.
I guess that should be considered a problem, should it not?
To me it looks like the problem is that the Helm templates expand to include an empty line without a line continuation (\), which causes the shell call to fail.
Here is what Helm renders:
bin/pulsar initialize-cluster-metadata \
--cluster pulsar \
--zookeeper pulsar-zookeeper:2181 \
--configuration-store pulsar-zookeeper:2181 \
--web-service-url http://pulsar-broker.pulsar.svc.cluster.local:8080/ \
--web-service-url-tls https://pulsar-broker.pulsar.svc.cluster.local:8443/ \
--broker-service-url pulsar://pulsar-broker.pulsar.svc.cluster.local:6650/ \
--broker-service-url-tls pulsar+ssl://pulsar-broker.pulsar.svc.cluster.local:6651/ || true;
I exec'ed into the shell of one of the running bookies to reproduce the problem, and yes, with the empty line present it gives the error message noted above.
Manually running the cluster initialization with the empty line removed was enough in my case to get the cluster working again.
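For anyone who wants to reproduce the shell behaviour without a cluster: a blank line breaks the backslash-continuation chain, so everything after it is executed as a separate command. A minimal sketch:
# the trailing backslash joins line 1 with the blank line, which ends the command;
# the option on the following line is then run as a command of its own
sh -c 'echo initialize-cluster-metadata \

--cluster pulsar'
# prints "initialize-cluster-metadata" and then fails with something like: sh: 3: --cluster: not found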
Strange, because the Helm template looks fine to me, and I have no idea where this extra line is coming from:
- https://github.com/apache/pulsar-helm-chart/blob/master/charts/pulsar/templates/pulsar-cluster-initialize.yaml#L92 (git blame shows no changes there for about the last 16 months).
A suggestion for the future: would it not be possible to default the initialize=true value from Helm's built-in objects, e.g. {{- if .Release.IsInstall }}?
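As a sketch of that idea (not the chart's current template; the condition and value name are assumptions):
{{- if or .Values.initialize .Release.IsInstall }}
# render the cluster metadata initialization Job on first install,
# or whenever initialize=true is set explicitly
...
{{- end }}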
This issue seems to be resolved by #166
@lhotari Good point. I will check today by upgrading to 2.7.8.
We had a similar issue; it turns out that the broker-init and pulsar-init scripts run in a container that needs at least 2.5Gi of memory. If the job is scheduled on a cluster node without sufficient memory, the initialization fails and the cluster setup hangs.
We added:
bookkeeper:
  metadata:
    resources:
      requests:
        memory: 2.5Gi
      limits:
        memory: 2.5Gi
pulsar_metadata:
  metadata:
    resources:
      requests:
        memory: 2.5Gi
      limits:
        memory: 2.5Gi
To make sure that:
- The job is assigned to a suitable node (requests)
- The initialization fails fast when there is no such node (limits)
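To check whether the metadata init jobs actually picked up those resources and were scheduled, something like this should work (the job name below assumes a release called "pulsar" and may differ in your deployment):
kubectl get jobs
kubectl get pod -l job-name=pulsar-pulsar-init -o wide
kubectl describe pod -l job-name=pulsar-pulsar-init | grep -A 2 -E 'Requests|Limits'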
It looks like this is similar to https://github.com/apache/pulsar-helm-chart/issues/373, where containers that run bin/bookkeeper or bin/pulsar set the JVM memory args to 2GB. Each call to one of these scripts causes a memory usage peak and may exceed the container's memory limit.
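If raising the limits is not an option, the JVM heap those scripts ask for can be reduced instead; bin/pulsar reads PULSAR_MEM and bin/bookkeeper reads BOOKIE_MEM. A sketch of values.yaml overrides (these configData keys exist for the broker and bookkeeper components; whether the init jobs inherit them depends on the chart version):
bookkeeper:
  configData:
    BOOKIE_MEM: "-Xms128m -Xmx256m -XX:MaxDirectMemorySize=256m"
broker:
  configData:
    PULSAR_MEM: "-Xms128m -Xmx256m -XX:MaxDirectMemorySize=256m"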
It still happens even after setting:
resources:
  requests:
    memory: 3Gi
  limits:
    memory: 3Gi
Please retry with the latest released version, 3.2.0. Please create a new issue if the problem persists.
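For reference, the upgrade to the latest chart would look roughly like this (release name, namespace and values file are placeholders):
helm repo add apache https://pulsar.apache.org/charts
helm repo update
helm upgrade pulsar apache/pulsar --version 3.2.0 --namespace pulsar -f values.yaml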