solrcloud-zookeeper-kubernetes
solrcloud zookeeper setup issues on kubernetes cluster
I encountered the following issues while trying to set up a SolrCloud and ZooKeeper cluster on a multi-node Kubernetes cluster.
These are the scenarios I tried:
Scenario 1 - As is with public docker images (solr, zookeeper) on cluster
Steps:
- Clone the repo
- Change the configs to match the cluster (e.g. storage class, own namespace, etc.)
- ./start-aws-zookeeper-ensemble
- ./start-aws-solr-cluster (after about 15 seconds)
Issues:
zookeeper.log
java.net.UnknownHostException: zk-2.zkensemble
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:184)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:607)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:558)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:534)
at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:454)
Solr logs: no errors were thrown, and Solr appears to be able to connect to the ZooKeeper cluster.
2020-04-28 09:49:17.022 INFO (main) [ ] o.a.s.c.SolrResourceLoader [null] Added 0 libs to classloader, from paths: []
2020-04-28 09:49:17.255 INFO (main) [ ] o.a.s.h.c.HttpShardHandlerFactory Host whitelist initialized: WhitelistHostChecker [whitelistHosts=null, whitelistHostCheckingEnabled=true]
2020-04-28 09:49:17.553 WARN (main) [ ] o.e.j.u.s.S.config No Client EndPointIdentificationAlgorithm configured for SslContextFactory@ac20bb4[provider=null,keyStore=null,trustStore=null]
2020-04-28 09:49:17.742 WARN (main) [ ] o.e.j.u.s.S.config No Client EndPointIdentificationAlgorithm configured for SslContextFactory@63c12e52[provider=null,keyStore=null,trustStore=null]
2020-04-28 09:49:17.763 INFO (main) [ ] o.a.s.c.ZkContainer Zookeeper client=zk-0.zkensemble:2181,zk-1.zkensemble:2181,zk-2.zkensemble:2181
2020-04-28 09:49:17.825 INFO (zkConnectionManagerCallback-9-thread-1) [ ] o.a.s.c.c.ConnectionManager zkClient has connected
2020-04-28 09:49:20.075 INFO (main) [ ] o.a.s.c.OverseerElectionContext I am going to be the leader solr-0.solrcluster:8983_solr
2020-04-28 09:49:20.108 INFO (main) [ ] o.a.s.c.Overseer Overseer (id=145194837495316480-solr-0.solrcluster:8983_solr-n_0000000000) starting
2020-04-28 09:49:20.272 INFO (zkConnectionManagerCallback-16-thread-1) [ ] o.a.s.c.c.ConnectionManager zkClient has connected
2020-04-28 09:49:20.300 INFO (main) [ ] o.a.s.c.s.i.ZkClientClusterStateProvider Cluster at zk-0.zkensemble:2181,zk-1.zkensemble:2181,zk-2.zkensemble:2181 ready
2020-04-28 09:49:20.434 INFO (main) [ ] o.a.s.c.ZkController Register node as live in ZooKeeper:/live_nodes/solr-0.solrcluster:8983_solr
2020-04-28 09:49:20.440 INFO (OverseerStateUpdate-145194837495316480-solr-0.solrcluster:8983_solr-n_0000000000) [ ] o.a.s.c.Overseer Starting to work on the main queue : solr-0.solrcluster:
Scenario 2 - Rebuilt the docker images (solr, zookeeper) using RHEL as the base OS and deployed them on the K8s cluster
- Clone the repo
- Change the configs to match the cluster (e.g. storage class, own namespace, etc.)
- ./start-aws-zookeeper-ensemble
- ./start-aws-solr-cluster (after about 15 seconds)
Zookeeper logs :
2020-04-28 10:18:05,913 [myid:] - INFO [main:QuorumPeerConfig@136] - Reading configuration from: /conf/zoo.cfg
2020-04-28 10:18:05,944 [myid:] - WARN [main:QuorumPeer$QuorumServer@191] - Failed to resolve address: zk-20.0.0.0
java.net.UnknownHostException: zk-20.0.0.0: Name or service not known
at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929)
at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1324)
at java.net.InetAddress.getAllByName0(InetAddress.java:1277)
at java.net.InetAddress.getAllByName(InetAddress.java:1193)
at java.net.InetAddress.getAllByName(InetAddress.java:1127)
at java.net.InetAddress.getByName(InetAddress.java:1077)
at org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer.recreateSocketAddresses(QuorumPeer.java:181)
at org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer.
Container logs:
/bin/sh: hostname: command not found
The above error seems to come from the startup command in the StatefulSet, which resolves the hostname and substitutes it into the server list during pod creation:
if [ ! -f $ZOO_DATA_DIR/myid ] ; then $(echo $((${HOSTNAME##*-}+1)) > $ZOO_DATA_DIR/myid ) else touch /conf/test; fi && \
**$(echo $ZOO_SERVERS | sed \"s/$(hostname).zkensemble/0.0.0.0/g\" > /conf/zooservers.txt) && \**
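The "zk-20.0.0.0" name in the ZooKeeper log is consistent with that sed line running while `hostname` is missing: the command substitution yields an empty string, the pattern shrinks to `.zkensemble`, and the unescaped `.` then matches the literal dot in every server entry. A minimal sketch of that effect follows; the `ZOO_SERVERS` value and the `missing_host` variable are assumptions made up for illustration, not taken from the repo.

```shell
# Hypothetical server list; the real ZOO_SERVERS value comes from the repo's config.
ZOO_SERVERS="server.1=zk-0.zkensemble:2888:3888 server.2=zk-1.zkensemble:2888:3888 server.3=zk-2.zkensemble:2888:3888"

# Simulate the missing `hostname` binary: $(hostname) expands to nothing.
missing_host=""

# The pattern becomes ".zkensemble"; "." matches any character, including
# the literal dot after each pod name, so every entry gets rewritten.
broken=$(echo "$ZOO_SERVERS" | sed "s/${missing_host}.zkensemble/0.0.0.0/g")
echo "$broken"
```

With these assumed inputs, the last entry becomes `zk-20.0.0.0:2888:3888`, the same unresolvable name reported in the log.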
Solr logs
Caused by: org.apache.solr.common.SolrException: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper zk-0.zkensemble:2181,zk-1.zkensemble:2181,zk-2.zkensemble:2181 within 30000 ms
at org.apache.solr.common.cloud.SolrZkClient.
Looking forward to your help. Do let me know if you need any other details.
Did you solve this issue? Did it block implementation?
Yes, it's resolved.
How was it resolved?
The issue was that the hostname was not being resolved in the startup script. I used POD_NAME from an env var instead, and it started working.
if [ ! -f $ZOO_DATA_DIR/myid ] ; then $(echo $((${HOSTNAME##*-}+1)) > $ZOO_DATA_DIR/myid ) else touch /conf/test; fi &&
$(echo $ZOO_SERVERS | sed "s/$MY_POD_NAME.zkensemble/0.0.0.0/g" > /conf/zooservers.txt) &&
....
env:
  - name: MY_POD_NAME
    valueFrom:
      fieldRef:
        fieldPath: metadata.name
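For anyone adapting this fix, here is a hedged sketch of the same substitution with two optional safeguards that are not part of the original script: a guard that stops if the variable is empty, and an escaped dot so only the pod's own entry is rewritten. The `MY_POD_NAME` and `ZOO_SERVERS` values are hard-coded assumptions for illustration; in the pod, `MY_POD_NAME` would come from `metadata.name` as configured above.

```shell
# Hypothetical values for illustration only.
MY_POD_NAME="zk-2"
ZOO_SERVERS="server.1=zk-0.zkensemble:2888:3888 server.2=zk-1.zkensemble:2888:3888 server.3=zk-2.zkensemble:2888:3888"

# Stop early if the name is empty, instead of silently rewriting every entry.
: "${MY_POD_NAME:?MY_POD_NAME is not set}"

# ZooKeeper id from the ordinal suffix plus one, mirroring the original script.
myid=$(( ${MY_POD_NAME##*-} + 1 ))

# Escape the dot so it matches only a literal "." after the pod name.
fixed=$(echo "$ZOO_SERVERS" | sed "s/${MY_POD_NAME}\.zkensemble/0.0.0.0/g")

echo "myid=$myid"
echo "$fixed"
```

Only the `server.3` entry is replaced with `0.0.0.0` here; the other two keep their `zk-N.zkensemble` names. The variable used in the sed expression has to match the `name` declared under `env` in the StatefulSet.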
@gs-offcl thanks for the info. Just a question, is there a typo in: "s/$MY_POD_NAME.zkensemble/0.0.0.0/g" ?
Given your comment I suppose the correct line should be "s/$POD_NAME.zkensemble/0.0.0.0/g" without MY_.
Right?