clickhouse-operator icon indicating copy to clipboard operation
clickhouse-operator copied to clipboard

Loop crash of zookeepers (readiness probe failed)

Open jdalia001 opened this issue 1 year ago • 1 comments

Hi,

I am facing big issue with my zookeeper cluster, there are restarting indefinitely, I don't know why exactly.

There is the describe on one pod :

Name:             clickhouse-zookeeper-0
Namespace:        logs
Priority:         0
Service Account:  default
Node:             k8s-eu-de-worker-03/141.95.97.154
Start Time:       Fri, 15 Sep 2023 08:50:23 +0200
Labels:           app=clickhouse-zookeeper
                  controller-revision-hash=clickhouse-zookeeper-6889799877
                  statefulset.kubernetes.io/pod-name=clickhouse-zookeeper-0
                  what=node
Annotations:      cni.projectcalico.org/containerID: f00edc7afb77d779ab79fe17415a8c8f2545c3fccfde4404968317bc2f10e0cb
                  cni.projectcalico.org/podIP: 10.233.77.184/32
                  cni.projectcalico.org/podIPs: 10.233.77.184/32
                  prometheus.io/port: 7000
                  prometheus.io/scrape: true
Status:           Running
IP:               10.233.77.184
IPs:
  IP:           10.233.77.184
Controlled By:  StatefulSet/clickhouse-zookeeper
Containers:
  kubernetes-zookeeper:
    Container ID:  containerd://18ad83c72d5761922dff5ae123d46fd0d45511e1c4778b4ae2d68330ae84c662
    Image:         docker.io/zookeeper:3.8.1
    Image ID:      docker.io/library/zookeeper@sha256:8242c71f0a529559f6513e949af42f3cf759bf4a05ef3b0e485af72884c8adb0
    Ports:         2181/TCP, 2888/TCP, 3888/TCP, 7000/TCP
    Host Ports:    0/TCP, 0/TCP, 0/TCP, 0/TCP
    Command:
      bash
      -x
      -c
      HOST=`hostname -s` &&
      DOMAIN=`hostname -d` &&
      CLIENT_PORT=2181 &&
      SERVER_PORT=2888 &&
      ELECTION_PORT=3888 &&
      PROMETHEUS_PORT=7000 &&
      ZOO_DATA_DIR=/var/lib/zookeeper/data &&
      ZOO_DATA_LOG_DIR=/var/lib/zookeeper/datalog &&
      {
        echo "clientPort=${CLIENT_PORT}"
        echo 'tickTime=2000'
        echo 'initLimit=300'
        echo 'syncLimit=10'
        echo 'maxClientCnxns=2000'
        echo 'maxTimeToWaitForEpoch=2000'
        echo 'maxSessionTimeout=60000000'
        echo "dataDir=${ZOO_DATA_DIR}"
        echo "dataLogDir=${ZOO_DATA_LOG_DIR}"
        echo 'autopurge.snapRetainCount=10'
        echo 'autopurge.purgeInterval=1'
        echo 'preAllocSize=131072'
        echo 'snapCount=3000000'
        echo 'leaderServes=yes'
        echo 'standaloneEnabled=false'
        echo '4lw.commands.whitelist=*'
        echo 'metricsProvider.className=org.apache.zookeeper.metrics.prometheus.PrometheusMetricsProvider'
        echo "metricsProvider.httpPort=${PROMETHEUS_PORT}"
        echo "skipACL=true"  
        echo "fastleader.maxNotificationInterval=10000"
      } > /conf/zoo.cfg &&
      {
        echo "zookeeper.root.logger=CONSOLE"
        echo "zookeeper.console.threshold=INFO"
        echo "log4j.rootLogger=\${zookeeper.root.logger}"
        echo "log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender"
        echo "log4j.appender.CONSOLE.Threshold=\${zookeeper.console.threshold}"
        echo "log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout"
        echo "log4j.appender.CONSOLE.layout.ConversionPattern=%d{ISO8601} - %-5p [%t:%C{1}@%L] - %m%n"                                                       
      } > /conf/log4j.properties &&                                                                                                                          
      echo 'JVMFLAGS="-Xms128M -Xmx4G -XX:ActiveProcessorCount=8 -XX:+AlwaysPreTouch -Djute.maxbuffer=8388608 -XX:MaxGCPauseMillis=50"' > /conf/java.env &&  
      if [[ $HOST =~ (.*)-([0-9]+)$ ]]; then                                                                                                                 
          NAME=${BASH_REMATCH[1]} &&                                                                                                                         
          ORD=${BASH_REMATCH[2]};                                                                                                                            
      else                                                                                                                                                   
          echo "Failed to parse name and ordinal of Pod" &&                                                                                                  
          exit 1;                                                                                                                                            
      fi &&                                                                                                                                                  
      mkdir -pv ${ZOO_DATA_DIR} &&                                                                                                                           
      mkdir -pv ${ZOO_DATA_LOG_DIR} &&                                                                                                                       
      whoami &&                                                                                                                                              
      chown -Rv zookeeper "$ZOO_DATA_DIR" "$ZOO_DATA_LOG_DIR" &&
      export MY_ID=$((ORD+1)) &&
      echo $MY_ID > $ZOO_DATA_DIR/myid &&
      for (( i=1; i<=$SERVERS; i++ )); do
          echo "server.$i=$NAME-$((i-1)).$DOMAIN:$SERVER_PORT:$ELECTION_PORT" >> /conf/zoo.cfg;
      done &&
      if [[ $SERVERS -eq 1 ]]; then
          echo "group.1=1" >> /conf/zoo.cfg;
      else
          echo "group.1=1:2:3" >> /conf/zoo.cfg;
      fi &&
      for (( i=1; i<=$SERVERS; i++ )); do
          WEIGHT=1
          if [[ $i == 1 ]]; then
            WEIGHT=10
          fi
          echo "weight.$i=$WEIGHT" >> /conf/zoo.cfg;
      done &&
      zkServer.sh start-foreground
      
    State:          Running
      Started:      Mon, 25 Sep 2023 11:20:49 +0200
    Last State:     Terminated
      Reason:       Error
      Exit Code:    143
      Started:      Mon, 25 Sep 2023 11:14:21 +0200
      Finished:     Mon, 25 Sep 2023 11:20:49 +0200
    Ready:          False
    Restart Count:  599
    Liveness:       exec [bash -xc date && OK=$(exec 3<>/dev/tcp/127.0.0.1/2181 ; printf "ruok" >&3 ; IFS=; tee <&3; exec 3<&- ;); if [[ "$OK" == "imok" ]]; then exit 0; else exit 1; fi] delay=10s timeout=5s period=10s #success=1 #failure=3
    Readiness:      exec [bash -c  IFS=; MNTR=$(exec 3<>/dev/tcp/127.0.0.1/2181 ; printf "mntr" >&3 ; tee <&3; exec 3<&- ;); while [[ "$MNTR" == "This ZooKeeper instance is not currently serving requests" ]]; do echo "wait mntr works"; sleep 1; MNTR=$(exec 3<>/dev/tcp/127.0.0.1/2181 ; printf "mntr" >&3 ; tee <&3; exec 3<&- ;); done; STATE=$(echo -e $MNTR | grep zk_server_state | cut -d " " -f 2); if [[ "$STATE" =~ "leader" ]]; then echo "check leader state"; SYNCED_FOLLOWERS=$(echo -e $MNTR | grep zk_synced_followers | awk -F"[[:space:]]+" "{print \$2}" | cut -d "." -f 1); if [[ "$SYNCED_FOLLOWERS" != "0" ]]; then ./bin/zkCli.sh ls /; exit $?; else exit 0; fi; elif [[ "$STATE" =~ "follower" ]]; then echo "check follower state"; PEER_STATE=$(echo -e $MNTR | grep zk_peer_state); if [[ "$PEER_STATE" =~ "following - broadcast" ]]; then ./bin/zkCli.sh ls /; exit $?; else exit 1; fi; else exit 1; fi ] delay=10s timeout=60s period=30s #success=1 #failure=3
    Environment:
      SERVERS:  3
    Mounts:
      /var/lib/zookeeper from datadir-volume (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-nqzmf (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  datadir-volume:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  datadir-volume-clickhouse-zookeeper-0
    ReadOnly:   false
  kube-api-access-nqzmf:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                     From     Message
  ----     ------     ----                    ----     -------
  Warning  Unhealthy  32m (x1369 over 2d18h)  kubelet  (combined from similar events): Liveness probe failed: command "bash -xc date && OK=$(exec 3<>/dev/tcp/127.0.0.1/2181 ; printf \"ruok\" >&3 ; IFS=; tee <&3; exec 3<&- ;); if [[ \"$OK\" == \"imok\" ]]; then exit 0; else exit 1; fi" timed out
  Warning  Unhealthy  18m (x1068 over 2d16h)  kubelet  Readiness probe failed: command "bash -c  IFS=; MNTR=$(exec 3<>/dev/tcp/127.0.0.1/2181 ; printf \"mntr\" >&3 ; tee <&3; exec 3<&- ;); while [[ \"$MNTR\" == \"This ZooKeeper instance is not currently serving requests\" ]]; do echo \"wait mntr works\"; sleep 1; MNTR=$(exec 3<>/dev/tcp/127.0.0.1/2181 ; printf \"mntr\" >&3 ; tee <&3; exec 3<&- ;); done; STATE=$(echo -e $MNTR | grep zk_server_state | cut -d \" \" -f 2); if [[ \"$STATE\" =~ \"leader\" ]]; then echo \"check leader state\"; SYNCED_FOLLOWERS=$(echo -e $MNTR | grep zk_synced_followers | awk -F\"[[:space:]]+\" \"{print \\$2}\" | cut -d \".\" -f 1); if [[ \"$SYNCED_FOLLOWERS\" != \"0\" ]]; then ./bin/zkCli.sh ls /; exit $?; else exit 0; fi; elif [[ \"$STATE\" =~ \"follower\" ]]; then echo \"check follower state\"; PEER_STATE=$(echo -e $MNTR | grep zk_peer_state); if [[ \"$PEER_STATE\" =~ \"following - broadcast\" ]]; then ./bin/zkCli.sh ls /; exit $?; else exit 1; fi; else exit 1; fi " timed out
  Warning  Unhealthy  2m7s (x1646 over 10d)   kubelet  Liveness probe failed: command "bash -xc date && OK=$(exec 3<>/dev/tcp/127.0.0.1/2181 ; printf \"ruok\" >&3 ; IFS=; tee <&3; exec 3<&- ;); if [[ \"$OK\" == \"imok\" ]]; then exit 0; else exit 1; fi" timed out

There is the logs of pod 1:

+ zkServer.sh start-foreground
ZooKeeper JMX enabled by default
Using config: /conf/zoo.cfg
2023-09-25 09:20:50,054 [myid:] - INFO  [main:o.a.z.s.q.QuorumPeerConfig@177] - Reading configuration from: /conf/zoo.cfg
2023-09-25 09:20:50,058 [myid:] - INFO  [main:o.a.z.s.q.QuorumPeerConfig@440] - clientPortAddress is 0.0.0.0:2181
2023-09-25 09:20:50,058 [myid:] - INFO  [main:o.a.z.s.q.QuorumPeerConfig@444] - secureClientPort is not set
2023-09-25 09:20:50,058 [myid:] - INFO  [main:o.a.z.s.q.QuorumPeerConfig@460] - observerMasterPort is not set
2023-09-25 09:20:50,059 [myid:] - INFO  [main:o.a.z.s.q.QuorumPeerConfig@477] - metricsProvider.className is org.apache.zookeeper.metrics.prometheus.PrometheusMetricsProvider
2023-09-25 09:20:50,069 [myid:] - INFO  [main:o.a.z.s.q.f.QuorumHierarchical@140] - 3, 3, 1
2023-09-25 09:20:50,069 [myid:1] - INFO  [main:o.a.z.s.DatadirCleanupManager@78] - autopurge.snapRetainCount set to 10
2023-09-25 09:20:50,070 [myid:1] - INFO  [main:o.a.z.s.DatadirCleanupManager@79] - autopurge.purgeInterval set to 1
2023-09-25 09:20:50,071 [myid:1] - INFO  [main:o.a.z.j.ManagedUtil@46] - Log4j 1.2 jmx support not found; jmx disabled.
2023-09-25 09:20:50,071 [myid:1] - INFO  [main:o.a.z.s.q.QuorumPeerMain@152] - Starting quorum peer, myid=1
2023-09-25 09:20:50,070 [myid:] - INFO  [PurgeTask:o.a.z.s.DatadirCleanupManager$PurgeTask@139] - Purge task started.
2023-09-25 09:20:50,074 [myid:] - INFO  [PurgeTask:o.a.z.s.p.FileTxnSnapLog@124] - zookeeper.snapshot.trust.empty : false
2023-09-25 09:20:50,081 [myid:] - INFO  [PurgeTask:o.a.z.s.p.SnapStream@61] - zookeeper.snapshot.compression.method = CHECKED
2023-09-25 09:20:50,082 [myid:] - INFO  [PurgeTask:o.a.z.s.p.SnapStream@321] - Invalid snapshot snapshot.704b3ffe1. len = 117453718, byte = 0
2023-09-25 09:20:50,082 [myid:] - INFO  [PurgeTask:o.a.z.s.p.SnapStream@321] - Invalid snapshot snapshot.7048b88ec. len = 117453718, byte = 0
2023-09-25 09:20:50,082 [myid:1] - INFO  [main:o.a.z.m.p.PrometheusMetricsProvider@113] - Initializing metrics, configuration: {httpPort=7000}
2023-09-25 09:20:50,082 [myid:] - INFO  [PurgeTask:o.a.z.s.p.SnapStream@321] - Invalid snapshot snapshot.7045eee09. len = 117453718, byte = 0
2023-09-25 09:20:50,082 [myid:] - INFO  [PurgeTask:o.a.z.s.p.SnapStream@321] - Invalid snapshot snapshot.704353376. len = 117453718, byte = 0
2023-09-25 09:20:50,083 [myid:1] - INFO  [main:o.a.z.m.p.PrometheusMetricsProvider@562] - Executor service was created with numWorkerThreads 1 and maxQueueSize 1000000
2023-09-25 09:20:50,083 [myid:] - INFO  [PurgeTask:o.a.z.s.p.SnapStream@321] - Invalid snapshot snapshot.7041b5a1c. len = 117453718, byte = 0
2023-09-25 09:20:50,083 [myid:1] - INFO  [main:o.a.z.m.p.PrometheusMetricsProvider@129] - Starting /metrics HTTP endpoint at host: 0.0.0.0, port: 7000, exportJvmInfo: true
2023-09-25 09:20:50,083 [myid:] - INFO  [PurgeTask:o.a.z.s.p.SnapStream@321] - Invalid snapshot snapshot.703ff2022. len = 117453718, byte = 0
2023-09-25 09:20:50,083 [myid:] - INFO  [PurgeTask:o.a.z.s.p.SnapStream@321] - Invalid snapshot snapshot.703db7708. len = 117453718, byte = 0
2023-09-25 09:20:50,085 [myid:] - INFO  [PurgeTask:o.a.z.s.DatadirCleanupManager$PurgeTask@145] - Purge task completed.
2023-09-25 09:20:50,098 [myid:1] - INFO  [main:o.e.j.u.l.Log@170] - Logging initialized @308ms to org.eclipse.jetty.util.log.Slf4jLog
2023-09-25 09:20:53,175 [myid:1] - INFO  [main:o.e.j.s.Server@375] - jetty-9.4.49.v20220914; built: 2022-09-14T01:07:36.601Z; git: 4231a3b2e4cb8548a412a789936d640a97b1aa0a; jvm 11.0.19+7
2023-09-25 09:20:53,198 [myid:1] - INFO  [main:o.e.j.s.h.ContextHandler@921] - Started o.e.j.s.ServletContextHandler@6340e5f0{/,null,AVAILABLE}
2023-09-25 09:20:53,206 [myid:1] - INFO  [main:o.e.j.s.AbstractConnector@333] - Started ServerConnector@13b13b5d{HTTP/1.1, (http/1.1)}{0.0.0.0:7000}
2023-09-25 09:20:53,207 [myid:1] - INFO  [main:o.e.j.s.Server@415] - Started @3417ms
2023-09-25 09:20:53,211 [myid:1] - INFO  [main:o.a.z.s.ServerMetrics@64] - ServerMetrics initialized with provider org.apache.zookeeper.metrics.prometheus.PrometheusMetricsProvider@70ed52de
2023-09-25 09:20:53,217 [myid:1] - INFO  [main:o.a.z.s.a.DigestAuthenticationProvider@47] - ACL digest algorithm is: SHA1
2023-09-25 09:20:53,217 [myid:1] - INFO  [main:o.a.z.s.a.DigestAuthenticationProvider@61] - zookeeper.DigestAuthenticationProvider.enabled = true
2023-09-25 09:20:53,220 [myid:1] - INFO  [main:o.a.z.s.ServerCnxnFactory@169] - Using org.apache.zookeeper.server.NIOServerCnxnFactory as server connection factory
2023-09-25 09:20:53,220 [myid:1] - WARN  [main:o.a.z.s.ServerCnxnFactory@309] - maxCnxns is not configured, using default value 0.
2023-09-25 09:20:53,221 [myid:1] - INFO  [main:o.a.z.s.NIOServerCnxnFactory@652] - Configuring NIO connection handler with 10s sessionless connection timeout, 2 selector thread(s), 16 worker threads, and 64 kB direct buffers.
2023-09-25 09:20:53,222 [myid:1] - INFO  [main:o.a.z.s.NIOServerCnxnFactory@660] - binding to port 0.0.0.0/0.0.0.0:2181
2023-09-25 09:20:53,226 [myid:1] - INFO  [main:o.a.z.s.q.QuorumPeer@798] - zookeeper.quorumCnxnTimeoutMs=-1
2023-09-25 09:20:53,230 [myid:1] - WARN  [main:o.e.j.s.h.ContextHandler@1662] - o.e.j.s.ServletContextHandler@7ac296f6{/,null,STOPPED} contextPath ends with /*
2023-09-25 09:20:53,230 [myid:1] - WARN  [main:o.e.j.s.h.ContextHandler@1673] - Empty contextPath
2023-09-25 09:20:53,233 [myid:1] - INFO  [main:o.a.z.c.X509Util@77] - Setting -D jdk.tls.rejectClientInitiatedRenegotiation=true to disable client-initiated TLS renegotiation
2023-09-25 09:20:53,234 [myid:1] - INFO  [main:o.a.z.s.p.FileTxnSnapLog@124] - zookeeper.snapshot.trust.empty : false
2023-09-25 09:20:53,234 [myid:1] - INFO  [main:o.a.z.s.q.QuorumPeer@1764] - Local sessions disabled
2023-09-25 09:20:53,234 [myid:1] - INFO  [main:o.a.z.s.q.QuorumPeer@1775] - Local session upgrading disabled
2023-09-25 09:20:53,234 [myid:1] - INFO  [main:o.a.z.s.q.QuorumPeer@1742] - tickTime set to 2000
2023-09-25 09:20:53,234 [myid:1] - INFO  [main:o.a.z.s.q.QuorumPeer@1786] - minSessionTimeout set to 4000
2023-09-25 09:20:53,234 [myid:1] - INFO  [main:o.a.z.s.q.QuorumPeer@1797] - maxSessionTimeout set to 60000000
2023-09-25 09:20:53,234 [myid:1] - INFO  [main:o.a.z.s.q.QuorumPeer@1822] - initLimit set to 300
2023-09-25 09:20:53,234 [myid:1] - INFO  [main:o.a.z.s.q.QuorumPeer@2009] - syncLimit set to 10
2023-09-25 09:20:53,234 [myid:1] - INFO  [main:o.a.z.s.q.QuorumPeer@2024] - connectToLearnerMasterLimit set to 0
2023-09-25 09:20:53,239 [myid:1] - INFO  [main:o.a.z.ZookeeperBanner@42] - 
2023-09-25 09:20:53,239 [myid:1] - INFO  [main:o.a.z.ZookeeperBanner@42] -   ______                  _                                          
2023-09-25 09:20:53,239 [myid:1] - INFO  [main:o.a.z.ZookeeperBanner@42] -  |___  /                 | |                                         
2023-09-25 09:20:53,239 [myid:1] - INFO  [main:o.a.z.ZookeeperBanner@42] -     / /    ___     ___   | | __   ___    ___   _ __     ___   _ __   
2023-09-25 09:20:53,240 [myid:1] - INFO  [main:o.a.z.ZookeeperBanner@42] -    / /    / _ \   / _ \  | |/ /  / _ \  / _ \ | '_ \   / _ \ | '__|
2023-09-25 09:20:53,240 [myid:1] - INFO  [main:o.a.z.ZookeeperBanner@42] -   / /__  | (_) | | (_) | |   <  |  __/ |  __/ | |_) | |  __/ | |    
2023-09-25 09:20:53,240 [myid:1] - INFO  [main:o.a.z.ZookeeperBanner@42] -  /_____|  \___/   \___/  |_|\_\  \___|  \___| | .__/   \___| |_|
2023-09-25 09:20:53,240 [myid:1] - INFO  [main:o.a.z.ZookeeperBanner@42] -                                               | |                     
2023-09-25 09:20:53,240 [myid:1] - INFO  [main:o.a.z.ZookeeperBanner@42] -                                               |_|                     
2023-09-25 09:20:53,240 [myid:1] - INFO  [main:o.a.z.ZookeeperBanner@42] - 
2023-09-25 09:20:53,240 [myid:1] - INFO  [main:o.a.z.Environment@98] - Server environment:zookeeper.version=3.8.1-74db005175a4ec545697012f9069cb9dcc8cdda7, built on 2023-01-25 16:31 UTC
2023-09-25 09:20:53,240 [myid:1] - INFO  [main:o.a.z.Environment@98] - Server environment:host.name=clickhouse-zookeeper-0.clickhouse-zookeepers.logs.svc.cluster.local
2023-09-25 09:20:53,240 [myid:1] - INFO  [main:o.a.z.Environment@98] - Server environment:java.version=11.0.19
2023-09-25 09:20:53,240 [myid:1] - INFO  [main:o.a.z.Environment@98] - Server environment:java.vendor=Eclipse Adoptium
2023-09-25 09:20:53,240 [myid:1] - INFO  [main:o.a.z.Environment@98] - Server environment:java.home=/opt/java/openjdk
2023-09-25 09:20:53,240 [myid:1] - INFO  [main:o.a.z.Environment@98] - Server environment:java.class.path=/apache-zookeeper-3.8.1-bin/bin/../zookeeper-server/target/classes:/apache-zookeeper-3.8.1-bin/bin/../build/classes:/apache-zookeeper-3.8.1-bin/bin/../zookeeper-server/target/lib/*.jar:/apache-zookeeper-3.8.1-bin/bin/../build/lib/*.jar:/apache-zookeeper-3.8.1-bin/bin/../lib/zookeeper-prometheus-metrics-3.8.1.jar:/apache-zookeeper-3.8.1-bin/bin/../lib/zookeeper-jute-3.8.1.jar:/apache-zookeeper-3.8.1-bin/bin/../lib/zookeeper-3.8.1.jar:/apache-zookeeper-3.8.1-bin/bin/../lib/snappy-java-1.1.7.7.jar:/apache-zookeeper-3.8.1-bin/bin/../lib/slf4j-api-1.7.30.jar:/apache-zookeeper-3.8.1-bin/bin/../lib/simpleclient_servlet-0.9.0.jar:/apache-zookeeper-3.8.1-bin/bin/../lib/simpleclient_hotspot-0.9.0.jar:/apache-zookeeper-3.8.1-bin/bin/../lib/simpleclient_common-0.9.0.jar:/apache-zookeeper-3.8.1-bin/bin/../lib/simpleclient-0.9.0.jar:/apache-zookeeper-3.8.1-bin/bin/../lib/netty-transport-native-unix-common-4.1.86.Final.jar:/apache-zookeeper-3.8.1-bin/bin/../lib/netty-transport-native-epoll-4.1.86.Final.jar:/apache-zookeeper-3.8.1-bin/bin/../lib/netty-transport-classes-epoll-4.1.86.Final.jar:/apache-zookeeper-3.8.1-bin/bin/../lib/netty-transport-4.1.86.Final.jar:/apache-zookeeper-3.8.1-bin/bin/../lib/netty-resolver-4.1.86.Final.jar:/apache-zookeeper-3.8.1-bin/bin/../lib/netty-handler-4.1.86.Final.jar:/apache-zookeeper-3.8.1-bin/bin/../lib/netty-common-4.1.86.Final.jar:/apache-zookeeper-3.8.1-bin/bin/../lib/netty-codec-4.1.86.Final.jar:/apache-zookeeper-3.8.1-bin/bin/../lib/netty-buffer-4.1.86.Final.jar:/apache-zookeeper-3.8.1-bin/bin/../lib/metrics-core-4.1.12.1.jar:/apache-zookeeper-3.8.1-bin/bin/../lib/logback-core-1.2.10.jar:/apache-zookeeper-3.8.1-bin/bin/../lib/logback-classic-1.2.10.jar:/apache-zookeeper-3.8.1-bin/bin/../lib/jline-2.14.6.jar:/apache-zookeeper-3.8.1-bin/bin/../lib/jetty-util-ajax-9.4.49.v20220914.jar:/apache-zookeeper-3.8.1-bin/bin/../lib/jetty-util-9.4.49.v20220914.jar:/apache-zookeeper-3.8.1-bin/bin/../lib/jetty-servlet-9.4.49.v20220914.jar:/apache-zookeeper-3.8.1-bin/bin/../lib/jetty-server-9.4.49.v20220914.jar:/apache-zookeeper-3.8.1-bin/bin/../lib/jetty-security-9.4.49.v20220914.jar:/apache-zookeeper-3.8.1-bin/bin/../lib/jetty-io-9.4.49.v20220914.jar:/apache-zookeeper-3.8.1-bin/bin/../lib/jetty-http-9.4.49.v20220914.jar:/apache-zookeeper-3.8.1-bin/bin/../lib/javax.servlet-api-3.1.0.jar:/apache-zookeeper-3.8.1-bin/bin/../lib/jackson-databind-2.13.4.2.jar:/apache-zookeeper-3.8.1-bin/bin/../lib/jackson-core-2.13.4.jar:/apache-zookeeper-3.8.1-bin/bin/../lib/jackson-annotations-2.13.4.jar:/apache-zookeeper-3.8.1-bin/bin/../lib/commons-io-2.11.0.jar:/apache-zookeeper-3.8.1-bin/bin/../lib/commons-cli-1.5.0.jar:/apache-zookeeper-3.8.1-bin/bin/../lib/audience-annotations-0.12.0.jar:/apache-zookeeper-3.8.1-bin/bin/../zookeeper-*.jar:/apache-zookeeper-3.8.1-bin/bin/../zookeeper-server/src/main/resources/lib/*.jar:/conf:
2023-09-25 09:20:53,241 [myid:1] - INFO  [main:o.a.z.Environment@98] - Server environment:java.library.path=/usr/java/packages/lib:/usr/lib64:/lib64:/lib:/usr/lib
2023-09-25 09:20:53,241 [myid:1] - INFO  [main:o.a.z.Environment@98] - Server environment:java.io.tmpdir=/tmp
2023-09-25 09:20:53,241 [myid:1] - INFO  [main:o.a.z.Environment@98] - Server environment:java.compiler=<NA>
2023-09-25 09:20:53,241 [myid:1] - INFO  [main:o.a.z.Environment@98] - Server environment:os.name=Linux
2023-09-25 09:20:53,241 [myid:1] - INFO  [main:o.a.z.Environment@98] - Server environment:os.arch=amd64
2023-09-25 09:20:53,241 [myid:1] - INFO  [main:o.a.z.Environment@98] - Server environment:os.version=5.10.0-21-amd64
2023-09-25 09:20:53,241 [myid:1] - INFO  [main:o.a.z.Environment@98] - Server environment:user.name=zookeeper
2023-09-25 09:20:53,241 [myid:1] - INFO  [main:o.a.z.Environment@98] - Server environment:user.home=/home/zookeeper
2023-09-25 09:20:53,241 [myid:1] - INFO  [main:o.a.z.Environment@98] - Server environment:user.dir=/apache-zookeeper-3.8.1-bin
2023-09-25 09:20:53,241 [myid:1] - INFO  [main:o.a.z.Environment@98] - Server environment:os.memory.free=115MB
2023-09-25 09:20:53,241 [myid:1] - INFO  [main:o.a.z.Environment@98] - Server environment:os.memory.max=4096MB
2023-09-25 09:20:53,241 [myid:1] - INFO  [main:o.a.z.Environment@98] - Server environment:os.memory.total=130MB
2023-09-25 09:20:53,241 [myid:1] - INFO  [main:o.a.z.s.ZooKeeperServer@140] - zookeeper.enableEagerACLCheck = false
2023-09-25 09:20:53,241 [myid:1] - INFO  [main:o.a.z.s.ZooKeeperServer@153] - zookeeper.digest.enabled = true
2023-09-25 09:20:53,241 [myid:1] - INFO  [main:o.a.z.s.ZooKeeperServer@157] - zookeeper.closeSessionTxn.enabled = true
2023-09-25 09:20:53,241 [myid:1] - INFO  [main:o.a.z.s.ZooKeeperServer@1510] - zookeeper.flushDelay = 0 ms
2023-09-25 09:20:53,241 [myid:1] - INFO  [main:o.a.z.s.ZooKeeperServer@1519] - zookeeper.maxWriteQueuePollTime = 0 ms
2023-09-25 09:20:53,241 [myid:1] - INFO  [main:o.a.z.s.ZooKeeperServer@1528] - zookeeper.maxBatchSize=1000
2023-09-25 09:20:53,241 [myid:1] - INFO  [main:o.a.z.s.ZooKeeperServer@275] - zookeeper.intBufferStartingSizeBytes = 1024
2023-09-25 09:20:53,243 [myid:1] - INFO  [main:o.a.z.s.w.WatchManagerFactory@42] - Using org.apache.zookeeper.server.watch.WatchManager as watch manager
2023-09-25 09:20:53,243 [myid:1] - INFO  [main:o.a.z.s.w.WatchManagerFactory@42] - Using org.apache.zookeeper.server.watch.WatchManager as watch manager
2023-09-25 09:20:53,244 [myid:1] - INFO  [main:o.a.z.s.ZKDatabase@132] - zookeeper.snapshotSizeFactor = 0.33
2023-09-25 09:20:53,244 [myid:1] - INFO  [main:o.a.z.s.ZKDatabase@152] - zookeeper.commitLogCount=500
2023-09-25 09:20:53,248 [myid:1] - INFO  [main:o.a.z.s.q.QuorumPeer@2088] - Using insecure (non-TLS) quorum communication
2023-09-25 09:20:53,249 [myid:1] - INFO  [main:o.a.z.s.q.QuorumPeer@2094] - Port unification disabled
2023-09-25 09:20:53,249 [myid:1] - INFO  [main:o.a.z.s.q.QuorumPeer@181] - multiAddress.enabled set to false
2023-09-25 09:20:53,249 [myid:1] - INFO  [main:o.a.z.s.q.QuorumPeer@206] - multiAddress.reachabilityCheckEnabled set to true
2023-09-25 09:20:53,249 [myid:1] - INFO  [main:o.a.z.s.q.QuorumPeer@193] - multiAddress.reachabilityCheckTimeoutMs set to 1000
2023-09-25 09:20:53,249 [myid:1] - INFO  [main:o.a.z.s.q.QuorumPeer@2549] - QuorumPeer communication is not secured! (SASL auth disabled)
2023-09-25 09:20:53,249 [myid:1] - INFO  [main:o.a.z.s.q.QuorumPeer@2574] - quorum.cnxn.threads.size set to 20
2023-09-25 09:20:53,249 [myid:1] - INFO  [main:o.a.z.s.p.SnapStream@321] - Invalid snapshot snapshot.704b3ffe1. len = 117453718, byte = 0
2023-09-25 09:20:53,249 [myid:1] - INFO  [main:o.a.z.s.p.SnapStream@321] - Invalid snapshot snapshot.7048b88ec. len = 117453718, byte = 0
2023-09-25 09:20:53,249 [myid:1] - INFO  [main:o.a.z.s.p.SnapStream@321] - Invalid snapshot snapshot.7045eee09. len = 117453718, byte = 0
2023-09-25 09:20:53,249 [myid:1] - INFO  [main:o.a.z.s.p.SnapStream@321] - Invalid snapshot snapshot.704353376. len = 117453718, byte = 0
2023-09-25 09:20:53,249 [myid:1] - INFO  [main:o.a.z.s.p.SnapStream@321] - Invalid snapshot snapshot.7041b5a1c. len = 117453718, byte = 0
2023-09-25 09:20:53,250 [myid:1] - INFO  [main:o.a.z.s.p.SnapStream@321] - Invalid snapshot snapshot.703ff2022. len = 117453718, byte = 0
2023-09-25 09:20:53,250 [myid:1] - INFO  [main:o.a.z.s.p.SnapStream@321] - Invalid snapshot snapshot.703db7708. len = 117453718, byte = 0
2023-09-25 09:20:53,250 [myid:1] - INFO  [main:o.a.z.s.p.FileSnap@85] - Reading snapshot /var/lib/zookeeper/data/version-2/snapshot.703ba118a
2023-09-25 09:21:06,868 [myid:1] - INFO  [main:o.a.z.s.DataTree@1705] - The digest in the snapshot has digest version of 2, with zxid as 0x703ba3af0, and digest value as 14229919301032116
2023-09-25 09:24:24,679 [myid:1] - INFO  [main:o.a.z.a.ZKAuditProvider@42] - ZooKeeper audit is disabled

Of pod 2 :

2023-09-25 09:25:32,083 [myid:] - INFO  [WorkerReceiver[myid=2]:o.a.z.s.q.f.QuorumHierarchical@140] - 3, 3, 1
2023-09-25 09:25:32,084 [myid:] - INFO  [WorkerReceiver[myid=2]:o.a.z.s.q.FastLeaderElection$Messenger$WorkerReceiver@391] - Notification: my state:LOOKING; n.sid:3, n.state:LOOKING, n.leader:3, n.round:0x2, n.peerEpoch:0x7, n.zxid:0x704c0d7c0, message format version:0x2, n.config version:0x0
2023-09-25 09:25:32,084 [myid:] - INFO  [WorkerReceiver[myid=2]:o.a.z.s.q.f.QuorumHierarchical@140] - 3, 3, 1
2023-09-25 09:25:32,084 [myid:] - INFO  [WorkerReceiver[myid=2]:o.a.z.s.q.FastLeaderElection$Messenger$WorkerReceiver@391] - Notification: my state:LOOKING; n.sid:2, n.state:LOOKING, n.leader:3, n.round:0x2, n.peerEpoch:0x7, n.zxid:0x704c0d7c0, message format version:0x2, n.config version:0x0
2023-09-25 09:25:32,473 [myid:] - INFO  [NIOWorkerThread-11:o.a.z.s.NIOServerCnxn@522] - Processing mntr command from /127.0.0.1:57406
2023-09-25 09:25:33,478 [myid:] - INFO  [NIOWorkerThread-13:o.a.z.s.NIOServerCnxn@522] - Processing mntr command from /127.0.0.1:57418
2023-09-25 09:25:34,011 [myid:] - INFO  [NIOWorkerThread-15:o.a.z.s.NIOServerCnxn@522] - Processing ruok command from /127.0.0.1:57432
2023-09-25 09:25:34,482 [myid:] - INFO  [NIOWorkerThread-9:o.a.z.s.NIOServerCnxn@522] - Processing mntr command from /127.0.0.1:57448
2023-09-25 09:25:35,485 [myid:] - INFO  [NIOWorkerThread-14:o.a.z.s.NIOServerCnxn@522] - Processing mntr command from /127.0.0.1:57460
2023-09-25 09:25:36,489 [myid:] - INFO  [NIOWorkerThread-16:o.a.z.s.NIOServerCnxn@522] - Processing mntr command from /127.0.0.1:57470
2023-09-25 09:25:37,494 [myid:] - INFO  [NIOWorkerThread-3:o.a.z.s.NIOServerCnxn@522] - Processing mntr command from /127.0.0.1:57474
2023-09-25 09:25:38,499 [myid:] - INFO  [NIOWorkerThread-2:o.a.z.s.NIOServerCnxn@522] - Processing mntr command from /127.0.0.1:39030
2023-09-25 09:25:39,503 [myid:] - INFO  [NIOWorkerThread-5:o.a.z.s.NIOServerCnxn@522] - Processing mntr command from /127.0.0.1:39044
2023-09-25 09:25:40,508 [myid:] - INFO  [NIOWorkerThread-4:o.a.z.s.NIOServerCnxn@522] - Processing mntr command from /127.0.0.1:39054
2023-09-25 09:25:41,513 [myid:] - INFO  [NIOWorkerThread-8:o.a.z.s.NIOServerCnxn@522] - Processing mntr command from /127.0.0.1:39066
2023-09-25 09:25:42,095 [myid:] - INFO  [QuorumPeer[myid=2](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):o.a.z.s.q.FastLeaderElection@996] - Notification time out: 10000 ms
2023-09-25 09:25:42,096 [myid:] - WARN  [QuorumConnectionThread-[myid=2]-8959:o.a.z.s.q.QuorumCnxManager@401] - Cannot open channel to 1 at election address clickhouse-zookeeper-0.clickhouse-zookeepers.logs.svc.cluster.local/10.233.77.184:3888
java.net.ConnectException: Connection refused (Connection refused)
        at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.base/java.net.AbstractPlainSocketImpl.doConnect(Unknown Source)
        at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(Unknown Source)
        at java.base/java.net.AbstractPlainSocketImpl.connect(Unknown Source)
        at java.base/java.net.SocksSocketImpl.connect(Unknown Source)
        at java.base/java.net.Socket.connect(Unknown Source)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.initiateConnection(QuorumCnxManager.java:384)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager$QuorumConnectionReqThread.run(QuorumCnxManager.java:458)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.base/java.lang.Thread.run(Unknown Source)
2023-09-25 09:25:42,104 [myid:] - INFO  [WorkerReceiver[myid=2]:o.a.z.s.q.f.QuorumHierarchical@140] - 3, 3, 1
2023-09-25 09:25:42,104 [myid:] - INFO  [WorkerReceiver[myid=2]:o.a.z.s.q.FastLeaderElection$Messenger$WorkerReceiver@391] - Notification: my state:LOOKING; n.sid:2, n.state:LOOKING, n.leader:3, n.round:0x2, n.peerEpoch:0x7, n.zxid:0x704c0d7c0, message format version:0x2, n.config version:0x0
2023-09-25 09:25:42,517 [myid:] - INFO  [NIOWorkerThread-7:o.a.z.s.NIOServerCnxn@522] - Processing mntr command from /127.0.0.1:39074
2023-09-25 09:25:43,522 [myid:] - INFO  [NIOWorkerThread-10:o.a.z.s.NIOServerCnxn@522] - Processing mntr command from /127.0.0.1:39080
2023-09-25 09:25:44,013 [myid:] - INFO  [NIOWorkerThread-1:o.a.z.s.NIOServerCnxn@522] - Processing ruok command from /127.0.0.1:39084
2023-09-25 09:25:44,525 [myid:] - INFO  [NIOWorkerThread-6:o.a.z.s.NIOServerCnxn@522] - Processing mntr command from /127.0.0.1:39088
2023-09-25 09:25:45,528 [myid:] - INFO  [NIOWorkerThread-12:o.a.z.s.NIOServerCnxn@522] - Processing mntr command from /127.0.0.1:39096
2023-09-25 09:25:46,530 [myid:] - INFO  [NIOWorkerThread-11:o.a.z.s.NIOServerCnxn@522] - Processing mntr command from /127.0.0.1:39102
2023-09-25 09:25:47,533 [myid:] - INFO  [NIOWorkerThread-13:o.a.z.s.NIOServerCnxn@522] - Processing mntr command from /127.0.0.1:39112
2023-09-25 09:25:48,536 [myid:] - INFO  [NIOWorkerThread-15:o.a.z.s.NIOServerCnxn@522] - Processing mntr command from /127.0.0.1:46780
2023-09-25 09:25:49,540 [myid:] - INFO  [NIOWorkerThread-9:o.a.z.s.NIOServerCnxn@522] - Processing mntr command from /127.0.0.1:46782
2023-09-25 09:25:50,544 [myid:] - INFO  [NIOWorkerThread-14:o.a.z.s.NIOServerCnxn@522] - Processing mntr command from /127.0.0.1:46798
2023-09-25 09:25:51,549 [myid:] - INFO  [NIOWorkerThread-16:o.a.z.s.NIOServerCnxn@522] - Processing mntr command from /127.0.0.1:46800
2023-09-25 09:25:52,552 [myid:] - INFO  [NIOWorkerThread-3:o.a.z.s.NIOServerCnxn@522] - Processing mntr command from /127.0.0.1:46804
2023-09-25 09:25:53,556 [myid:] - INFO  [NIOWorkerThread-2:o.a.z.s.NIOServerCnxn@522] - Processing mntr command from /127.0.0.1:46814
2023-09-25 09:25:54,023 [myid:] - INFO  [NIOWorkerThread-5:o.a.z.s.NIOServerCnxn@522] - Processing ruok command from /127.0.0.1:46820
2023-09-25 09:25:54,138 [myid:] - INFO  [QuorumPeer[myid=2](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):o.a.z.s.q.FastLeaderElection@996] - Notification time out: 10000 ms
2023-09-25 09:25:54,138 [myid:] - WARN  [QuorumConnectionThread-[myid=2]-8959:o.a.z.s.q.QuorumCnxManager@401] - Cannot open channel to 1 at election address clickhouse-zookeeper-0.clickhouse-zookeepers.logs.svc.cluster.local/10.233.77.184:3888
java.net.ConnectException: Connection refused (Connection refused)
        at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.base/java.net.AbstractPlainSocketImpl.doConnect(Unknown Source)
        at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(Unknown Source)
        at java.base/java.net.AbstractPlainSocketImpl.connect(Unknown Source)
        at java.base/java.net.SocksSocketImpl.connect(Unknown Source)
        at java.base/java.net.Socket.connect(Unknown Source)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.initiateConnection(QuorumCnxManager.java:384)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager$QuorumConnectionReqThread.run(QuorumCnxManager.java:458)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.base/java.lang.Thread.run(Unknown Source)
2023-09-25 09:25:54,144 [myid:] - INFO  [WorkerReceiver[myid=2]:o.a.z.s.q.f.QuorumHierarchical@140] - 3, 3, 1
2023-09-25 09:25:54,144 [myid:] - INFO  [WorkerReceiver[myid=2]:o.a.z.s.q.FastLeaderElection$Messenger$WorkerReceiver@391] - Notification: my state:LOOKING; n.sid:3, n.state:LOOKING, n.leader:3, n.round:0x2, n.peerEpoch:0x7, n.zxid:0x704c0d7c0, message format version:0x2, n.config version:0x0
2023-09-25 09:25:54,144 [myid:] - INFO  [WorkerReceiver[myid=2]:o.a.z.s.q.f.QuorumHierarchical@140] - 3, 3, 1
2023-09-25 09:25:54,144 [myid:] - INFO  [WorkerReceiver[myid=2]:o.a.z.s.q.FastLeaderElection$Messenger$WorkerReceiver@391] - Notification: my state:LOOKING; n.sid:2, n.state:LOOKING, n.leader:3, n.round:0x2, n.peerEpoch:0x7, n.zxid:0x704c0d7c0, message format version:0x2, n.config version:0x0
2023-09-25 09:25:54,559 [myid:] - INFO  [NIOWorkerThread-4:o.a.z.s.NIOServerCnxn@522] - Processing mntr command from /127.0.0.1:46832
2023-09-25 09:25:55,564 [myid:] - INFO  [NIOWorkerThread-8:o.a.z.s.NIOServerCnxn@522] - Processing mntr command from /127.0.0.1:46846
2023-09-25 09:25:56,566 [myid:] - INFO  [NIOWorkerThread-7:o.a.z.s.NIOServerCnxn@522] - Processing mntr command from /127.0.0.1:46860
2023-09-25 09:25:57,569 [myid:] - INFO  [NIOWorkerThread-10:o.a.z.s.NIOServerCnxn@522] - Processing mntr command from /127.0.0.1:46870
2023-09-25 09:25:58,573 [myid:] - INFO  [NIOWorkerThread-1:o.a.z.s.NIOServerCnxn@522] - Processing mntr command from /127.0.0.1:47248
2023-09-25 09:25:59,578 [myid:] - INFO  [NIOWorkerThread-6:o.a.z.s.NIOServerCnxn@522] - Processing mntr command from /127.0.0.1:47262
2023-09-25 09:26:00,582 [myid:] - INFO  [NIOWorkerThread-12:o.a.z.s.NIOServerCnxn@522] - Processing mntr command from /127.0.0.1:47264
2023-09-25 09:26:01,584 [myid:] - INFO  [NIOWorkerThread-11:o.a.z.s.NIOServerCnxn@522] - Processing mntr command from /127.0.0.1:47274
2023-09-25 09:26:02,587 [myid:] - INFO  [NIOWorkerThread-13:o.a.z.s.NIOServerCnxn@522] - Processing mntr command from /127.0.0.1:47288
2023-09-25 09:26:03,589 [myid:] - INFO  [NIOWorkerThread-15:o.a.z.s.NIOServerCnxn@522] - Processing mntr command from /127.0.0.1:47290
2023-09-25 09:26:04,022 [myid:] - INFO  [NIOWorkerThread-9:o.a.z.s.NIOServerCnxn@522] - Processing ruok command from /127.0.0.1:47298
2023-09-25 09:26:04,144 [myid:] - INFO  [QuorumPeer[myid=2](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):o.a.z.s.q.FastLeaderElection@996] - Notification time out: 10000 ms
2023-09-25 09:26:04,151 [myid:] - WARN  [QuorumConnectionThread-[myid=2]-8960:o.a.z.s.q.QuorumCnxManager@401] - Cannot open channel to 1 at election address clickhouse-zookeeper-0.clickhouse-zookeepers.logs.svc.cluster.local/10.233.77.184:3888
java.net.ConnectException: Connection refused (Connection refused)
        at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.base/java.net.AbstractPlainSocketImpl.doConnect(Unknown Source)
        at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(Unknown Source)
        at java.base/java.net.AbstractPlainSocketImpl.connect(Unknown Source)
        at java.base/java.net.SocksSocketImpl.connect(Unknown Source)
        at java.base/java.net.Socket.connect(Unknown Source)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.initiateConnection(QuorumCnxManager.java:384)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager$QuorumConnectionReqThread.run(QuorumCnxManager.java:458)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.base/java.lang.Thread.run(Unknown Source)
2023-09-25 09:26:04,155 [myid:] - INFO  [WorkerReceiver[myid=2]:o.a.z.s.q.f.QuorumHierarchical@140] - 3, 3, 1
2023-09-25 09:26:04,155 [myid:] - INFO  [WorkerReceiver[myid=2]:o.a.z.s.q.FastLeaderElection$Messenger$WorkerReceiver@391] - Notification: my state:LOOKING; n.sid:3, n.state:LOOKING, n.leader:3, n.round:0x2, n.peerEpoch:0x7, n.zxid:0x704c0d7c0, message format version:0x2, n.config version:0x0
2023-09-25 09:26:04,155 [myid:] - INFO  [WorkerReceiver[myid=2]:o.a.z.s.q.f.QuorumHierarchical@140] - 3, 3, 1
2023-09-25 09:26:04,155 [myid:] - INFO  [WorkerReceiver[myid=2]:o.a.z.s.q.FastLeaderElection$Messenger$WorkerReceiver@391] - Notification: my state:LOOKING; n.sid:2, n.state:LOOKING, n.leader:3, n.round:0x2, n.peerEpoch:0x7, n.zxid:0x704c0d7c0, message format version:0x2, n.config version:0x0
2023-09-25 09:26:04,592 [myid:] - INFO  [NIOWorkerThread-14:o.a.z.s.NIOServerCnxn@522] - Processing mntr command from /127.0.0.1:47310
2023-09-25 09:26:05,595 [myid:] - INFO  [NIOWorkerThread-16:o.a.z.s.NIOServerCnxn@522] - Processing mntr command from /127.0.0.1:47318
2023-09-25 09:26:06,598 [myid:] - INFO  [NIOWorkerThread-3:o.a.z.s.NIOServerCnxn@522] - Processing mntr command from /127.0.0.1:47330
2023-09-25 09:26:07,602 [myid:] - INFO  [NIOWorkerThread-2:o.a.z.s.NIOServerCnxn@522] - Processing mntr command from /127.0.0.1:47338
2023-09-25 09:26:08,606 [myid:] - INFO  [NIOWorkerThread-5:o.a.z.s.NIOServerCnxn@522] - Processing mntr command from /127.0.0.1:56558
2023-09-25 09:26:09,610 [myid:] - INFO  [NIOWorkerThread-4:o.a.z.s.NIOServerCnxn@522] - Processing mntr command from /127.0.0.1:56564

And pod 3 :

2023-09-25 09:27:56,186 [myid:] - INFO  [WorkerReceiver[myid=3]:o.a.z.s.q.FastLeaderElection$Messenger$WorkerReceiver@391] - Notification: my state:LOOKING; n.sid:3, n.state:LOOKING, n.leader:3, n.round:0x2, n.peerEpoch:0x7, n.zxid:0x704c0d7c0, message format version:0x2, n.config version:0x0
2023-09-25 09:27:56,186 [myid:] - INFO  [WorkerReceiver[myid=3]:o.a.z.s.q.f.QuorumHierarchical@140] - 3, 3, 1
2023-09-25 09:27:56,187 [myid:] - INFO  [WorkerReceiver[myid=3]:o.a.z.s.q.FastLeaderElection$Messenger$WorkerReceiver@391] - Notification: my state:LOOKING; n.sid:2, n.state:LOOKING, n.leader:3, n.round:0x2, n.peerEpoch:0x7, n.zxid:0x704c0d7c0, message format version:0x2, n.config version:0x0
2023-09-25 09:27:56,254 [myid:] - INFO  [NIOWorkerThread-14:o.a.z.s.NIOServerCnxn@522] - Processing mntr command from /127.0.0.1:44016
2023-09-25 09:27:57,257 [myid:] - INFO  [NIOWorkerThread-1:o.a.z.s.NIOServerCnxn@522] - Processing mntr command from /127.0.0.1:44028
2023-09-25 09:27:58,262 [myid:] - INFO  [NIOWorkerThread-2:o.a.z.s.NIOServerCnxn@522] - Processing mntr command from /127.0.0.1:44034
2023-09-25 09:27:59,266 [myid:] - INFO  [NIOWorkerThread-12:o.a.z.s.NIOServerCnxn@522] - Processing mntr command from /127.0.0.1:44036
2023-09-25 09:28:00,272 [myid:] - INFO  [NIOWorkerThread-15:o.a.z.s.NIOServerCnxn@522] - Processing mntr command from /127.0.0.1:44050
2023-09-25 09:28:01,274 [myid:] - INFO  [NIOWorkerThread-8:o.a.z.s.NIOServerCnxn@522] - Processing mntr command from /127.0.0.1:44052
2023-09-25 09:28:02,278 [myid:] - INFO  [NIOWorkerThread-4:o.a.z.s.NIOServerCnxn@522] - Processing mntr command from /127.0.0.1:44064
2023-09-25 09:28:03,283 [myid:] - INFO  [NIOWorkerThread-16:o.a.z.s.NIOServerCnxn@522] - Processing mntr command from /127.0.0.1:44078
2023-09-25 09:28:04,029 [myid:] - INFO  [NIOWorkerThread-9:o.a.z.s.NIOServerCnxn@522] - Processing ruok command from /127.0.0.1:44086
2023-09-25 09:28:04,287 [myid:] - INFO  [NIOWorkerThread-5:o.a.z.s.NIOServerCnxn@522] - Processing mntr command from /127.0.0.1:44102
2023-09-25 09:28:05,291 [myid:] - INFO  [NIOWorkerThread-13:o.a.z.s.NIOServerCnxn@522] - Processing mntr command from /127.0.0.1:56002
2023-09-25 09:28:06,197 [myid:] - INFO  [QuorumPeer[myid=3](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):o.a.z.s.q.FastLeaderElection@996] - Notification time out: 10000 ms
2023-09-25 09:28:06,197 [myid:] - WARN  [QuorumConnectionThread-[myid=3]-8979:o.a.z.s.q.QuorumCnxManager@401] - Cannot open channel to 1 at election address clickhouse-zookeeper-0.clickhouse-zookeepers.logs.svc.cluster.local/10.233.77.184:3888
java.net.ConnectException: Connection refused (Connection refused)
        at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.base/java.net.AbstractPlainSocketImpl.doConnect(Unknown Source)
        at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(Unknown Source)
        at java.base/java.net.AbstractPlainSocketImpl.connect(Unknown Source)
        at java.base/java.net.SocksSocketImpl.connect(Unknown Source)
        at java.base/java.net.Socket.connect(Unknown Source)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.initiateConnection(QuorumCnxManager.java:384)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager$QuorumConnectionReqThread.run(QuorumCnxManager.java:458)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.base/java.lang.Thread.run(Unknown Source)
2023-09-25 09:28:06,206 [myid:] - INFO  [WorkerReceiver[myid=3]:o.a.z.s.q.f.QuorumHierarchical@140] - 3, 3, 1
2023-09-25 09:28:06,206 [myid:] - INFO  [WorkerReceiver[myid=3]:o.a.z.s.q.FastLeaderElection$Messenger$WorkerReceiver@391] - Notification: my state:LOOKING; n.sid:3, n.state:LOOKING, n.leader:3, n.round:0x2, n.peerEpoch:0x7, n.zxid:0x704c0d7c0, message format version:0x2, n.config version:0x0
2023-09-25 09:28:06,206 [myid:] - INFO  [WorkerReceiver[myid=3]:o.a.z.s.q.f.QuorumHierarchical@140] - 3, 3, 1
2023-09-25 09:28:06,206 [myid:] - INFO  [WorkerReceiver[myid=3]:o.a.z.s.q.FastLeaderElection$Messenger$WorkerReceiver@391] - Notification: my state:LOOKING; n.sid:2, n.state:LOOKING, n.leader:3, n.round:0x2, n.peerEpoch:0x7, n.zxid:0x704c0d7c0, message format version:0x2, n.config version:0x0
2023-09-25 09:28:06,296 [myid:] - INFO  [NIOWorkerThread-11:o.a.z.s.NIOServerCnxn@522] - Processing mntr command from /127.0.0.1:56010
2023-09-25 09:28:07,301 [myid:] - INFO  [NIOWorkerThread-7:o.a.z.s.NIOServerCnxn@522] - Processing mntr command from /127.0.0.1:56016
2023-09-25 09:28:08,305 [myid:] - INFO  [NIOWorkerThread-10:o.a.z.s.NIOServerCnxn@522] - Processing mntr command from /127.0.0.1:56018
2023-09-25 09:28:09,310 [myid:] - INFO  [NIOWorkerThread-6:o.a.z.s.NIOServerCnxn@522] - Processing mntr command from /127.0.0.1:56030
2023-09-25 09:28:10,315 [myid:] - INFO  [NIOWorkerThread-3:o.a.z.s.NIOServerCnxn@522] - Processing mntr command from /127.0.0.1:56040
2023-09-25 09:28:11,319 [myid:] - INFO  [NIOWorkerThread-14:o.a.z.s.NIOServerCnxn@522] - Processing mntr command from /127.0.0.1:56056
2023-09-25 09:28:12,322 [myid:] - INFO  [NIOWorkerThread-1:o.a.z.s.NIOServerCnxn@522] - Processing mntr command from /127.0.0.1:56070
2023-09-25 09:28:13,326 [myid:] - INFO  [NIOWorkerThread-2:o.a.z.s.NIOServerCnxn@522] - Processing mntr command from /127.0.0.1:56072
2023-09-25 09:28:14,039 [myid:] - INFO  [NIOWorkerThread-12:o.a.z.s.NIOServerCnxn@522] - Processing ruok command from /127.0.0.1:56086
2023-09-25 09:28:14,330 [myid:] - INFO  [NIOWorkerThread-15:o.a.z.s.NIOServerCnxn@522] - Processing mntr command from /127.0.0.1:56100
2023-09-25 09:28:15,333 [myid:] - INFO  [NIOWorkerThread-8:o.a.z.s.NIOServerCnxn@522] - Processing mntr command from /127.0.0.1:35146
2023-09-25 09:28:16,395 [myid:] - INFO  [NIOWorkerThread-4:o.a.z.s.NIOServerCnxn@522] - Processing mntr command from /127.0.0.1:35158
2023-09-25 09:28:17,398 [myid:] - INFO  [NIOWorkerThread-16:o.a.z.s.NIOServerCnxn@522] - Processing mntr command from /127.0.0.1:35164
2023-09-25 09:28:18,237 [myid:] - INFO  [QuorumPeer[myid=3](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):o.a.z.s.q.FastLeaderElection@996] - Notification time out: 10000 ms
2023-09-25 09:28:18,237 [myid:] - WARN  [QuorumConnectionThread-[myid=3]-8980:o.a.z.s.q.QuorumCnxManager@401] - Cannot open channel to 1 at election address clickhouse-zookeeper-0.clickhouse-zookeepers.logs.svc.cluster.local/10.233.77.184:3888
java.net.ConnectException: Connection refused (Connection refused)
        at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.base/java.net.AbstractPlainSocketImpl.doConnect(Unknown Source)
        at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(Unknown Source)
        at java.base/java.net.AbstractPlainSocketImpl.connect(Unknown Source)
        at java.base/java.net.SocksSocketImpl.connect(Unknown Source)
        at java.base/java.net.Socket.connect(Unknown Source)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.initiateConnection(QuorumCnxManager.java:384)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager$QuorumConnectionReqThread.run(QuorumCnxManager.java:458)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.base/java.lang.Thread.run(Unknown Source)
2023-09-25 09:28:18,246 [myid:] - INFO  [WorkerReceiver[myid=3]:o.a.z.s.q.f.QuorumHierarchical@140] - 3, 3, 1
2023-09-25 09:28:18,247 [myid:] - INFO  [WorkerReceiver[myid=3]:o.a.z.s.q.FastLeaderElection$Messenger$WorkerReceiver@391] - Notification: my state:LOOKING; n.sid:2, n.state:LOOKING, n.leader:3, n.round:0x2, n.peerEpoch:0x7, n.zxid:0x704c0d7c0, message format version:0x2, n.config version:0x0
2023-09-25 09:28:18,247 [myid:] - INFO  [WorkerReceiver[myid=3]:o.a.z.s.q.f.QuorumHierarchical@140] - 3, 3, 1
2023-09-25 09:28:18,247 [myid:] - INFO  [WorkerReceiver[myid=3]:o.a.z.s.q.FastLeaderElection$Messenger$WorkerReceiver@391] - Notification: my state:LOOKING; n.sid:3, n.state:LOOKING, n.leader:3, n.round:0x2, n.peerEpoch:0x7, n.zxid:0x704c0d7c0, message format version:0x2, n.config version:0x0
2023-09-25 09:28:18,402 [myid:] - INFO  [NIOWorkerThread-9:o.a.z.s.NIOServerCnxn@522] - Processing mntr command from /127.0.0.1:35166
2023-09-25 09:28:19,407 [myid:] - INFO  [NIOWorkerThread-5:o.a.z.s.NIOServerCnxn@522] - Processing mntr command from /127.0.0.1:35176
2023-09-25 09:28:20,411 [myid:] - INFO  [NIOWorkerThread-13:o.a.z.s.NIOServerCnxn@522] - Processing mntr command from /127.0.0.1:35184
2023-09-25 09:28:21,417 [myid:] - INFO  [NIOWorkerThread-11:o.a.z.s.NIOServerCnxn@522] - Processing mntr command from /127.0.0.1:35194
2023-09-25 09:28:22,421 [myid:] - INFO  [NIOWorkerThread-7:o.a.z.s.NIOServerCnxn@522] - Processing mntr command from /127.0.0.1:35204
2023-09-25 09:28:23,426 [myid:] - INFO  [NIOWorkerThread-10:o.a.z.s.NIOServerCnxn@522] - Processing mntr command from /127.0.0.1:35212
2023-09-25 09:28:24,047 [myid:] - INFO  [NIOWorkerThread-6:o.a.z.s.NIOServerCnxn@522] - Processing ruok command from /127.0.0.1:35216
2023-09-25 09:28:24,429 [myid:] - INFO  [NIOWorkerThread-3:o.a.z.s.NIOServerCnxn@522] - Processing mntr command from /127.0.0.1:35230

jdalia001 avatar Sep 25 '23 09:09 jdalia001

clickhouse-keeper-0 have failed liveness probe and restart during loading snapshots

try to increase initialDelaySeconds: 180 in statefulset spec.template.spec.containers.livenessProbe

Slach avatar Sep 27 '23 06:09 Slach