cp-docker-images

Controller to Broker - Socket Timeout Exception after restarting the VM - Urgent

Open jvarma2306 opened this issue 6 years ago • 2 comments

Hi All,

We are using confluentinc/cp-kafka:3.2.1 for one of our enterprise products. Below is the broker configuration.

kafka-1:
  deploy:
    placement:
      constraints:
        - node.role == manager
    restart_policy:
      condition: on-failure
      delay: 40s
      max_attempts: 10
      window: 120s
  image: $MASTER_REGISTRY_IP:$MASTER_REGISTRY_PORT/cp-kafka:$BUILD_LABEL
  ports:
    - '19092:19092'
  depends_on:
    - zookeeper-1
  environment:
    KAFKA_BROKER_ID: 1
    KAFKA_ZOOKEEPER_CONNECT: 'zookeeper-1:12181'
    KAFKA_ADVERTISED_LISTENERS: 'PLAINTEXT://$KAFKA_ADVERTISED_ADDRESS:19092'
    KAFKA_ADVERTISED_HOST_NAME: '$KAFKA_ADVERTISED_ADDRESS'
    KAFKA_ZOOKEEPER_CONNECTION_TIMEOUT_MS: 15000
    KAFKA_ZOOKEEPER_SESSION_TIMEOUT_MS: 15000
    KAFKA_num.partitions: 16
    KAFKA_log.retention.hours: 2
    KAFKA_LOG4J_LOGGERS: "kafka.controller=WARN,state.change.logger=INFO"
    KAFKA_LOG4J_ROOT_LOGLEVEL: WARN
    KAFKA_TOOLS_LOG4J_LOGLEVEL: ERROR
    KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1

We are running this container with Docker on RHEL 7.4. When we start Docker and Kafka, everything works fine.

We have a use case where we want to verify that Kafka recovers after a VM / host reboot or power cycle. This case is failing. When we reboot the host / VM, Kafka throws this error continuously on startup.

[2018-12-18 17:14:29,589] WARN [Controller-1-to-broker-1-send-thread], Controller 1's connection to broker <IP Address>:19092 (id: 1 rack: null) was unsuccessful (kafka.controller.RequestSendThread)
java.net.SocketTimeoutException: Failed to connect within 30000 ms
    at kafka.controller.RequestSendThread.brokerReady(ControllerChannelManager.scala:237)
    at kafka.controller.RequestSendThread.liftedTree1$1(ControllerChannelManager.scala:189)
    at kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:188)
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
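
The exception says the controller could not open a TCP connection to the advertised listener within 30000 ms. One way to check whether that address is reachable at all after the reboot is a plain TCP probe run from inside the broker container. This is only a minimal sketch and not part of Kafka: `can_connect` is a hypothetical helper, and the host and port passed to it are placeholders for the advertised address and port 19092.

```python
import socket


def can_connect(host: str, port: int, timeout_s: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout_s seconds."""
    try:
        # create_connection resolves the host name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

Calling `can_connect(advertised_address, 19092)` before and after the reboot would show whether the failure is at the network level (address or port mapping no longer reachable) or inside the broker itself.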

If we uninstall and reinstall, it works fine. But after the next VM reboot, the same error appears again.

We need expert help here. Thanks in advance.

jvarma2306 Dec 19 '18 19:12

We have a use case where we want to verify that Kafka recovers after a VM / host reboot or power cycle

To do this, you need more than one ZooKeeper node and more than one Kafka broker, running on different machines that will not be shut down, and you need more than one replica for all your topics. See the KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1 setting in your configuration.
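
The replica arithmetic behind this can be sketched as below. This is an illustration under stated assumptions, not a Kafka API: `tolerated_broker_failures` is a hypothetical helper, and it assumes producers use acks=all, so a partition stays writable only while at least `min.insync.replicas` replicas are alive.

```python
def tolerated_broker_failures(replication_factor: int, min_insync_replicas: int = 1) -> int:
    """Replicas a partition can lose while still accepting acks=all writes."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    if not 1 <= min_insync_replicas <= replication_factor:
        raise ValueError("min_insync_replicas must be between 1 and replication_factor")
    # Every replica beyond the in-sync minimum is one that can be lost.
    return replication_factor - min_insync_replicas
```

With a replication factor of 1, as in the configuration above, the result is 0: there is no second replica to take over while the only broker is down.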

OneCricketeer Jan 02 '19 19:01

Is there a solution?

cobolbaby Nov 05 '19 06:11