
Confluent CLI says stack is down, even if it's not

Open rmoff opened this issue 8 years ago • 11 comments

Robin@asgard02 ~> confluent status
connect is [DOWN]
kafka-rest is [DOWN]
schema-registry is [DOWN]
kafka is [DOWN]
zookeeper is [DOWN]

But it's clearly running:

Robin@asgard02 ~> ps -ef|grep confluent
  502  3300     1   0 Wed02pm ??         3:13.19 /usr/bin/java -Xmx512M -Xms512M -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+DisableExplicitGC -Djava.awt.headless=true -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dkafka.logs.dir=/Users/Robin/cp/confluent-3.3.0/bin/../logs -Dlog4j.configuration=file:/Users/Robin/cp/confluent-3.3.0/bin/../etc/kafka/log4j.properties -cp :/Users/Robin/cp/confluent-3.3.0/bin/../share/java/kafka/*:/Users/Robin/cp/confluent-3.3.0/bin/../share/java/confluent-support-metrics/*:/usr/share/java/confluent-support-metrics/* org.apache.zookeeper.server.quorum.QuorumPeerMain /var/folders/q9/2tg_lt9j6nx29rvr5r5jn_bw0000gp/T/confluent.yAzjsc10/zookeeper/zookeeper.properties
  502  3463     1   0 Wed02pm ??         4:33.09 /usr/bin/java -Xmx512M -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+DisableExplicitGC -Djava.awt.headless=true -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dschema-registry.log.dir=/Users/Robin/cp/confluent-3.3.0/bin/../logs -Dlog4j.configuration=file:/Users/Robin/cp/confluent-3.3.0/bin/../etc/schema-registry/log4j.properties -cp :/Users/Robin/cp/confluent-3.3.0/bin/../package-schema-registry/target/kafka-schema-registry-package-*-development/share/java/schema-registry/*:/Users/Robin/cp/confluent-3.3.0/bin/../share/java/confluent-common/*:/Users/Robin/cp/confluent-3.3.0/bin/../share/java/rest-utils/*:/Users/Robin/cp/confluent-3.3.0/bin/../share/java/schema-registry/* io.confluent.kafka.schemaregistry.rest.SchemaRegistryMain /var/folders/q9/2tg_lt9j6nx29rvr5r5jn_bw0000gp/T/confluent.yAzjsc10/schema-registry/schema-registry.properties
  502  3700     1   0 Wed02pm ??         2:39.24 /usr/bin/java -Xmx256M -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+DisableExplicitGC -Djava.awt.headless=true -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dlog4j.configuration=file:/Users/Robin/cp/confluent-3.3.0/bin/../etc/kafka-rest/log4j.properties -cp :/Users/Robin/cp/confluent-3.3.0/bin/../target/kafka-rest-*-development/share/java/kafka-rest/*:/Users/Robin/cp/confluent-3.3.0/bin/../share/java/confluent-common/*:/Users/Robin/cp/confluent-3.3.0/bin/../share/java/rest-utils/*:/Users/Robin/cp/confluent-3.3.0/bin/../share/java/kafka-rest/* io.confluent.kafkarest.KafkaRestMain /var/folders/q9/2tg_lt9j6nx29rvr5r5jn_bw0000gp/T/confluent.yAzjsc10/kafka-rest/kafka-rest.properties
  502  5926     1   0 Wed03pm ??       135:50.08 /usr/bin/java -Xmx256M -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+DisableExplicitGC -Djava.awt.headless=true -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dkafka.logs.dir=/Users/Robin/cp/confluent-3.3.0/bin/../logs -Dlog4j.configuration=file:/Users/Robin/cp/confluent-3.3.0/bin/../etc/kafka/connect-log4j.properties -cp /Users/Robin/cp/confluent-3.3.0/share/java/kafka/*:/Users/Robin/cp/confluent-3.3.0/share/java/confluent-common/*:/Users/Robin/cp/confluent-3.3.0/share/java/kafka-serde-tools/*:/Users/Robin/cp/confluent-3.3.0/share/java/monitoring-interceptors/*:/Users/Robin/cp/confluent-3.3.0/share/java/kafka-connect-elasticsearch/*:/Users/Robin/cp/confluent-3.3.0/share/java/kafka-connect-hdfs/*:/Users/Robin/cp/confluent-3.3.0/share/java/kafka-connect-irc/*:/Users/Robin/cp/confluent-3.3.0/share/java/kafka-connect-jdbc/*:/Users/Robin/cp/confluent-3.3.0/share/java/kafka-connect-replicator/*:/Users/Robin/cp/confluent-3.3.0/share/java/kafka-connect-s3/*:/Users/Robin/cp/confluent-3.3.0/share/java/kafka-connect-storage-common/*:/Users/Robin/cp/confluent-3.3.0/share/java/kafka-connect-twitter/*:/Users/Robin/cp/confluent-3.3.0/bin/../share/java/kafka/*:/Users/Robin/cp/confluent-3.3.0/bin/../share/java/confluent-support-metrics/*:/usr/share/java/confluent-support-metrics/* org.apache.kafka.connect.cli.ConnectDistributed /var/folders/q9/2tg_lt9j6nx29rvr5r5jn_bw0000gp/T/confluent.yAzjsc10/connect/connect.properties
  502 52893     1   0 Fri05pm ??        57:38.69 /usr/bin/java -Xmx1G -Xms1G -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+DisableExplicitGC -Djava.awt.headless=true -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dkafka.logs.dir=/Users/Robin/cp/confluent-3.3.0/bin/../logs -Dlog4j.configuration=file:/Users/Robin/cp/confluent-3.3.0/bin/../etc/kafka/log4j.properties -cp :/Users/Robin/cp/confluent-3.3.0/bin/../share/java/kafka/*:/Users/Robin/cp/confluent-3.3.0/bin/../share/java/confluent-support-metrics/*:/usr/share/java/confluent-support-metrics/* io.confluent.support.metrics.SupportedKafka /var/folders/q9/2tg_lt9j6nx29rvr5r5jn_bw0000gp/T/confluent.yAzjsc10/kafka/kafka.properties
  502 63772 63522   0 10:58pm ttys000    0:00.00 grep --color=auto confluent

This happened after several days of suspending and resuming my laptop, having started the stack beforehand.

This issue causes two problems:

  1. Can't use the CLI to shut down the running components
  2. Can't use the CLI to start up the stack, because it's already running, so you get port clashes:
Robin@asgard02 ~> ps -ef|grep confluent
  502  3300     1   0 Wed02pm ??         3:13.45 /usr/bin/java -Xmx512M -Xms512M -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+DisableExplicitGC -Djava.awt.headless=true -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dkafka.logs.dir=/Users/Robin/cp/confluent-3.3.0/bin/../logs -Dlog4j.configuration=file:/Users/Robin/cp/confluent-3.3.0/bin/../etc/kafka/log4j.properties -cp :/Users/Robin/cp/confluent-3.3.0/bin/../share/java/kafka/*:/Users/Robin/cp/confluent-3.3.0/bin/../share/java/confluent-support-metrics/*:/usr/share/java/confluent-support-metrics/* org.apache.zookeeper.server.quorum.QuorumPeerMain /var/folders/q9/2tg_lt9j6nx29rvr5r5jn_bw0000gp/T/confluent.yAzjsc10/zookeeper/zookeeper.properties
  502 64006 63522   0 11:02pm ttys000    0:00.00 grep --color=auto confluent
Robin@asgard02 ~> confluent status
connect is [DOWN]
kafka-rest is [DOWN]
schema-registry is [DOWN]
kafka is [DOWN]
zookeeper is [DOWN]
Robin@asgard02 ~> confluent start
Starting zookeeper
Zookeeper failed to start
zookeeper is [DOWN]
Cannot start Kafka, Zookeeper is not running. Check your deployment

confluent log zookeeper shows:

[2017-10-03 23:02:27,339] INFO binding to port 0.0.0.0/0.0.0.0:2181 (org.apache.zookeeper.server.NIOServerCnxnFactory)
[2017-10-03 23:02:27,340] ERROR Unexpected exception, exiting abnormally (org.apache.zookeeper.server.ZooKeeperServerMain)
java.net.BindException: Address already in use
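The BindException shows that something is already listening on port 2181, the ZooKeeper port in the log. The new ZooKeeper therefore cannot bind, and confluent start gives up. A minimal sketch that checks this directly, independent of whatever the CLI has recorded: it assumes bash, because it relies on bash's /dev/tcp redirection, and port_in_use is a hypothetical helper, not a Confluent CLI command.

```shell
#!/usr/bin/env bash
# port_in_use HOST PORT
# Succeeds if something accepts a TCP connection on HOST:PORT.
# Opening /dev/tcp/HOST/PORT is a bash feature; the probe runs in a
# subshell, so the descriptor is closed again when the subshell exits.
port_in_use() {
  local host="$1" port="$2"
  (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null
}

# 2181 is the port ZooKeeper tried to bind to in the log above.
if port_in_use 127.0.0.1 2181; then
  echo "port 2181 is already taken: something is still running"
else
  echo "port 2181 is free"
fi
```

Because the probe looks at the port itself, it does not depend on the CLI finding the right runtime directory.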

I don't quite know how my setup got into this state, but the CLI needs to improve how it detects whether processes are running.

rmoff, Oct 04 '17
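The point about detecting running processes can be illustrated with a PID-based check. This is a minimal sketch that assumes each service's PID has been written to a file. The pid-file path below is hypothetical, and this thread does not confirm where, or whether, the CLI stores PIDs. kill -0 sends no signal; it only reports whether the PID exists and can be signalled.

```shell
# service_status NAME PIDFILE
# Prints "NAME is [UP]" if PIDFILE holds the PID of a live process,
# otherwise "NAME is [DOWN]".
# Caveat: kill -0 also fails (EPERM) for a live process owned by another
# user, so a non-root user checking a root-started service sees [DOWN].
service_status() {
  local name="$1" pidfile="$2" pid=""
  [ -r "$pidfile" ] && pid=$(head -n 1 "$pidfile")
  case $pid in
    ''|*[!0-9]*) echo "$name is [DOWN]"; return 0 ;;   # missing or not a number
  esac
  if kill -0 "$pid" 2>/dev/null; then
    echo "$name is [UP]"
  else
    echo "$name is [DOWN]"
  fi
}

# Hypothetical pid-file location, for illustration only:
service_status zookeeper "/path/to/confluent.XXXXXXXX/zookeeper/zookeeper.pid"
```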

We ran into much the same issue before, and the only way we recovered was to manually kill the processes listed by the ps -ef command and start the stack again.

prasanna-sk, Nov 14 '17
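Before killing anything by hand, it helps to list only the service processes. This sketch reads ps -ef output and keeps the lines that contain a runtime directory such as /confluent.yAzjsc10/ from the ps output above. Matching on that path pattern is an assumption about how the services were started, and confluent_pids is a hypothetical helper.

```shell
# confluent_pids
# Reads `ps -ef` output on stdin and prints the PID (column 2) of every
# line whose command line contains /confluent.<suffix>/, i.e. a process
# started from a Confluent runtime directory. The grep process itself is
# excluded explicitly.
confluent_pids() {
  awk '/\/confluent\.[A-Za-z0-9]+\// && !/grep/ { print $2 }'
}

# Usage: ps -ef | confluent_pids
```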

When you run confluent status, does confluent current (or echo $CONFLUENT_CURRENT, if it's set) point to the runtime directory of the deployment that is currently running?

If you've set CONFLUENT_CURRENT but then run confluent status from a terminal that doesn't have this env var set, the CLI has no way to find the descriptors for the currently running services. You might want to use lsof to figure out the runtime directory of the running services.

kkonstantine, Nov 14 '17
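The runtime directory can also be read from the ps output; this is an alternative swapped in for lsof, not what was suggested above. In the listing earlier in this issue, each service's last argument is a properties file at <runtime dir>/<service>/<service>.properties. A minimal sketch, assuming ps prints the full command line as it does in this report (the helper name is hypothetical). The result can then be compared with what confluent current prints.

```shell
# runtime_dir_from_ps_line LINE
# LINE is one `ps -ef` line for a Confluent service. Its last field is
# <runtime dir>/<service>/<service>.properties, so the runtime directory
# is two levels above that file.
runtime_dir_from_ps_line() {
  local props
  props=$(printf '%s\n' "$1" | awk '{ print $NF }')
  dirname "$(dirname "$props")"
}

# Usage (the [Q] keeps grep from matching itself):
# runtime_dir_from_ps_line "$(ps -ef | grep '[Q]uorumPeerMain' | head -n 1)"
```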

In my case, CONFLUENT_CURRENT is not set. But according to this link, if it is not set, it defaults to /tmp.

confluent current does show a runtime directory under /tmp.

prasanna-sk, Nov 14 '17

Here is an observation/issue we are facing.

root user -- confluent start (successful)
root user -- confluent status (shows all services are UP)
root user -- confluent current (shows /tmp/confluent.######)

A non-root user logs into the same server while the services are up and running.

non-root user -- confluent status (shows all services are DOWN)
non-root user -- sudo confluent status (shows all services are UP)
non-root user -- confluent current (shows the same /tmp/confluent.###### as above)

What I did notice is that, by default, /tmp/confluent.###### has rwx------ permissions for root (or whichever user starts the services), so no other user can read that directory or the files in it. confluent.current also has rwx------ permissions; again, it is owned by and accessible only to its owner (in this case root).

Note: I installed the confluent package with yum as root. Not sure if that has any implications.

prasanna-sk, Nov 15 '17
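The permissions observation can be checked per user with a small sketch. If the runtime directory is rwx------ and owned by root, a non-root user cannot read it, which would be consistent with the DOWN results above. check_runtime_access is a hypothetical helper; note that root passes these tests regardless of the mode bits.

```shell
# check_runtime_access DIR
# Reports whether the current user can list (r) and enter (x) DIR.
# For a directory with mode rwx------ (700), only its owner, or root,
# which bypasses these checks, gets "ok".
check_runtime_access() {
  local dir="$1"
  if [ -r "$dir" ] && [ -x "$dir" ]; then
    echo "ok: can read $dir"
  else
    echo "no access: cannot read $dir"
  fi
}

# Example, using the placeholder name from the comment above:
# check_runtime_access /tmp/confluent.######
```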

I am also facing the same issue with a non-root user, but it's fine for the root user.

ganu453, Dec 01 '17

I also faced the same issue. It can mean ZooKeeper is already running from init.d, so try sudo service zookeeper stop; if that works, you're all set.

sankalp58, Jan 08 '18

Hitting this issue again. It seems that different terminal sessions end up with different CONFLUENT_CURRENT values, all variations of /var/folders/q9/2tg_lt9j6nx29rvr5r5jn_bw0000gp/T/confluent.xxxxxxx

I'm definitely not doing anything to set CONFLUENT_CURRENT myself.

I'm having to wheel out this rather nasty way of killing things:

ps -ef|grep confluent.|grep -v grep|awk '{print $2}'|xargs -Ifoo kill -9 foo

rmoff, May 01 '18
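The kill -9 one-liner above gives the JVMs no chance to shut down cleanly. A gentler sketch uses a common pattern, not something the Confluent CLI provides: it sends SIGTERM first and falls back to SIGKILL only for processes still alive after a grace period. stop_pids and GRACE_SECONDS are hypothetical names, and the 10-second default is illustrative.

```shell
# stop_pids PID...
# Sends SIGTERM to every PID, waits up to GRACE_SECONDS (default 10) for
# them to exit, then sends SIGKILL to anything that is still alive.
stop_pids() {
  local grace="${GRACE_SECONDS:-10}" pid i alive
  [ "$#" -eq 0 ] && return 0
  kill -TERM "$@" 2>/dev/null || true
  i=0
  while [ "$i" -lt "$grace" ]; do
    alive=0
    for pid in "$@"; do
      kill -0 "$pid" 2>/dev/null && alive=1
    done
    [ "$alive" -eq 0 ] && return 0   # everything exited on SIGTERM
    sleep 1
    i=$((i + 1))
  done
  kill -KILL "$@" 2>/dev/null || true
  return 0
}

# Usage, reusing the process selection from the one-liner above:
# stop_pids $(ps -ef | grep 'confluent\.' | grep -v grep | awk '{print $2}')
```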

I have the same problem. 'confluent status' returns [DOWN], and 'confluent stop' and 'confluent log' don't work...

I just found that there are two confluent runtime folders under /tmp. One of them is empty, and the other contains the files of the currently running Confluent instance. When I run 'confluent current', it returns the name of the empty folder! I noticed that the file /tmp/confluent.current seems to be what the confluent CLI uses to track this, so I updated it to point to the folder of the currently running Kafka instance, and 'confluent log kafka' now works again. But confluent status still doesn't work...

ngwwm, Aug 10 '18
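The two-folder situation described here can be spotted with a short sketch. It walks the confluent.* directories under a base directory (default /tmp, as discussed earlier in the thread) and reports which ones are empty. list_runtime_dirs is a hypothetical helper; the non-empty directory is the one to compare against what confluent current returns.

```shell
# list_runtime_dirs [BASE]
# Prints "in use:" or "empty: " for each confluent.* directory under BASE
# (default /tmp). Plain files such as confluent.current are skipped, as is
# the literal pattern when nothing matches.
list_runtime_dirs() {
  local base="${1:-/tmp}" d
  for d in "$base"/confluent.*; do
    [ -d "$d" ] || continue
    if [ -n "$(ls -A "$d")" ]; then
      echo "in use: $d"
    else
      echo "empty:  $d"
    fi
  done
}
```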

To work around the issue, always run the confluent CLI from /tmp (or from $CONFLUENT_CURRENT if it is defined), or update bin/confluent as below:

...
[[ $# -lt 1 ]] && usage

requirements

cd "$confluent_current_dir"
command="${1}"
...

I am using confluent 4.

ngwwm, Aug 11 '18
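The first suggestion, running the CLI from /tmp or $CONFLUENT_CURRENT, can be applied without editing bin/confluent by wrapping the call. This reflects only the commenter's report that the working directory matters; it is not documented CLI behavior. confluent_from_current is a hypothetical helper name.

```shell
# confluent_from_current ARGS...
# Runs `confluent ARGS...` from inside $CONFLUENT_CURRENT, or /tmp when it
# is unset. The subshell leaves the caller's working directory unchanged.
confluent_from_current() {
  ( cd "${CONFLUENT_CURRENT:-/tmp}" && confluent "$@" )
}

# Usage: confluent_from_current status
```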

I encountered this issue and tried the following. It works! I am using Confluent OSS 5.0.0.

Problem:

user@user-Lenovo-G400:~$ confluent start
This CLI is intended for development only, not for production
https://docs.confluent.io/current/cli/index.html

Using CONFLUENT_CURRENT: /home/user/confluent-5.0.0/confluent.0C1Oma4q
Starting zookeeper
Zookeeper failed to start
zookeeper is [DOWN]
Cannot start Kafka, Zookeeper is not running. Check your deployment

Solution:

user@user-Lenovo-G400:~$ sudo /home/user/confluent-5.0.0/bin/zookeeper-server-stop

user@user-Lenovo-G400:~$ confluent start
This CLI is intended for development only, not for production
https://docs.confluent.io/current/cli/index.html

Using CONFLUENT_CURRENT: /home/user/confluent-5.0.0/confluent.0C1Oma4q
Starting zookeeper
zookeeper is [UP]
Starting kafka
kafka is [UP]
Starting schema-registry
schema-registry is [UP]
Starting kafka-rest
kafka-rest is [UP]
Starting connect
connect is [UP]
Starting ksql-server
ksql-server is [UP]
user@user-Lenovo-G400:~$

Maybe someone will find it useful!

gopinathankm, Sep 30 '18

It seems this issue is still there: confluent status still does not work.

alokpaul, Jul 23 '19