kafka-ui
TimeoutException when JMX is enabled
Hello,
I am trying to set up kafka-ui and I receive this message:
org.apache.kafka.common.errors.TimeoutException: Call(callName=listNodes, deadlineMs=1646050105263, tries=1, nextAllowedTryMs=1646050749797) timed out at 1646050749697 after 1 attempt(s) Caused by: org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment. Call: listNodes
The cluster is reported as "offline". It seems that this happens only if I enable JMX; without JMX the system works correctly.
JMX is configured correctly on the host (checked with jmxterm) and there is no network issue from kafka-ui to the broker (also checked with jmxterm).
Let me know if I can give you more details, or if you can point me in the right direction.
Hi, is there SSL enabled for JMX by any chance?
Nope, no SSL enabled, and the same in the kafka-ui config.
broker config:
-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.local.only=false -Djava.rmi.server.hostname=172.17.252.73 -Dcom.sun.management.jmxremote.rmi.port=5701 -Dcom.sun.management.jmxremote.port=5701
netstat:
tcp 0 0 0.0.0.0:5701 0.0.0.0:* LISTEN
Hello, we have the same problem. With jConsole I can connect to the broker without problems; it also says that the connection is insecure, so there is no chance that SSL is active.
@ramarro123 @moravcik94 Could you please share your kafka-ui env variables/configmap/etc?
Hello, I have the same problem and eventually disabled the JMX port. In my case, we have 5 clusters (5 environments), all configured alike (no SSL). For one of them, JMX actually "works" from time to time, but it is extremely slow and often times out. Maybe unrelated, but the Oracle Java Mission Control tool is also horribly slow to connect to the same JMX endpoint, while jconsole is rather quick.
Let me attach the config; cluster 0 is the one that fails when JMX is enabled in the config.
version: '2'
services:
  kafka-ui:
    image: provectuslabs/kafka-ui
    container_name: kafka-ui
    ports:
      - "8083:8080"
    restart: always
    environment:
      - TZ=Europe/Rome
      - KAFKA_CLUSTERS_0_NAME=fabrick_kafka_tst
      - KAFKA_CLUSTERS_0_BOOTSTRAPSERVERS=uxtequeue01.sg.gbs.tst:9092,uxtequeue02.sg.gbs.tst:9092,uxtequeue03.sg.gbs.tst:9092
      - KAFKA_CLUSTERS_0_ZOOKEEPER=uxtequeue01.sg.gbs.tst:2181,uxtequeue02.sg.gbs.tst:2181,uxtequeue03.sg.gbs.tst:2181
      - KAFKA_CLUSTERS_0_JMXPORT=5701
      - KAFKA_CLUSTERS_0_JMXSSL=false
      - KAFKA_CLUSTERS_1_NAME=fabrick_kafka_pre
      - KAFKA_CLUSTERS_1_BOOTSTRAPSERVERS=uxppqueue01.sg.gbs.pre:9092,uxppqueue02.sg.gbs.pre:9092,uxppqueue03.sg.gbs.pre:9092
      - KAFKA_CLUSTERS_1_ZOOKEEPER=uxppsndzookeep1.sg.gbs.pre:2181,uxppsndzookeep2.sg.gbs.pre:2181,uxppsndzookeep3.sg.gbs.pre:2181
      # - KAFKA_CLUSTERS_1_JMXPORT=5701
      # - KAFKA_CLUSTERS_1_JMXSSL=false
      - JAVA_OPTS=-Xmx4096M -Xms1024M -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+ExplicitGCInvokesConcurrent -Djute.maxbuffer=5242880 -DLOG_PATH=/opt/kafkaui/logs/ -Dlogging.config=file:/opt/kafkaui/config/logback-kafkaui.xml
    volumes:
      - type: bind
        source: /opt/kafkaui/logs
        target: /opt/kafkaui/logs
      - type: bind
        source: /opt/kafkaui/config
        target: /opt/kafkaui/config
      - /etc/localtime:/etc/localtime:ro
@ramarro123 It is pretty strange, because the TimeoutException you provided is not related to JMX. I recommend that you:
- set KAFKA_ADMIN-CLIENT-TIMEOUT=60000 (env variable) to rule out Kafka API timeouts
- add -Dsun.rmi.transport.tcp.responseTimeout=60000 to JAVA_OPTS.
[root@fbkppawx /opt/kafkaui]$ less logs/kafkaui.log
at java.rmi/java.rmi.server.RemoteObjectInvocationHandler.invokeRemoteMethod(RemoteObjectInvocationHandler.java:217)
at java.rmi/java.rmi.server.RemoteObjectInvocationHandler.invoke(RemoteObjectInvocationHandler.java:162)
at com.sun.proxy.$Proxy115.newClient(Unknown Source)
at java.management.rmi/javax.management.remote.rmi.RMIConnector.getConnection(RMIConnector.java:2105)
at java.management.rmi/javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:321)
at java.management/javax.management.remote.JMXConnectorFactory.connect(JMXConnectorFactory.java:270)
at com.provectus.kafka.ui.util.JmxPoolFactory.create(JmxPoolFactory.java:31)
at com.provectus.kafka.ui.util.JmxPoolFactory.create(JmxPoolFactory.java:17)
at org.apache.commons.pool2.BaseKeyedPooledObjectFactory.makeObject(BaseKeyedPooledObjectFactory.java:62)
at org.apache.commons.pool2.impl.GenericKeyedObjectPool.create(GenericKeyedObjectPool.java:1012)
at org.apache.commons.pool2.impl.GenericKeyedObjectPool.borrowObject(GenericKeyedObjectPool.java:356)
at org.apache.commons.pool2.impl.GenericKeyedObjectPool.borrowObject(GenericKeyedObjectPool.java:277)
at com.provectus.kafka.ui.util.JmxClusterUtil.getJmxMetrics(JmxClusterUtil.java:99)
at com.provectus.kafka.ui.util.JmxClusterUtil.lambda$getJmxMetric$3(JmxClusterUtil.java:82)
at java.base/java.util.Optional.map(Optional.java:258)
at com.provectus.kafka.ui.util.JmxClusterUtil.getJmxMetric(JmxClusterUtil.java:82)
at com.provectus.kafka.ui.util.JmxClusterUtil.lambda$getBrokerMetrics$0(JmxClusterUtil.java:73)
at reactor.core.publisher.FluxMapFuseable$MapFuseableSubscriber.onNext(FluxMapFuseable.java:113)
at reactor.core.publisher.FluxIterable$IterableSubscription.fastPath(FluxIterable.java:340)
at reactor.core.publisher.FluxIterable$IterableSubscription.request(FluxIterable.java:227)
at reactor.core.publisher.FluxMapFuseable$MapFuseableSubscriber.request(FluxMapFuseable.java:169)
at reactor.core.publisher.MonoCollect$CollectSubscriber.onSubscribe(MonoCollect.java:103)
at reactor.core.publisher.FluxMapFuseable$MapFuseableSubscriber.onSubscribe(FluxMapFuseable.java:96)
at reactor.core.publisher.FluxIterable.subscribe(FluxIterable.java:165)
at reactor.core.publisher.FluxIterable.subscribe(FluxIterable.java:87)
at reactor.core.publisher.Mono.subscribe(Mono.java:4399)
at reactor.core.publisher.MonoZip.subscribe(MonoZip.java:128)
at reactor.core.publisher.MonoFlatMap$FlatMapMain.onNext(MonoFlatMap.java:157)
at reactor.core.publisher.MonoCreate$DefaultMonoSink.success(MonoCreate.java:165)
at com.provectus.kafka.ui.service.ReactiveAdminClient.lambda$describeCluster$18(ReactiveAdminClient.java:224)
at org.apache.kafka.common.internals.KafkaFutureImpl$WhenCompleteBiConsumer.accept(KafkaFutureImpl.java:177)
at org.apache.kafka.common.internals.KafkaFutureImpl$WhenCompleteBiConsumer.accept(KafkaFutureImpl.java:162)
at org.apache.kafka.common.internals.KafkaFutureImpl.complete(KafkaFutureImpl.java:221)
at org.apache.kafka.common.KafkaFuture$AllOfAdapter.maybeComplete(KafkaFuture.java:82)
at org.apache.kafka.common.KafkaFuture$AllOfAdapter.accept(KafkaFuture.java:76)
at org.apache.kafka.common.KafkaFuture$AllOfAdapter.accept(KafkaFuture.java:57)
at org.apache.kafka.common.internals.KafkaFutureImpl.complete(KafkaFutureImpl.java:221)
at org.apache.kafka.clients.admin.KafkaAdminClient$5.handleResponse(KafkaAdminClient.java:1882)
at org.apache.kafka.clients.admin.KafkaAdminClient$AdminClientRunnable.handleResponses(KafkaAdminClient.java:1189)
at org.apache.kafka.clients.admin.KafkaAdminClient$AdminClientRunnable.processRequests(KafkaAdminClient.java:1341)
at org.apache.kafka.clients.admin.KafkaAdminClient$AdminClientRunnable.run(KafkaAdminClient.java:1264)
at java.base/java.lang.Thread.run(Thread.java:830)
Caused by: java.net.ConnectException: Operation timed out
at java.base/sun.nio.ch.Net.connect0(Native Method)
at java.base/sun.nio.ch.Net.connect(Net.java:493)
at java.base/sun.nio.ch.Net.connect(Net.java:482)
at java.base/sun.nio.ch.NioSocketImpl.connect(NioSocketImpl.java:588)
at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:339)
at java.base/java.net.Socket.connect(Socket.java:603)
at java.base/java.net.Socket.connect(Socket.java:552)
at java.base/java.net.Socket.<init>(Socket.java:475)
at java.base/java.net.Socket.<init>(Socket.java:249)
at java.rmi/sun.rmi.transport.tcp.TCPDirectSocketFactory.createSocket(TCPDirectSocketFactory.java:40)
at java.rmi/sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:617)
... 45 common frames omitted
version: '2'
services:
  kafka-ui:
    image: provectuslabs/kafka-ui
    container_name: kafka-ui
    ports:
      - "8083:8080"
    restart: always
    environment:
      - TZ=Europe/Rome
      - KAFKA_ADMIN-CLIENT-TIMEOUT=60000
      - KAFKA_CLUSTERS_0_NAME=fabrick_kafka_tst
      - KAFKA_CLUSTERS_0_BOOTSTRAPSERVERS=uxtequeue01.sg.gbs.tst:9092,uxtequeue02.sg.gbs.tst:9092,uxtequeue03.sg.gbs.tst:9092
      - KAFKA_CLUSTERS_0_ZOOKEEPER=uxtequeue01.sg.gbs.tst:2181,uxtequeue02.sg.gbs.tst:2181,uxtequeue03.sg.gbs.tst:2181
      - KAFKA_CLUSTERS_0_JMXPORT=5701
      - KAFKA_CLUSTERS_0_JMXSSL=false
      - KAFKA_CLUSTERS_1_NAME=fabrick_kafka_pre
      - KAFKA_CLUSTERS_1_BOOTSTRAPSERVERS=uxppqueue01.sg.gbs.pre:9092,uxppqueue02.sg.gbs.pre:9092,uxppqueue03.sg.gbs.pre:9092
      - KAFKA_CLUSTERS_1_ZOOKEEPER=uxppsndzookeep1.sg.gbs.pre:2181,uxppsndzookeep2.sg.gbs.pre:2181,uxppsndzookeep3.sg.gbs.pre:2181
      # - KAFKA_CLUSTERS_1_JMXPORT=5701
      # - KAFKA_CLUSTERS_1_JMXSSL=false
      - JAVA_OPTS=-Xmx4096M -Xms1024M -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+ExplicitGCInvokesConcurrent -Djute.maxbuffer=5242880 -Dsun.rmi.transport.tcp.responseTimeout=60000 -DLOG_PATH=/opt/kafkaui/logs/ -Dlogging.config=file:/opt/kafkaui/config/logback-kafkaui.xml
    volumes:
      - type: bind
        source: /opt/kafkaui/logs
        target: /opt/kafkaui/logs
      - type: bind
        source: /opt/kafkaui/config
        target: /opt/kafkaui/config
      - /etc/localtime:/etc/localtime:ro
@ramarro123 @moravcik94 @ngmip which kafka do you folks use? which vendor and version?
@Haarolean Kafka open source v2.4
Hi @Haarolean,
I'm a colleague of @ramarro123 and we are using Kafka open source v2.12-2.3.0 at the moment.
@ngmip @MgFbk thanks for the feedback!
The issue is confirmed and we're working on the fix. Thanks for your reports.
@ramarro123 @moravcik94 @ngmip @MgFbk the fix is available in master (and master-labeled docker image as well). Please let us know how it goes.
Hi @Haarolean,
Thanks for the update. I've just tested your new version: provectuslabs/kafka-ui:master
Is that the right image?
Unfortunately, I'm getting the same errors described in the previous comments.
2022-04-20 11:29:38.185 ERROR 1 --- [boundedElastic-4] c.p.kafka.ui.util.JmxClusterUtil : Cannot get JMX connector for the pool due to:
java.rmi.ConnectException: Connection refused to host: 172.17.252.73; nested exception is:
java.net.ConnectException: Operation timed out
at java.rmi/sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:623)
at java.rmi/sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:209)
at java.rmi/sun.rmi.transport.tcp.TCPChannel.newConnection(TCPChannel.java:196)
at java.rmi/sun.rmi.server.UnicastRef.invoke(UnicastRef.java:132)
at java.rmi/java.rmi.server.RemoteObjectInvocationHandler.invokeRemoteMethod(RemoteObjectInvocationHandler.java:217)
at java.rmi/java.rmi.server.RemoteObjectInvocationHandler.invoke(RemoteObjectInvocationHandler.java:162)
at com.sun.proxy.$Proxy113.newClient(Unknown Source)
at java.management.rmi/javax.management.remote.rmi.RMIConnector.getConnection(RMIConnector.java:2105)
at java.management.rmi/javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:321)
at java.management/javax.management.remote.JMXConnectorFactory.connect(JMXConnectorFactory.java:270)
at com.provectus.kafka.ui.util.JmxPoolFactory.create(JmxPoolFactory.java:31)
at com.provectus.kafka.ui.util.JmxPoolFactory.create(JmxPoolFactory.java:17)
at org.apache.commons.pool2.BaseKeyedPooledObjectFactory.makeObject(BaseKeyedPooledObjectFactory.java:62)
at org.apache.commons.pool2.impl.GenericKeyedObjectPool.create(GenericKeyedObjectPool.java:1012)
at org.apache.commons.pool2.impl.GenericKeyedObjectPool.borrowObject(GenericKeyedObjectPool.java:356)
at org.apache.commons.pool2.impl.GenericKeyedObjectPool.borrowObject(GenericKeyedObjectPool.java:277)
at com.provectus.kafka.ui.util.JmxClusterUtil.getJmxMetrics(JmxClusterUtil.java:104)
at com.provectus.kafka.ui.util.JmxClusterUtil.lambda$getJmxMetric$3(JmxClusterUtil.java:87)
at java.base/java.util.Optional.map(Optional.java:258)
at com.provectus.kafka.ui.util.JmxClusterUtil.getJmxMetric(JmxClusterUtil.java:87)
at com.provectus.kafka.ui.util.JmxClusterUtil.lambda$getBrokerMetrics$0(JmxClusterUtil.java:77)
at reactor.core.publisher.FluxMap$MapSubscriber.onNext(FluxMap.java:106)
at reactor.core.publisher.FluxPublishOn$PublishOnSubscriber.runAsync(FluxPublishOn.java:440)
at reactor.core.publisher.FluxPublishOn$PublishOnSubscriber.run(FluxPublishOn.java:527)
at reactor.core.scheduler.WorkerTask.call(WorkerTask.java:84)
at reactor.core.scheduler.WorkerTask.call(WorkerTask.java:37)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:830)
Caused by: java.net.ConnectException: Operation timed out
at java.base/sun.nio.ch.Net.connect0(Native Method)
at java.base/sun.nio.ch.Net.connect(Net.java:493)
at java.base/sun.nio.ch.Net.connect(Net.java:482)
at java.base/sun.nio.ch.NioSocketImpl.connect(NioSocketImpl.java:588)
at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:339)
at java.base/java.net.Socket.connect(Socket.java:603)
at java.base/java.net.Socket.connect(Socket.java:552)
at java.base/java.net.Socket.
The stack trace is repeated for each cluster node IP where I'm trying to use the JMX port.
There were no changes to the Kafka node config; it is the same as reported by @ramarro123:
-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.local.only=false -Djava.rmi.server.hostname=172.17.252.73 -Dcom.sun.management.jmxremote.rmi.port=5701 -Dcom.sun.management.jmxremote.port=5701
@Haarolean
I did not get a timeout, but with the JMX port set it takes a very long time to get the information for the first time, nearly long enough to time out, I think.
@MgFbk seems to be a different issue.
@ngmip does it affect UI?
@MgFbk does it work via any other tool, like jconsole? It looks like a network issue of some sort.
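One quick way to narrow down a suspected network issue is to test plain TCP reachability of the JMX port from the machine or container where kafka-ui runs. The following is only a minimal sketch: `can_connect` is a hypothetical helper name, not part of kafka-ui or any JMX tool. A successful connection only shows that the port accepts connections; the RMI handshake can still direct the client to a different host or port, so this check can rule a problem in but not fully out.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical helper for illustration; it does not speak JMX or RMI,
    it only checks that something is listening and reachable.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

For example, running `can_connect("172.17.252.73", 5701)` from inside the kafka-ui container would test the host and port that appear in the broker config quoted in this thread.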
@Haarolean Somewhat, yes. For 4 minutes after kafka-ui starts, the cluster is marked as "down", the topic list is empty, the consumers list needs about 10 seconds to load (while it usually needs only one or two), and the 'Schema Registry' and 'Kafka Connect' menu entries are missing.
Then, after what I assume is the successful yet slow JMX query, the UI becomes OK.
The logs show this slowness quite clearly:
2022-04-21 08:47:44,180 INFO [main] o.s.b.w.e.n.NettyWebServer: Netty started on port 8080
2022-04-21 08:47:44,197 INFO [main] c.p.k.u.KafkaUiApplication: Started KafkaUiApplication in 3.419 seconds (JVM running for 4.415)
2022-04-21 08:47:44,211 DEBUG [parallel-1] c.p.k.u.s.ClustersMetricsScheduler: Start getting metrics for kafkaCluster: uat
2022-04-21 08:47:44,222 INFO [parallel-1] o.a.k.c.a.AdminClientConfig: AdminClientConfig values:
... kafka config omitted for brevity ...
2022-04-21 08:47:44,258 INFO [parallel-1] o.a.k.c.u.AppInfoParser: Kafka version: 2.8.0
2022-04-21 08:47:44,258 INFO [parallel-1] o.a.k.c.u.AppInfoParser: Kafka commitId: ebb1d6e21cc92130
2022-04-21 08:47:44,258 INFO [parallel-1] o.a.k.c.u.AppInfoParser: Kafka startTimeMs: 1650523664257
2022-04-21 08:51:41,643 DEBUG [boundedElastic-1] c.p.k.u.s.ClustersMetricsScheduler: Metrics updated for cluster: uat
2022-04-21 08:51:41,644 DEBUG [parallel-8] c.p.k.u.s.ClustersMetricsScheduler: Start getting metrics for kafkaCluster: uat
2022-04-21 08:55:39,120 DEBUG [boundedElastic-1] c.p.k.u.s.ClustersMetricsScheduler: Metrics updated for cluster: uat
2022-04-21 08:55:39,120 DEBUG [parallel-11] c.p.k.u.s.ClustersMetricsScheduler: Start getting metrics for kafkaCluster: uat
=> nearly 4 minutes to "getting metrics"
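The "nearly 4 minutes" figure can be checked directly from the timestamps. The sketch below pairs each "Start getting metrics" line with the next "Metrics updated" line and prints the elapsed seconds. The function names are my own, and the parsing assumes the exact timestamp layout of the log lines quoted above.

```python
from datetime import datetime

# Log lines copied from the excerpt above (JMX enabled).
LOG = """\
2022-04-21 08:47:44,211 DEBUG [parallel-1] c.p.k.u.s.ClustersMetricsScheduler: Start getting metrics for kafkaCluster: uat
2022-04-21 08:51:41,643 DEBUG [boundedElastic-1] c.p.k.u.s.ClustersMetricsScheduler: Metrics updated for cluster: uat
2022-04-21 08:51:41,644 DEBUG [parallel-8] c.p.k.u.s.ClustersMetricsScheduler: Start getting metrics for kafkaCluster: uat
2022-04-21 08:55:39,120 DEBUG [boundedElastic-1] c.p.k.u.s.ClustersMetricsScheduler: Metrics updated for cluster: uat
"""


def parse_ts(line: str) -> datetime:
    # The first 23 characters look like "2022-04-21 08:47:44,211".
    return datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S,%f")


def metric_durations(log_text: str) -> list:
    """Seconds between each 'Start getting metrics' and the following 'Metrics updated'."""
    durations = []
    started = None
    for line in log_text.splitlines():
        if "Start getting metrics" in line:
            started = parse_ts(line)
        elif "Metrics updated" in line and started is not None:
            durations.append((parse_ts(line) - started).total_seconds())
            started = None
    return durations


durations = metric_durations(LOG)
print(durations)  # two cycles of roughly 237 seconds each
```

Both cycles come out just under 4 minutes, which matches the time the cluster stays marked as "down" after startup.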
For the record, the same logs with jmx-port = 0 (disabled JMX)
2022-04-21 09:02:48,640 INFO [main] o.s.b.w.e.n.NettyWebServer: Netty started on port 8080
2022-04-21 09:02:48,657 INFO [main] c.p.k.u.KafkaUiApplication: Started KafkaUiApplication in 3.148 seconds (JVM running for 4.031)
2022-04-21 09:02:48,673 DEBUG [parallel-1] c.p.k.u.s.ClustersMetricsScheduler: Start getting metrics for kafkaCluster: uat
2022-04-21 09:02:48,685 INFO [parallel-1] o.a.k.c.a.AdminClientConfig: AdminClientConfig values:
... kafka config omitted for brevity ...
2022-04-21 09:02:48,730 INFO [parallel-1] o.a.k.c.u.AppInfoParser: Kafka version: 2.8.0
2022-04-21 09:02:48,730 INFO [parallel-1] o.a.k.c.u.AppInfoParser: Kafka commitId: ebb1d6e21cc92130
2022-04-21 09:02:48,730 INFO [parallel-1] o.a.k.c.u.AppInfoParser: Kafka startTimeMs: 1650524568729
2022-04-21 09:02:49,902 DEBUG [parallel-8] c.p.k.u.s.ClustersMetricsScheduler: Metrics updated for cluster: uat
2022-04-21 09:03:18,662 DEBUG [parallel-9] c.p.k.u.s.ClustersMetricsScheduler: Start getting metrics for kafkaCluster: uat
2022-04-21 09:03:19,159 DEBUG [parallel-2] c.p.k.u.s.ClustersMetricsScheduler: Metrics updated for cluster: uat
2022-04-21 09:03:48,666 DEBUG [parallel-3] c.p.k.u.s.ClustersMetricsScheduler: Start getting metrics for kafkaCluster: uat
2022-04-21 09:03:49,121 DEBUG [parallel-8] c.p.k.u.s.ClustersMetricsScheduler: Metrics updated for cluster: uat
Hi @Haarolean,
Thanks to @ngmip's comment (nearly 4 minutes to "getting metrics"), we have solved it.
This morning I checked the kafka-ui status again for our TST cluster, which is configured to use JMX_PORT. It is running well; yesterday's KO status was only temporary, and we simply had not waited long enough.
"Connection timed out" exceptions were still there, but only for some nodes of our cluster. Looking more closely, we were missing:
-Dcom.sun.management.jmxremote.rmi.port=$JMX_PORT
on some nodes.
After restarting those nodes, everything is OK using JMX_PORT.
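Without `com.sun.management.jmxremote.rmi.port`, the JVM picks an arbitrary port for the RMI server, which a firewall may block even when the registry port is open. A check like the one below could be run against each broker's JVM options to catch a node where the flag is missing. This is a minimal sketch: the function names and the choice of which properties count as required are my assumptions, based only on the broker config quoted in this thread.

```python
# Properties treated as required here (an assumption for this sketch).
REQUIRED_JMX_PROPS = (
    "com.sun.management.jmxremote.port",
    "com.sun.management.jmxremote.rmi.port",
)


def parse_system_props(jvm_opts: str) -> dict:
    """Collect -Dkey=value tokens from a JVM options string into a dict."""
    props = {}
    for token in jvm_opts.split():
        if token.startswith("-D"):
            key, _, value = token[2:].partition("=")
            props[key] = value  # value is "" for flags such as -Dcom.sun.management.jmxremote
    return props


def missing_jmx_props(jvm_opts: str) -> list:
    """Return the required JMX properties that are absent or empty."""
    props = parse_system_props(jvm_opts)
    return [name for name in REQUIRED_JMX_PROPS if not props.get(name)]
```

Called with the full broker config from this thread it returns an empty list; with the `rmi.port` flag dropped it returns `["com.sun.management.jmxremote.rmi.port"]`.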
Thanks to all.
Great work.
so, I think this thread can be closed and resolved.
The only problem we currently have is that cluster stats collection and JMX metrics gathering run one after another, and the UI is not initialized until the JMX metrics have been collected (see https://github.com/provectus/kafka-ui/issues/1689#issuecomment-1104792531).
To solve this, we will no longer require JMX metrics to be collected before UI initialization, and will collect them in parallel with the required cluster stats. This will be implemented under the https://github.com/provectus/kafka-ui/pull/2190 initiative.
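The idea of not blocking on JMX can be illustrated with a small, language-neutral sketch. It uses Python's asyncio rather than the Reactor pipeline kafka-ui is built on, and it is not the actual change in the linked pull request. All function names and sleep durations are invented for illustration.

```python
import asyncio


async def collect_cluster_stats(events: list) -> None:
    await asyncio.sleep(0.01)  # stands in for the fast admin-client calls
    events.append("stats_done")


async def collect_jmx_metrics(events: list) -> None:
    await asyncio.sleep(0.2)  # stands in for a slow JMX query
    events.append("jmx_done")


async def start(events: list) -> None:
    # Start JMX collection in the background instead of awaiting it first.
    jmx_task = asyncio.create_task(collect_jmx_metrics(events))
    await collect_cluster_stats(events)
    events.append("ui_ready")  # the UI only depends on the required cluster stats
    await jmx_task  # JMX metrics arrive later and can be filled in then


def run() -> list:
    events = []
    asyncio.run(start(events))
    return events
```

In this sketch `run()` records the UI becoming ready before the slow JMX step finishes, which is the ordering the planned change aims for.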