
Cruise Control 2.5.122 and MSK Kafka 2.8.2.tiered not getting data on Cluster Load

Open mantoanipythian opened this issue 2 years ago • 1 comments

Cruise Control starts fine and I am able to see the Kafka Cluster State, but I am unable to use Kafka Cluster Load; it fails with the error below. I am using Prometheus to scrape the data, and I can see the data in Prometheus.

Another thing that came to my attention is that my __CruiseControlMetrics topic is not receiving any data when I run a consumer on it from the beginning.
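Because `PrometheusMetricSampler` reads its samples from Prometheus' HTTP API, one check worth making is whether Prometheus answers queries from the host where Cruise Control runs. The sketch below is only an illustration: it uses the standard `/api/v1/query` instant-query endpoint, while the function names and the base URL in the usage note are hypothetical and not taken from this issue.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url: str, promql: str) -> str:
    """Return the instant-query URL (/api/v1/query) for a Prometheus server."""
    query = urllib.parse.urlencode({"query": promql})
    return base_url.rstrip("/") + "/api/v1/query?" + query


def count_series(response: dict) -> int:
    """Count the series in a decoded /api/v1/query response; 0 if it failed."""
    if response.get("status") != "success":
        return 0
    return len(response.get("data", {}).get("result", []))


def query_prometheus(base_url: str, promql: str, timeout: float = 10.0) -> dict:
    """Run one instant query and return the decoded JSON body."""
    url = build_query_url(base_url, promql)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

For example, `count_series(query_prometheus("http://localhost:9090", "up"))` (with the host and port replaced by the ones used in `prometheus.server.endpoint`) should return a non-zero number if Prometheus is reachable and scraping at least one target. `up` is Prometheus' built-in scrape-health metric; the broker-level metrics the sampler actually needs would have to be queried by their own names.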

## Error shown in the UI

ERROR: Error processing GET request '/load' due to: 'com.linkedin.kafka.cruisecontrol.exception.KafkaCruiseControlException: com.linkedin.cruisecontrol.exception.NotEnoughValidWindowsException: There is no window available in range [-1, 1684839397691] (index [1, -1]). Window index (current: 0, oldest: 0).'.
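To make the numbers in that message easier to read, here is a minimal sketch that extracts the time range and window indices from the exception text and converts the upper bound from epoch milliseconds to UTC. The regular expression is written only against the message format shown above, and the function and group names are hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

# Pattern based solely on the message text quoted in this issue.
WINDOW_ERROR = re.compile(
    r"range \[(?P<start>-?\d+), (?P<end>-?\d+)\] "
    r"\(index \[(?P<idx_from>-?\d+), (?P<idx_to>-?\d+)\]\)\. "
    r"Window index \(current: (?P<current>-?\d+), oldest: (?P<oldest>-?\d+)\)"
)


def parse_window_error(message: str) -> dict:
    """Extract the numeric fields of a NotEnoughValidWindowsException message."""
    match = WINDOW_ERROR.search(message)
    if match is None:
        raise ValueError("message does not match the expected format")
    return {name: int(value) for name, value in match.groupdict().items()}


def epoch_ms_to_utc(ms: int) -> datetime:
    """Convert epoch milliseconds to an aware UTC datetime without float rounding."""
    seconds, millis = divmod(ms, 1000)
    return datetime.fromtimestamp(seconds, tz=timezone.utc) + timedelta(milliseconds=millis)
```

Applied to the message above, the upper bound `1684839397691` converts to 2023-05-23 10:56:37.691 UTC, which suggests it is simply the time of the request. A current and oldest window index of 0 would be consistent with the aggregator not having completed any window yet, although that reading is an interpretation rather than something the message states.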

In the logs I can see the following message:

## Error from the logs

[2023-05-23 07:53:43,990] INFO Processing async request ClusterLoadRequest. (com.linkedin.kafka.cruisecontrol.servlet.handler.async.AbstractAsyncRequest)
[2023-05-23 07:53:43,994] INFO Create a new UserTask 5a9abba2-f714-44c8-8117-ad708c9604f1 with SessionKey SessionKey{_session=com.linkedin.kafka.cruisecontrol.servlet.ServletSession@2208dbaf,_requestUrl=GET /kafkacruisecontrol/load,_queryParams={allow_capacity_estimation=[true], json=[true]}} (com.linkedin.kafka.cruisecontrol.servlet.UserTaskManager)
[2023-05-23 07:53:44,016] WARN Received exception when trying to execute runnable for "Get broker stats" (com.linkedin.kafka.cruisecontrol.servlet.handler.async.runnable.OperationRunnable)
com.linkedin.kafka.cruisecontrol.exception.KafkaCruiseControlException: com.linkedin.cruisecontrol.exception.NotEnoughValidWindowsException: There is no window available in range [-1, 1684839223990] (index [1, -1]). Window index (current: 0, oldest: 0).
    at com.linkedin.kafka.cruisecontrol.servlet.handler.async.runnable.LoadRunnable.clusterModel(LoadRunnable.java:120) ~[cruise-control-2.5.122.jar:?]
    at com.linkedin.kafka.cruisecontrol.servlet.handler.async.runnable.LoadRunnable.clusterModelFromEarliest(LoadRunnable.java:93) ~[cruise-control-2.5.122.jar:?]
    at com.linkedin.kafka.cruisecontrol.servlet.handler.async.runnable.LoadRunnable.getResult(LoadRunnable.java:76) ~[cruise-control-2.5.122.jar:?]
    at com.linkedin.kafka.cruisecontrol.servlet.handler.async.runnable.LoadRunnable.getResult(LoadRunnable.java:26) ~[cruise-control-2.5.122.jar:?]
    at com.linkedin.kafka.cruisecontrol.servlet.handler.async.runnable.OperationRunnable.run(OperationRunnable.java:45) ~[cruise-control-2.5.122.jar:?]
    at com.linkedin.kafka.cruisecontrol.servlet.handler.async.runnable.LoadRunnable.run(LoadRunnable.java:26) ~[cruise-control-2.5.122.jar:?]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[?:?]
    at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
    at java.lang.Thread.run(Thread.java:829) ~[?:?]
Caused by: com.linkedin.cruisecontrol.exception.NotEnoughValidWindowsException: There is no window available in range [-1, 1684839223990] (index [1, -1]). Window index (current: 0, oldest: 0).
    at com.linkedin.cruisecontrol.monitor.sampling.aggregator.MetricSampleAggregator.aggregate(MetricSampleAggregator.java:202) ~[cruise-control-core-2.5.122.jar:?]
    at com.linkedin.kafka.cruisecontrol.monitor.sampling.aggregator.KafkaPartitionMetricSampleAggregator.aggregate(KafkaPartitionMetricSampleAggregator.java:151) ~[cruise-control-2.5.122.jar:?]
    at com.linkedin.kafka.cruisecontrol.monitor.LoadMonitor.clusterModel(LoadMonitor.java:496) ~[cruise-control-2.5.122.jar:?]
    at com.linkedin.kafka.cruisecontrol.KafkaCruiseControl.clusterModel(KafkaCruiseControl.java:370) ~[cruise-control-2.5.122.jar:?]
    at com.linkedin.kafka.cruisecontrol.servlet.handler.async.runnable.LoadRunnable.clusterModel(LoadRunnable.java:111) ~[cruise-control-2.5.122.jar:?]
    ... 10 more
[2023-05-23 07:53:44,022] ERROR Error processing GET request '/load' due to: 'com.linkedin.kafka.cruisecontrol.exception.KafkaCruiseControlException: com.linkedin.cruisecontrol.exception.NotEnoughValidWindowsException: There is no window available in range [-1, 1684839223990] (index [1, -1]). Window index (current: 0, oldest: 0).'. (com.linkedin.kafka.cruisecontrol.KafkaCruiseControlRequestHandler)
java.util.concurrent.ExecutionException: com.linkedin.kafka.cruisecontrol.exception.KafkaCruiseControlException: com.linkedin.cruisecontrol.exception.NotEnoughValidWindowsException: There is no window available in range [-1, 1684839223990] (index [1, -1]). Window index (current: 0, oldest: 0).
    at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395) ~[?:?]
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2022) ~[?:?]
    at com.linkedin.kafka.cruisecontrol.servlet.handler.async.AbstractAsyncRequest.getResponse(AbstractAsyncRequest.java:56) ~[cruise-control-2.5.122.jar:?]
    at com.linkedin.kafka.cruisecontrol.servlet.handler.AbstractRequest.handle(AbstractRequest.java:37) ~[cruise-control-2.5.122.jar:?]
    at com.linkedin.kafka.cruisecontrol.KafkaCruiseControlRequestHandler.handleGet(KafkaCruiseControlRequestHandler.java:111) ~[cruise-control-2.5.122.jar:?]
    at com.linkedin.kafka.cruisecontrol.KafkaCruiseControlRequestHandler.doGetOrPost(KafkaCruiseControlRequestHandler.java:71) ~[cruise-control-2.5.122.jar:?]
    at com.linkedin.kafka.cruisecontrol.servlet.ServletRequestHandler.doGet(ServletRequestHandler.java:41) ~[cruise-control-2.5.122.jar:?]
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:687) ~[javax.servlet-api-3.1.0.jar:3.1.0]
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) ~[javax.servlet-api-3.1.0.jar:3.1.0]
    at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:799) ~[jetty-servlet-9.4.47.v20220610.jar:9.4.47.v20220610]
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:554) ~[jetty-servlet-9.4.47.v20220610.jar:9.4.47.v20220610]
    at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233) ~[jetty-server-9.4.47.v20220610.jar:9.4.47.v20220610]
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1624) ~[jetty-server-9.4.47.v20220610.jar:9.4.47.v20220610]
    at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233) ~[jetty-server-9.4.47.v20220610.jar:9.4.47.v20220610]
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1440) ~[jetty-server-9.4.47.v20220610.jar:9.4.47.v20220610]
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188) ~[jetty-server-9.4.47.v20220610.jar:9.4.47.v20220610]
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:505) ~[jetty-servlet-9.4.47.v20220610.jar:9.4.47.v20220610]
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1594) ~[jetty-server-9.4.47.v20220610.jar:9.4.47.v20220610]
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186) ~[jetty-server-9.4.47.v20220610.jar:9.4.47.v20220610]
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1355) ~[jetty-server-9.4.47.v20220610.jar:9.4.47.v20220610]
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) ~[jetty-server-9.4.47.v20220610.jar:9.4.47.v20220610]
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127) ~[jetty-server-9.4.47.v20220610.jar:9.4.47.v20220610]
    at org.eclipse.jetty.server.Server.handle(Server.java:516) ~[jetty-server-9.4.47.v20220610.jar:9.4.47.v20220610]
    at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:487) ~[jetty-server-9.4.47.v20220610.jar:9.4.47.v20220610]
    at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:732) ~[jetty-server-9.4.47.v20220610.jar:9.4.47.v20220610]
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:479) ~[jetty-server-9.4.47.v20220610.jar:9.4.47.v20220610]
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277) ~[jetty-server-9.4.47.v20220610.jar:9.4.47.v20220610]
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311) ~[jetty-io-9.4.47.v20220610.jar:9.4.47.v20220610]
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105) ~[jetty-io-9.4.47.v20220610.jar:9.4.47.v20220610]
    at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104) ~[jetty-io-9.4.47.v20220610.jar:9.4.47.v20220610]
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338) ~[jetty-util-9.4.47.v20220610.jar:9.4.47.v20220610]
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315) ~[jetty-util-9.4.47.v20220610.jar:9.4.47.v20220610]
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173) ~[jetty-util-9.4.47.v20220610.jar:9.4.47.v20220610]
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131) ~[jetty-util-9.4.47.v20220610.jar:9.4.47.v20220610]
    at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:409) ~[jetty-util-9.4.47.v20220610.jar:9.4.47.v20220610]
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883) ~[jetty-util-9.4.47.v20220610.jar:9.4.47.v20220610]
    at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034) ~[jetty-util-9.4.47.v20220610.jar:9.4.47.v20220610]
    at java.lang.Thread.run(Thread.java:829) ~[?:?]
Caused by: com.linkedin.kafka.cruisecontrol.exception.KafkaCruiseControlException: com.linkedin.cruisecontrol.exception.NotEnoughValidWindowsException: There is no window available in range [-1, 1684839223990] (index [1, -1]). Window index (current: 0, oldest: 0).
    at com.linkedin.kafka.cruisecontrol.servlet.handler.async.runnable.LoadRunnable.clusterModel(LoadRunnable.java:120) ~[cruise-control-2.5.122.jar:?]
    at com.linkedin.kafka.cruisecontrol.servlet.handler.async.runnable.LoadRunnable.clusterModelFromEarliest(LoadRunnable.java:93) ~[cruise-control-2.5.122.jar:?]
    at com.linkedin.kafka.cruisecontrol.servlet.handler.async.runnable.LoadRunnable.getResult(LoadRunnable.java:76) ~[cruise-control-2.5.122.jar:?]
    at com.linkedin.kafka.cruisecontrol.servlet.handler.async.runnable.LoadRunnable.getResult(LoadRunnable.java:26) ~[cruise-control-2.5.122.jar:?]
    at com.linkedin.kafka.cruisecontrol.servlet.handler.async.runnable.OperationRunnable.run(OperationRunnable.java:45) ~[cruise-control-2.5.122.jar:?]
    at com.linkedin.kafka.cruisecontrol.servlet.handler.async.runnable.LoadRunnable.run(LoadRunnable.java:26) ~[cruise-control-2.5.122.jar:?]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[?:?]
    at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
    ... 1 more
Caused by: com.linkedin.cruisecontrol.exception.NotEnoughValidWindowsException: There is no window available in range [-1, 1684839223990] (index [1, -1]). Window index (current: 0, oldest: 0).
    at com.linkedin.cruisecontrol.monitor.sampling.aggregator.MetricSampleAggregator.aggregate(MetricSampleAggregator.java:202) ~[cruise-control-core-2.5.122.jar:?]
    at com.linkedin.kafka.cruisecontrol.monitor.sampling.aggregator.KafkaPartitionMetricSampleAggregator.aggregate(KafkaPartitionMetricSampleAggregator.java:151) ~[cruise-control-2.5.122.jar:?]
    at com.linkedin.kafka.cruisecontrol.monitor.LoadMonitor.clusterModel(LoadMonitor.java:496) ~[cruise-control-2.5.122.jar:?]
    at com.linkedin.kafka.cruisecontrol.KafkaCruiseControl.clusterModel(KafkaCruiseControl.java:370) ~[cruise-control-2.5.122.jar:?]
    at com.linkedin.kafka.cruisecontrol.servlet.handler.async.runnable.LoadRunnable.clusterModel(LoadRunnable.java:111) ~[cruise-control-2.5.122.jar:?]
    at com.linkedin.kafka.cruisecontrol.servlet.handler.async.runnable.LoadRunnable.clusterModelFromEarliest(LoadRunnable.java:93) ~[cruise-control-2.5.122.jar:?]
    at com.linkedin.kafka.cruisecontrol.servlet.handler.async.runnable.LoadRunnable.getResult(LoadRunnable.java:76) ~[cruise-control-2.5.122.jar:?]
    at com.linkedin.kafka.cruisecontrol.servlet.handler.async.runnable.LoadRunnable.getResult(LoadRunnable.java:26) ~[cruise-control-2.5.122.jar:?]
    at com.linkedin.kafka.cruisecontrol.servlet.handler.async.runnable.OperationRunnable.run(OperationRunnable.java:45) ~[cruise-control-2.5.122.jar:?]
    at com.linkedin.kafka.cruisecontrol.servlet.handler.async.runnable.LoadRunnable.run(LoadRunnable.java:26) ~[cruise-control-2.5.122.jar:?]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[?:?]
    at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
    ... 1 more
[2023-05-23 07:53:44,030] INFO 10.99.119.36 - - [23/May/2023:10:53:43 +0000] "GET /kafkacruisecontrol/load?allow_capacity_estimation=true&json=true HTTP/1.1" 500 6158 (CruiseControlPublicAccessLogger)
[2023-05-23 07:53:45,778] INFO Processing sync request KafkaClusterStateRequest. (com.linkedin.kafka.cruisecontrol.servlet.handler.async.AbstractAsyncRequest)
[2023-05-23 07:53:45,940] INFO 10.99.119.36 - - [23/May/2023:10:53:45 +0000] "GET /kafkacruisecontrol/kafka_cluster_state?json=true HTTP/1.1" 200 839 (CruiseControlPublicAccessLogger)
[2023-05-23 07:53:48,709] WARN UserTask 5a9abba2-f714-44c8-8117-ad708c9604f1 is completed with Exception and removed from active tasks list (com.linkedin.kafka.cruisecontrol.servlet.UserTaskManager)
[2023-05-23 07:53:48,709] INFO UserTask fa21dab3-7c46-4bf8-9253-0d304ed8c348 is completed and removed from active tasks list (com.linkedin.kafka.cruisecontrol.servlet.UserTaskManager)
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by com.linkedin.kafka.cruisecontrol.servlet.handler.async.runnable.OperationFuture (file:/root/cruise-control-2.5.122/cruise-control/build/libs/cruise-control-2.5.122.jar) to field java.lang.Throwable.detailMessage
WARNING: Please consider reporting this to the maintainers of com.linkedin.kafka.cruisecontrol.servlet.handler.async.runnable.OperationFuture
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
[2023-05-23 07:53:48,711] INFO Task [5a9abba2-f714-44c8-8117-ad708c9604f1] calculation fails, exception: java.util.concurrent.ExecutionException: Operation 'Get broker stats' received exception. com.linkedin.kafka.cruisecontrol.exception.KafkaCruiseControlException: com.linkedin.cruisecontrol.exception.NotEnoughValidWindowsException: There is no window available in range [-1, 1684839223990] (index [1, -1]). Window index (current: 0, oldest: 0). (operationLogger)
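Since `/load` fails while `/kafka_cluster_state` succeeds, it may help to look at the load monitor itself through the `state` endpoint before retrying `/load`. The following is a rough sketch under stated assumptions: the `substates=monitor` parameter follows the Cruise Control REST API documentation, but the JSON field names read in `monitor_summary` are assumptions about the response shape and should be checked against what your version actually returns. The function names and the host in the test are hypothetical.

```python
import urllib.parse


def build_state_url(base_url: str, substates=("monitor",)) -> str:
    """Build a /kafkacruisecontrol/state URL that asks for JSON output."""
    query = urllib.parse.urlencode({"substates": ",".join(substates), "json": "true"})
    return base_url.rstrip("/") + "/kafkacruisecontrol/state?" + query


def monitor_summary(payload: dict) -> dict:
    """Pick a few monitor fields out of a decoded state response.

    The key names below are assumed, not confirmed by this issue; keys that
    are absent from the payload are skipped rather than treated as errors.
    """
    monitor = payload.get("MonitorState", {})
    wanted = ("state", "numMonitoredWindows", "numValidPartitions", "numTotalPartitions")
    return {key: monitor[key] for key in wanted if key in monitor}
```

If the summary shows no monitored windows or no valid partitions, that would line up with the `NotEnoughValidWindowsException` above and point at sampling rather than at the `/load` request itself.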

Lastly, this is my config file:

#Config

The metric sampler class

#metric.sampler.class=com.linkedin.kafka.cruisecontrol.monitor.sampling.CruiseControlMetricsReporterSampler
metric.sampler.class=com.linkedin.kafka.cruisecontrol.monitor.sampling.prometheus.PrometheusMetricSampler

Configuration for the metadata client.

=======================================

The Kafka cluster to control.

bootstrap.servers=b-1.damskma.4e3z1v.c2.kafka.sa-east-1.amazonaws.com:9094,b-3.damkma.4e3z1v.c2.kafka.sa-east-1.amazonaws.com:9094,b-2.damkamskma.4e3z1v.c2.kafka.sa-east-1.amazonaws.com:9094

SSL properties, needed if cluster is using TLS encryption

security.protocol=SSL
ssl.truststore.location=/root/kafka_2.13-2.8.2/kafka.client.trustore.jks
ssl.keystore.password=XXX

SSL properties, Keystore

ssl.keystore.location=/root/kafka_2.13-2.8.2/kafka.client.keystore.jks
ssl.keystore.password=XXX

The metric sampler class

#metric.sampler.class=com.linkedin.kafka.cruisecontrol.monitor.sampling.CruiseControlMetricsReporterSampler
metric.sampler.class=com.linkedin.kafka.cruisecontrol.monitor.sampling.prometheus.PrometheusMetricSampler

Prometheus Metric Sampler specific configuration

prometheus.server.endpoint=<PROMETHEUS_URL>:<PORT>

True if the sampling process allows CPU capacity estimation of brokers used for CPU utilization estimation.

sampling.allow.cpu.capacity.estimation=true

Configurations for CruiseControlMetricsReporterSampler

metric.reporter.topic=__CruiseControlMetrics

The sample store class name

sample.store.class=com.linkedin.kafka.cruisecontrol.monitor.sampling.KafkaSampleStore

The config for the Kafka sample store to save the partition metric samples

partition.metric.sample.store.topic=__KafkaCruiseControlPartitionMetricSamples

The config for the Kafka sample store to save the model training samples

broker.metric.sample.store.topic=__KafkaCruiseControlModelTrainingSamples
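A quick sanity check on a properties file like this one can be scripted. The sketch below parses simple `key=value` lines, then reports keys that appear more than once and keys from a caller-supplied list that are missing. It is a simplified reader, not a full Java properties parser (no line continuations or escapes), and all function names are hypothetical.

```python
from collections import Counter


def parse_properties(text: str) -> list:
    """Return (key, value) pairs from simple key=value lines, in file order."""
    pairs = []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and lines that carry no assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key.strip():  # ignore separator lines such as "====="
            pairs.append((key.strip(), value.strip()))
    return pairs


def duplicate_keys(pairs: list) -> list:
    """Return the keys that are assigned more than once, sorted."""
    counts = Counter(key for key, _ in pairs)
    return sorted(key for key, n in counts.items() if n > 1)


def missing_keys(pairs: list, required: list) -> list:
    """Return the required keys that never appear, in the order given."""
    present = {key for key, _ in pairs}
    return [key for key in required if key not in present]
```

Run against the config as posted, `duplicate_keys` would flag `ssl.keystore.password`, which is set twice, and `missing_keys(pairs, ["ssl.truststore.password"])` would report that no truststore password is set. Whether a truststore password is needed depends on how the truststore was created, so treat this as a prompt to double-check rather than a diagnosis.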

mantoanipythian, May 23 '23 11:05

A couple of things come to mind with CC and MSK. First, when using tiered storage with MSK, which version of AdminClient ships with CC 2.5.122? I ask because of the topic ID mismatch described at https://docs.aws.amazon.com/msk/latest/developerguide/bestpractices.html#topic-id-mismatch

Second, for metrics with CC and MSK, with or without tiered storage: did you configure CC according to https://catalog.workshops.aws/msk-labs/en-US/cruisecontrol and build CC with PrometheusMetricSampler as shown in step 7 of https://catalog.workshops.aws/msk-labs/en-US/cruisecontrol/installcruisecontrol? Hope this helps.

tmcgrath, Jun 19 '23 21:06