cruise-control

add_broker throws IllegalStateException

Open qq359130530 opened this issue 1 year ago • 1 comments

I deployed a Kafka cluster and then deployed Cruise Control (CC) to manage it. Later I added one more Kafka node and tried to add the new broker to the cluster through the CC add_broker endpoint, which throws an IllegalStateException. The stack trace is as follows:

[2024-11-06 15:01:45,959] INFO UserTask 66dd8585-4ffd-4395-8c1b-482f411afc52 is completed and removed from active tasks list (com.linkedin.kafka.cruisecontrol.servlet.UserTaskManager)
[2024-11-06 15:01:45,961] INFO Task [66dd8585-4ffd-4395-8c1b-482f411afc52] calculation finishes, result: {"KafkaPartitionState":{"offline":[],"urp":[],"with-offline-replicas":[],"under-min-isr":[]},"KafkaBrokerState":{"OfflineReplicaCountByBrokerId":{"2":0},"IsController":{"1":false,"2":true},"Summary":{"StdLeadersPerBroker":91.0,"Leaders":182,"MaxLeadersPerBroker":182,"Topics":18,"MaxReplicasPerBroker":182,"StdReplicasPerBroker":91.0,"Brokers":2,"AvgReplicationFactor":1.0,"AvgLeadersPerBroker":91.0,"Replicas":182,"AvgReplicasPerBroker":91.0},"OutOfSyncCountByBrokerId":{"2":0},"LeaderCountByBrokerId":{"1":182,"2":0},"OnlineLogDirsByBrokerId":{"1":["/home/sdb/kafka/kafka_data_dir","/home/sdc/kafka/kafka_data_dir","/home/sdd/kafka/kafka_data_dir"],"2":["/home/sdb/kafka/kafka_data_dir","/home/sdc/kafka/kafka_data_dir","/home/sdd/kafka/kafka_data_dir"]},"BrokerSetByBrokerId":{},"OfflineLogDirsByBrokerId":{"1":[],"2":[]},"ReplicaCountByBrokerId":{"1":182,"2":0}},"version":1} (operationLogger)
[2024-11-06 15:01:50,842] INFO Processing async request AddBrokerRequest. (com.linkedin.kafka.cruisecontrol.servlet.handler.async.AbstractAsyncRequest)
[2024-11-06 15:01:50,845] INFO Create a new UserTask 50387b5f-d82a-41a5-9bd0-f659bdaaa3b6 with SessionKey SessionKey{_session=com.linkedin.kafka.cruisecontrol.servlet.ServletSession@532bb10c,_requestUrl=POST /kafkacruisecontrol/add_broker,_queryParams={brokerid=[2], dryrun=[false], json=[true]}} (com.linkedin.kafka.cruisecontrol.servlet.UserTaskManager)
[2024-11-06 15:01:50,855] WARN Received exception when trying to execute runnable for "Add brokers" (com.linkedin.kafka.cruisecontrol.servlet.handler.async.runnable.OperationRunnable)
java.lang.IllegalStateException: Cannot execute new proposals due to failure to retrieve whether the Kafka cluster has an already ongoing partition reassignment.
    at com.linkedin.kafka.cruisecontrol.KafkaCruiseControl.sanityCheckDryRun(KafkaCruiseControl.java:288) ~[cruise-control-2.5.142-SNAPSHOT.jar:?]
    at com.linkedin.kafka.cruisecontrol.servlet.handler.async.runnable.GoalBasedOperationRunnable.init(GoalBasedOperationRunnable.java:132) ~[cruise-control-2.5.142-SNAPSHOT.jar:?]
    at com.linkedin.kafka.cruisecontrol.servlet.handler.async.runnable.GoalBasedOperationRunnable.computeResult(GoalBasedOperationRunnable.java:157) ~[cruise-control-2.5.142-SNAPSHOT.jar:?]
    at com.linkedin.kafka.cruisecontrol.servlet.handler.async.runnable.AddBrokersRunnable.getResult(AddBrokersRunnable.java:91) ~[cruise-control-2.5.142-SNAPSHOT.jar:?]
    at com.linkedin.kafka.cruisecontrol.servlet.handler.async.runnable.AddBrokersRunnable.getResult(AddBrokersRunnable.java:35) ~[cruise-control-2.5.142-SNAPSHOT.jar:?]
    at com.linkedin.kafka.cruisecontrol.servlet.handler.async.runnable.OperationRunnable.run(OperationRunnable.java:45) ~[cruise-control-2.5.142-SNAPSHOT.jar:?]
    at com.linkedin.kafka.cruisecontrol.servlet.handler.async.runnable.GoalBasedOperationRunnable.run(GoalBasedOperationRunnable.java:38) ~[cruise-control-2.5.142-SNAPSHOT.jar:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
    at java.lang.Thread.run(Thread.java:834) ~[?:?]
Caused by: java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.UnsupportedVersionException: The broker does not support LIST_PARTITION_REASSIGNMENTS
    at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395) ~[?:?]
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2022) ~[?:?]
    at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:180) ~[kafka-clients-3.5.1.jar:?]
    at com.linkedin.kafka.cruisecontrol.executor.ExecutionUtils.ongoingPartitionReassignments(ExecutionUtils.java:372) ~[cruise-control-2.5.142-SNAPSHOT.jar:?]
    at com.linkedin.kafka.cruisecontrol.executor.ExecutionUtils.partitionsBeingReassigned(ExecutionUtils.java:350) ~[cruise-control-2.5.142-SNAPSHOT.jar:?]
    at com.linkedin.kafka.cruisecontrol.executor.Executor.listPartitionsBeingReassigned(Executor.java:1245) ~[cruise-control-2.5.142-SNAPSHOT.jar:?]
    at com.linkedin.kafka.cruisecontrol.KafkaCruiseControl.sanityCheckDryRun(KafkaCruiseControl.java:285) ~[cruise-control-2.5.142-SNAPSHOT.jar:?]
    ... 9 more
Caused by: org.apache.kafka.common.errors.UnsupportedVersionException: The broker does not support LIST_PARTITION_REASSIGNMENTS
[2024-11-06 15:01:50,861] ERROR Error processing POST request '/add_broker' due to: 'java.lang.IllegalStateException: Cannot execute new proposals due to failure to retrieve whether the Kafka cluster has an already ongoing partition reassignment.'. (com.linkedin.kafka.cruisecontrol.KafkaCruiseControlRequestHandler)
java.util.concurrent.ExecutionException: java.lang.IllegalStateException: Cannot execute new proposals due to failure to retrieve whether the Kafka cluster has an already ongoing partition reassignment.
    at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395) ~[?:?]
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2022) ~[?:?]
    at com.linkedin.kafka.cruisecontrol.servlet.handler.async.AbstractAsyncRequest.getResponse(AbstractAsyncRequest.java:56) ~[cruise-control-2.5.142-SNAPSHOT.jar:?]
    at com.linkedin.kafka.cruisecontrol.servlet.handler.AbstractRequest.handle(AbstractRequest.java:37) ~[cruise-control-2.5.142-SNAPSHOT.jar:?]
    at com.linkedin.kafka.cruisecontrol.KafkaCruiseControlRequestHandler.handlePost(KafkaCruiseControlRequestHandler.java:156) ~[cruise-control-2.5.142-SNAPSHOT.jar:?]
    at com.linkedin.kafka.cruisecontrol.KafkaCruiseControlRequestHandler.doGetOrPost(KafkaCruiseControlRequestHandler.java:74) ~[cruise-control-2.5.142-SNAPSHOT.jar:?]
    at com.linkedin.kafka.cruisecontrol.servlet.ServletRequestHandler.doPost(ServletRequestHandler.java:46) ~[cruise-control-2.5.142-SNAPSHOT.jar:?]
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:707) ~[javax.servlet-api-3.1.0.jar:3.1.0]
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) ~[javax.servlet-api-3.1.0.jar:3.1.0]
    at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:799) ~[jetty-servlet-9.4.53.v20231009.jar:9.4.53.v20231009]
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:554) ~[jetty-servlet-9.4.53.v20231009.jar:9.4.53.v20231009]
    at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233) ~[jetty-server-9.4.53.v20231009.jar:9.4.53.v20231009]
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1624) ~[jetty-server-9.4.53.v20231009.jar:9.4.53.v20231009]
    at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233) ~[jetty-server-9.4.53.v20231009.jar:9.4.53.v20231009]
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1440) ~[jetty-server-9.4.53.v20231009.jar:9.4.53.v20231009]
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188) ~[jetty-server-9.4.53.v20231009.jar:9.4.53.v20231009]
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:505) ~[jetty-servlet-9.4.53.v20231009.jar:9.4.53.v20231009]
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1594) ~[jetty-server-9.4.53.v20231009.jar:9.4.53.v20231009]
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186) ~[jetty-server-9.4.53.v20231009.jar:9.4.53.v20231009]
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1355) ~[jetty-server-9.4.53.v20231009.jar:9.4.53.v20231009]
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) ~[jetty-server-9.4.53.v20231009.jar:9.4.53.v20231009]
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127) ~[jetty-server-9.4.53.v20231009.jar:9.4.53.v20231009]
    at org.eclipse.jetty.server.Server.handle(Server.java:516) ~[jetty-server-9.4.53.v20231009.jar:9.4.53.v20231009]
    at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:487) ~[jetty-server-9.4.53.v20231009.jar:9.4.53.v20231009]
    at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:732) ~[jetty-server-9.4.53.v20231009.jar:9.4.53.v20231009]
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:479) ~[jetty-server-9.4.53.v20231009.jar:9.4.53.v20231009]
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277) ~[jetty-server-9.4.53.v20231009.jar:9.4.53.v20231009]
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311) ~[jetty-io-9.4.53.v20231009.jar:9.4.53.v20231009]
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105) ~[jetty-io-9.4.53.v20231009.jar:9.4.53.v20231009]
    at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104) ~[jetty-io-9.4.53.v20231009.jar:9.4.53.v20231009]
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338) ~[jetty-util-9.4.53.v20231009.jar:9.4.53.v20231009]
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315) ~[jetty-util-9.4.53.v20231009.jar:9.4.53.v20231009]
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173) ~[jetty-util-9.4.53.v20231009.jar:9.4.53.v20231009]
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131) ~[jetty-util-9.4.53.v20231009.jar:9.4.53.v20231009]
    at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:409) ~[jetty-util-9.4.53.v20231009.jar:9.4.53.v20231009]
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883) ~[jetty-util-9.4.53.v20231009.jar:9.4.53.v20231009]
    at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034) ~[jetty-util-9.4.53.v20231009.jar:9.4.53.v20231009]
    at java.lang.Thread.run(Thread.java:834) ~[?:?]
Caused by: java.lang.IllegalStateException: Cannot execute new proposals due to failure to retrieve whether the Kafka cluster has an already ongoing partition reassignment.
    at com.linkedin.kafka.cruisecontrol.KafkaCruiseControl.sanityCheckDryRun(KafkaCruiseControl.java:288) ~[cruise-control-2.5.142-SNAPSHOT.jar:?]
    at com.linkedin.kafka.cruisecontrol.servlet.handler.async.runnable.GoalBasedOperationRunnable.init(GoalBasedOperationRunnable.java:132) ~[cruise-control-2.5.142-SNAPSHOT.jar:?]
    at com.linkedin.kafka.cruisecontrol.servlet.handler.async.runnable.GoalBasedOperationRunnable.computeResult(GoalBasedOperationRunnable.java:157) ~[cruise-control-2.5.142-SNAPSHOT.jar:?]
    at com.linkedin.kafka.cruisecontrol.servlet.handler.async.runnable.AddBrokersRunnable.getResult(AddBrokersRunnable.java:91) ~[cruise-control-2.5.142-SNAPSHOT.jar:?]
    at com.linkedin.kafka.cruisecontrol.servlet.handler.async.runnable.AddBrokersRunnable.getResult(AddBrokersRunnable.java:35) ~[cruise-control-2.5.142-SNAPSHOT.jar:?]
    at com.linkedin.kafka.cruisecontrol.servlet.handler.async.runnable.OperationRunnable.run(OperationRunnable.java:45) ~[cruise-control-2.5.142-SNAPSHOT.jar:?]
    at com.linkedin.kafka.cruisecontrol.servlet.handler.async.runnable.GoalBasedOperationRunnable.run(GoalBasedOperationRunnable.java:38) ~[cruise-control-2.5.142-SNAPSHOT.jar:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
    ... 1 more
Caused by: java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.UnsupportedVersionException: The broker does not support LIST_PARTITION_REASSIGNMENTS
    at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395) ~[?:?]
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2022) ~[?:?]
    at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:180) ~[kafka-clients-3.5.1.jar:?]
    at com.linkedin.kafka.cruisecontrol.executor.ExecutionUtils.ongoingPartitionReassignments(ExecutionUtils.java:372) ~[cruise-control-2.5.142-SNAPSHOT.jar:?]
    at com.linkedin.kafka.cruisecontrol.executor.ExecutionUtils.partitionsBeingReassigned(ExecutionUtils.java:350) ~[cruise-control-2.5.142-SNAPSHOT.jar:?]
    at com.linkedin.kafka.cruisecontrol.executor.Executor.listPartitionsBeingReassigned(Executor.java:1245) ~[cruise-control-2.5.142-SNAPSHOT.jar:?]
    at com.linkedin.kafka.cruisecontrol.KafkaCruiseControl.sanityCheckDryRun(KafkaCruiseControl.java:285) ~[cruise-control-2.5.142-SNAPSHOT.jar:?]
    at com.linkedin.kafka.cruisecontrol.servlet.handler.async.runnable.GoalBasedOperationRunnable.init(GoalBasedOperationRunnable.java:132) ~[cruise-control-2.5.142-SNAPSHOT.jar:?]
    at com.linkedin.kafka.cruisecontrol.servlet.handler.async.runnable.GoalBasedOperationRunnable.computeResult(GoalBasedOperationRunnable.java:157) ~[cruise-control-2.5.142-SNAPSHOT.jar:?]
    at com.linkedin.kafka.cruisecontrol.servlet.handler.async.runnable.AddBrokersRunnable.getResult(AddBrokersRunnable.java:91) ~[cruise-control-2.5.142-SNAPSHOT.jar:?]
    at com.linkedin.kafka.cruisecontrol.servlet.handler.async.runnable.AddBrokersRunnable.getResult(AddBrokersRunnable.java:35) ~[cruise-control-2.5.142-SNAPSHOT.jar:?]
    at com.linkedin.kafka.cruisecontrol.servlet.handler.async.runnable.OperationRunnable.run(OperationRunnable.java:45) ~[cruise-control-2.5.142-SNAPSHOT.jar:?]
    at com.linkedin.kafka.cruisecontrol.servlet.handler.async.runnable.GoalBasedOperationRunnable.run(GoalBasedOperationRunnable.java:38) ~[cruise-control-2.5.142-SNAPSHOT.jar:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
    ... 1 more
Caused by: org.apache.kafka.common.errors.UnsupportedVersionException: The broker does not support LIST_PARTITION_REASSIGNMENTS
[2024-11-06 15:01:50,863] INFO 10.67.0.97 - - [06/Nov/2024:07:01:50 +0000] "POST /kafkacruisecontrol/add_broker?dryrun=false&brokerid=2&json=true HTTP/1.1" 500 6139 (CruiseControlPublicAccessLogger)
[2024-11-06 15:01:50,959] WARN UserTask 50387b5f-d82a-41a5-9bd0-f659bdaaa3b6 is completed with Exception and removed from active tasks list (com.linkedin.kafka.cruisecontrol.servlet.UserTaskManager)
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by com.linkedin.kafka.cruisecontrol.servlet.handler.async.runnable.OperationFuture (file:/home/master/cruise-control/cruise-control/build/libs/cruise-control-2.5.142-SNAPSHOT.jar) to field java.lang.Throwable.detailMessage
WARNING: Please consider reporting this to the maintainers of com.linkedin.kafka.cruisecontrol.servlet.handler.async.runnable.OperationFuture
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
[2024-11-06 15:01:50,963] INFO Task [50387b5f-d82a-41a5-9bd0-f659bdaaa3b6] calculation fails, exception: java.util.concurrent.ExecutionException: Operation 'Add brokers' received exception. java.lang.IllegalStateException: Cannot execute new proposals due to failure to retrieve whether the Kafka cluster has an already ongoing partition reassignment. (operationLogger)
[2024-11-06 15:02:45,959] INFO Expiring the session associated with SessionKey{_session=com.linkedin.kafka.cruisecontrol.servlet.ServletSession@532bb10c,_requestUrl=POST /kafkacruisecontrol/add_broker,_queryParams={brokerid=[2], dryrun=[false], json=[true]}}. (com.linkedin.kafka.cruisecontrol.servlet.UserTaskManager)
[2024-11-06 15:03:05,976] INFO Kicking off metric sampling for time range [1730876345976, 1730876585976], duration 240000 ms with timeout 120000 ms. (com.linkedin.kafka.cruisecontrol.monitor.sampling.MetricFetcherManager)
[2024-11-06 15:03:06,001] INFO [Consumer clientId=CruiseControlMetricsReporterSampler-consumer--914373556761640071, groupId=null] Seeking to offset 0 for partition __CruiseControlMetrics-11 (org.apache.kafka.clients.consumer.KafkaConsumer)
[2024-11-06 15:03:06,002] INFO [Consumer clientId=CruiseControlMetricsReporterSampler-consumer--914373556761640071, groupId=null] Seeking to offset 0 for partition __CruiseControlMetrics-9 (org.apache.kafka.clients.consumer.KafkaConsumer)
[2024-11-06 15:03:06,002] INFO [Consumer clientId=CruiseControlMetricsReporterSampler-consumer--914373556761640071, groupId=null] Seeking to offset 0 for partition __CruiseControlMetrics-7 (org.apache.kafka.clients.consumer.KafkaConsumer)
[2024-11-06 15:03:06,002] INFO [Consumer clientId=CruiseControlMetricsReporterSampler-consumer--914373556761640071, groupId=null] Seeking to offset 0 for partition __CruiseControlMetrics-5 (org.apache.kafka.clients.consumer.KafkaConsumer)
[2024-11-06 15:03:06,002] INFO [Consumer clientId=CruiseControlMetricsReporterSampler-consumer--914373556761640071, groupId=null] Seeking to offset 0 for partition __CruiseControlMetrics-3 (org.apache.kafka.clients.consumer.KafkaConsumer)
[2024-11-06 15:03:06,002] INFO [Consumer clientId=CruiseControlMetricsReporterSampler-consumer--914373556761640071, groupId=null] Seeking to offset 0 for partition __CruiseControlMetrics-1 (org.apache.kafka.clients.consumer.KafkaConsumer)
[2024-11-06 15:03:06,003] INFO [Consumer clientId=CruiseControlMetricsReporterSampler-consumer--914373556761640071, groupId=null] Seeking to offset 0 for partition __CruiseControlMetrics-30 (org.apache.kafka.clients.consumer.KafkaConsumer)
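For anyone trying to reproduce this, the failing call recorded in the access log is `POST /kafkacruisecontrol/add_broker?dryrun=false&brokerid=2&json=true`. The sketch below only builds that request with the JDK 11 HTTP client and prints it. The base URL `http://localhost:9090` and the class and method names are assumptions made for illustration; the path and query parameters are taken from the log.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class AddBrokerRequestSketch {

    // Builds the add_broker request seen in the access log.
    // baseUrl is an assumption: point it at your own Cruise Control instance.
    static HttpRequest addBroker(String baseUrl, int brokerId, boolean dryRun) {
        URI uri = URI.create(baseUrl + "/kafkacruisecontrol/add_broker"
                + "?dryrun=" + dryRun
                + "&brokerid=" + brokerId
                + "&json=true");
        return HttpRequest.newBuilder(uri)
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    public static void main(String[] args) {
        // Same parameters as the failing request in the log: brokerid=2, dryrun=false.
        HttpRequest request = addBroker("http://localhost:9090", 2, false);
        System.out.println(request.method() + " " + request.uri());
    }
}
```

Sending it would be a matter of passing the request to `HttpClient.newHttpClient().send(...)`; that step is left out so the sketch does not touch a real cluster.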

qq359130530 avatar Nov 06 '24 07:11 qq359130530

Hi, I’d like to work on this issue.

From the stack trace, Cruise Control calls ListPartitionReassignments on Kafka brokers that are too old to support this API, so the call fails with UnsupportedVersionException. KafkaCruiseControl.sanityCheckDryRun then wraps that failure in an IllegalStateException, and add_broker fails.
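To make the wrapping concrete, here is a small sketch that rebuilds the same nesting as the log (ExecutionException, then IllegalStateException, then ExecutionException, then UnsupportedVersionException) and walks the cause chain to find the original error. The nested `UnsupportedVersionException` is a local stand-in for `org.apache.kafka.common.errors.UnsupportedVersionException` so the example compiles without kafka-clients, and `findCause` and `nestedFailure` are hypothetical helpers, not Cruise Control code.

```java
import java.util.concurrent.ExecutionException;

public class CauseChainSketch {

    // Local stand-in (assumption) for org.apache.kafka.common.errors.UnsupportedVersionException.
    static class UnsupportedVersionException extends RuntimeException {
        UnsupportedVersionException(String message) {
            super(message);
        }
    }

    // Returns the first throwable of the given type in the cause chain, or null if there is none.
    // The depth limit guards against a cyclic cause chain.
    static <T extends Throwable> T findCause(Throwable t, Class<T> type) {
        int depth = 0;
        for (Throwable cur = t; cur != null && depth < 32; cur = cur.getCause(), depth++) {
            if (type.isInstance(cur)) {
                return type.cast(cur);
            }
        }
        return null;
    }

    // Rebuilds the nesting reported in the log, innermost cause first.
    static Throwable nestedFailure() {
        Throwable root = new UnsupportedVersionException(
                "The broker does not support LIST_PARTITION_REASSIGNMENTS");
        Throwable adminFailure = new ExecutionException(root);
        Throwable sanityCheck = new IllegalStateException(
                "Cannot execute new proposals due to failure to retrieve whether the Kafka "
                        + "cluster has an already ongoing partition reassignment.", adminFailure);
        return new ExecutionException(sanityCheck);
    }

    public static void main(String[] args) {
        UnsupportedVersionException cause =
                findCause(nestedFailure(), UnsupportedVersionException.class);
        System.out.println(cause == null ? "not found" : cause.getMessage());
    }
}
```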

I’m thinking of updating ExecutionUtils.ongoingPartitionReassignments / sanityCheckDryRun to handle UnsupportedVersionException gracefully (e.g., treat it as “no ongoing reassignments” or log a warning but still allow the operation).

Would this behavior be acceptable, or do you prefer a different fallback (for example: skip the check for older clusters but include a warning in the response)?
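A rough sketch of the fallback I have in mind, covering both variants: the unsupported-API case is reported as "no ongoing reassignments" and carries a warning that the caller could log or add to the response. This is not the actual Cruise Control code. `CheckResult`, `ongoingReassignments`, the `Set<String>` partition type, and the local `UnsupportedVersionException` class are all stand-ins I made up so the example runs without kafka-clients; the real admin call returns a Kafka future rather than a `CompletableFuture`.

```java
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;

public class ReassignmentCheckSketch {

    // Local stand-in (assumption) for org.apache.kafka.common.errors.UnsupportedVersionException.
    static class UnsupportedVersionException extends RuntimeException {
        UnsupportedVersionException(String message) {
            super(message);
        }
    }

    // Outcome of the check: partitions being reassigned plus an optional warning.
    static final class CheckResult {
        final Set<String> partitions;
        final String warning; // null when the check actually ran on the broker

        CheckResult(Set<String> partitions, String warning) {
            this.partitions = partitions;
            this.warning = warning;
        }
    }

    // Hypothetical replacement for the lookup: an unsupported API is treated as
    // "no ongoing reassignments" with a warning; any other failure still aborts.
    static CheckResult ongoingReassignments(CompletableFuture<Set<String>> listing)
            throws InterruptedException {
        try {
            return new CheckResult(listing.get(), null);
        } catch (ExecutionException e) {
            if (e.getCause() instanceof UnsupportedVersionException) {
                return new CheckResult(Collections.emptySet(),
                        "Skipped ongoing reassignment check: " + e.getCause().getMessage());
            }
            throw new IllegalStateException(
                    "Cannot execute new proposals due to failure to retrieve whether the Kafka "
                            + "cluster has an already ongoing partition reassignment.", e);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulate an older broker rejecting the request.
        CompletableFuture<Set<String>> failed = new CompletableFuture<>();
        failed.completeExceptionally(new UnsupportedVersionException(
                "The broker does not support LIST_PARTITION_REASSIGNMENTS"));
        CheckResult result = ongoingReassignments(failed);
        System.out.println(result.partitions + " / " + result.warning);
    }
}
```

One caveat with this approach: on a cluster where the API is unavailable, the check can no longer detect a reassignment started by other tooling, which is why the sketch surfaces a warning instead of silently continuing.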

Gautam-aman avatar Dec 05 '25 19:12 Gautam-aman