cruise-control
Handle null pointer exception in isPartitionUnderReplicated check
While the isPartitionUnderReplicated check is running, the partition being checked may be deleted. When that happens, the operation currently fails with a NullPointerException.
https://github.com/linkedin/cruise-control/blob/f4ca900e58944478b539955a3d70fcd802c0e1a8/cruise-control/src/main/java/com/linkedin/kafka/cruisecontrol/KafkaCruiseControlUtils.java#L786
One such stack trace, from a demote operation:
2023/11/07 06:40:42.675 WARN [OperationRunnable] [ServletSessionExecutor-1] [kafka-cruise-control] [] Received exception when trying to execute runnable for "Demote"
com.linkedin.kafka.cruisecontrol.exception.KafkaCruiseControlException: java.lang.NullPointerException
at com.linkedin.kafka.cruisecontrol.servlet.handler.async.runnable.GoalBasedOperationRunnable.computeResult(GoalBasedOperationRunnable.java:167) ~[cruise-control-2.5.129.jar:?]
at com.linkedin.kafka.cruisecontrol.servlet.handler.async.runnable.DemoteBrokerRunnable.getResult(DemoteBrokerRunnable.java:115) ~[cruise-control-2.5.129.jar:?]
at com.linkedin.kafka.cruisecontrol.servlet.handler.async.runnable.DemoteBrokerRunnable.getResult(DemoteBrokerRunnable.java:57) ~[cruise-control-2.5.129.jar:?]
at com.linkedin.kafka.cruisecontrol.servlet.handler.async.runnable.OperationRunnable.run(OperationRunnable.java:45) [cruise-control-2.5.129.jar:?]
at com.linkedin.kafka.cruisecontrol.servlet.handler.async.runnable.GoalBasedOperationRunnable.run(GoalBasedOperationRunnable.java:36) [cruise-control-2.5.129.jar:?]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
at java.lang.Thread.run(Thread.java:829) [?:?]
Caused by: java.lang.NullPointerException
at com.linkedin.kafka.cruisecontrol.KafkaCruiseControlUtils.isPartitionUnderReplicated(KafkaCruiseControlUtils.java:785) ~[cruise-control-2.5.129.jar:?]
at com.linkedin.kafka.cruisecontrol.analyzer.goals.PreferredLeaderElectionGoal.maybeMoveReplicaToEndOfReplicaList(PreferredLeaderElectionGoal.java:69) ~[cruise-control-2.5.129.jar:?]
at com.linkedin.kafka.cruisecontrol.analyzer.goals.PreferredLeaderElectionGoal.optimize(PreferredLeaderElectionGoal.java:95) ~[cruise-control-2.5.129.jar:?]
at com.linkedin.kafka.cruisecontrol.analyzer.GoalOptimizer.optimizations(GoalOptimizer.java:467) ~[cruise-control-2.5.129.jar:?]
at com.linkedin.kafka.cruisecontrol.KafkaCruiseControl.optimizations(KafkaCruiseControl.java:605) ~[cruise-control-2.5.129.jar:?]
at com.linkedin.kafka.licruisecontrol.servlet.handler.async.runnable.LiDemoteBrokerRunnable.workWithClusterModel(LiDemoteBrokerRunnable.java:77) ~[likafka-cruise-control-impl_2.12-3.2.88.jar:?]
at com.linkedin.kafka.cruisecontrol.servlet.handler.async.runnable.GoalBasedOperationRunnable.computeResult(GoalBasedOperationRunnable.java:161) ~[cruise-control-2.5.129.jar:?]
... 9 more
In such cases, the check should null-check the partition before accessing it and handle the partition-not-found case gracefully.
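A minimal Java sketch of the requested guard, assuming the check looks up the partition's metadata and compares the in-sync replica count with the replica count (as the method name suggests). `ClusterSnapshot`, `PartitionMeta` and both method names are hypothetical stand-ins for Kafka's `Cluster`/`PartitionInfo`, not Cruise Control's actual code; the lookup mirrors Kafka's `Cluster#partition(TopicPartition)`, which returns null for an unknown partition. The unguarded variant reproduces the failure shape from the trace, and the guarded variant adds the null check.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, self-contained stand-ins for Kafka's Cluster / PartitionInfo metadata so the
// sketch compiles without kafka-clients; names and structure are assumptions, not Cruise Control's API.
class UnderReplicatedCheckSketch {

    /** Replica and in-sync replica broker ids for one partition (stand-in for PartitionInfo). */
    static final class PartitionMeta {
        final int[] replicas;
        final int[] inSyncReplicas;

        PartitionMeta(int[] replicas, int[] inSyncReplicas) {
            this.replicas = replicas;
            this.inSyncReplicas = inSyncReplicas;
        }
    }

    /** Metadata snapshot keyed by "topic:partition"; ':' cannot appear in Kafka topic names. */
    static final class ClusterSnapshot {
        private final Map<String, PartitionMeta> partitions = new HashMap<>();

        void put(String topic, int partition, PartitionMeta meta) {
            partitions.put(topic + ":" + partition, meta);
        }

        void remove(String topic, int partition) {
            partitions.remove(topic + ":" + partition);
        }

        /** Returns null when the partition is unknown, e.g. after its topic was deleted. */
        PartitionMeta partition(String topic, int partition) {
            return partitions.get(topic + ":" + partition);
        }
    }

    /** Unguarded shape: dereferences the lookup result directly, so a deleted partition throws NPE. */
    static boolean isPartitionUnderReplicatedUnguarded(ClusterSnapshot cluster, String topic, int partition) {
        PartitionMeta meta = cluster.partition(topic, partition);
        return meta.inSyncReplicas.length != meta.replicas.length;
    }

    /**
     * Guarded shape: null-checks the lookup first. Returning false for a missing partition is an
     * assumed policy; a caller could instead skip the partition or raise a descriptive error.
     */
    static boolean isPartitionUnderReplicated(ClusterSnapshot cluster, String topic, int partition) {
        PartitionMeta meta = cluster.partition(topic, partition);
        if (meta == null) {
            // Partition no longer exists in metadata: nothing to report as under-replicated.
            return false;
        }
        return meta.inSyncReplicas.length != meta.replicas.length;
    }
}
```

Returning false lets a caller that iterates over many partitions keep going. If silently ignoring a vanished partition is undesirable, throwing an exception that names the topic-partition would at least replace the bare NullPointerException with an actionable message.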