KafkaAdminTopicConfigProvider.configure() should report original exception(s) raised by describeClusterConfigs
I am using Cruise Control 2.5.99 with AWS MSK. I reached the point where Cruise Control is able to connect to the Kafka brokers, but it then terminates with the following exception:
[2023-12-07 11:49:14,062] ERROR Uncaught exception on thread Thread[main,5,main] (com.linkedin.kafka.cruisecontrol.KafkaCruiseControlMain)
java.lang.RuntimeException: Failed to describe Kafka cluster configs.
at com.linkedin.kafka.cruisecontrol.config.KafkaAdminTopicConfigProvider.configure(KafkaAdminTopicConfigProvider.java:174) ~[cruise-control-2.5.99.jar:?]
at com.linkedin.kafka.cruisecontrol.config.KafkaCruiseControlConfigUtils.getConfiguredInstance(KafkaCruiseControlConfigUtils.java:49) ~[cruise-control-2.5.99.jar:?]
at com.linkedin.kafka.cruisecontrol.config.KafkaCruiseControlConfig.getConfiguredInstance(KafkaCruiseControlConfig.java:98) ~[cruise-control-2.5.99.jar:?]
[...]
As the stacktrace indicates, that error message comes from: https://github.com/linkedin/cruise-control/blob/9ccbb9eeb497b23d9e98e76f7512abea908366af/cruise-control/src/main/java/com/linkedin/kafka/cruisecontrol/config/KafkaAdminTopicConfigProvider.java#L172
public void configure(Map<String, ?> configs) {
  _adminClient = (AdminClient) validateNotNull(
      configs.get(LoadMonitor.KAFKA_ADMIN_CLIENT_OBJECT_CONFIG),
      () -> String.format("Missing %s when creating Kafka Admin Client based Topic Config Provider",
                          LoadMonitor.KAFKA_ADMIN_CLIENT_OBJECT_CONFIG));
  Config clusterConfigs;
  try {
    clusterConfigs = describeClusterConfigs(_adminClient, DESCRIBE_CLUSTER_CONFIGS_TIMEOUT);
  } catch (InterruptedException | ExecutionException e) {
    throw new RuntimeException("Failed to describe Kafka cluster configs.");
  }
  [...]
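The catch block discards the caught InterruptedException or ExecutionException and throws a new RuntimeException that carries only the generic message "Failed to describe Kafka cluster configs." Because the original exception is not passed along as the cause, the stack trace gives no hint about why the describe call failed (for example an authorization error or a timeout).
It would be useful if it instead logged the original exception, or passed it as the cause of the RuntimeException so that it shows up in the stack trace.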
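To illustrate the kind of change I have in mind, here is a minimal, self-contained sketch of exception chaining. It is not the actual Cruise Control code: `ChainedExceptionSketch` and `ConfigSource` are hypothetical stand-ins for the provider and for the `describeClusterConfigs` call, and the simulated failure is made up. Restoring the interrupt flag is an extra I added, not something the current code does.

```java
import java.util.concurrent.ExecutionException;

public class ChainedExceptionSketch {

    // Hypothetical stand-in for describeClusterConfigs(...): a call that can fail
    // with the same checked exceptions that the catch block in configure() handles.
    interface ConfigSource {
        String describe() throws InterruptedException, ExecutionException;
    }

    static String configure(ConfigSource source) {
        try {
            return source.describe();
        } catch (InterruptedException e) {
            // Assumption: restore the interrupt flag before rethrowing.
            Thread.currentThread().interrupt();
            throw new RuntimeException("Failed to describe Kafka cluster configs.", e);
        } catch (ExecutionException e) {
            // Pass the original exception as the cause instead of dropping it.
            throw new RuntimeException("Failed to describe Kafka cluster configs.", e);
        }
    }

    public static void main(String[] args) {
        // Simulated failure; the real cause would come from the admin client.
        ConfigSource failing = () -> {
            throw new ExecutionException(new IllegalStateException("simulated authorization failure"));
        };
        try {
            configure(failing);
        } catch (RuntimeException e) {
            // The full chain is now available and would be printed as "Caused by:" entries.
            e.printStackTrace();
        }
    }
}
```

With the cause attached, the log would keep the same top-level message but also list the underlying exception in the "Caused by:" section of the stack trace, which is the information missing from the output above.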
In my setup Cruise Control is running as a pod on an EKS cluster which resides in the same AWS Account as the MSK cluster.
I double-checked the Pod Service Account -> IAM Role mapping and verified that the IAM Role has a Policy which allows all of the Kafka operations on the relevant MSK cluster.
Happy to provide more details about this if deemed useful.