kafka-monitor
Ensure proper retry and backoff for newly created monitor topics
As shown in this conversation (https://linkedin-randd.slack.com/archives/C04FMP0HB17/p1671222219329569), when a monitoring topic has just been created in a cluster, the AdminClient.describeTopics API can fail with UnknownTopicOrPartitionException, which crashes the whole process. The call sites below can trigger the exception (and there may be more):
https://github.com/linkedin/kafka-monitor/blob/7f99c095c2ceb2d09b0e490fa138a68fac849bba/src/main/java/com/linkedin/xinfra/monitor/services/MultiClusterTopicManagementService.java#L455
https://github.com/linkedin/kafka-monitor/blob/7f99c095c2ceb2d09b0e490fa138a68fac849bba/src/main/java/com/linkedin/xinfra/monitor/services/MultiClusterTopicManagementService.java#L338
We need to make sure that the logic calling the describeTopics API retries with an appropriate backoff when the topic has only just been created.
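
A minimal sketch of what such a retry wrapper could look like, assuming exponential backoff with a cap is acceptable here. The class name `RetryWithBackoff`, its parameters, and the retriable-exception predicate are hypothetical and not part of kafka-monitor. The helper deliberately has no Kafka dependency; it unwraps `ExecutionException` because that is how a failed `KafkaFuture.get()` reports its cause.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.function.Predicate;

// Hypothetical helper, not existing kafka-monitor code.
final class RetryWithBackoff {
    private RetryWithBackoff() {}

    /**
     * Runs the action, retrying with exponential backoff while the failure
     * is considered retriable and attempts remain.
     */
    static <T> T call(Callable<T> action,
                      Predicate<Throwable> isRetriable,
                      int maxAttempts,
                      long initialBackoffMs,
                      long maxBackoffMs) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long backoffMs = initialBackoffMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                // Future.get() wraps the real failure in ExecutionException.
                Throwable cause =
                    (e instanceof ExecutionException && e.getCause() != null) ? e.getCause() : e;
                if (attempt >= maxAttempts || !isRetriable.test(cause)) {
                    throw e; // out of attempts, or not a retriable failure
                }
                Thread.sleep(backoffMs);
                backoffMs = Math.min(backoffMs * 2, maxBackoffMs);
            }
        }
    }
}
```

At the call sites above, the existing describe call could then be passed in as the action, for example `RetryWithBackoff.call(() -> adminClient.describeTopics(Collections.singleton(topic)).values().get(topic).get(), t -> t instanceof UnknownTopicOrPartitionException, 5, 500L, 8000L)`. The attempt count and backoff values are placeholders, and the exact result accessor depends on the Kafka client version in use.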