kafka-monitor

Ensure proper retry and backoff for newly created monitor topics

Open gitlw opened this issue 2 years ago • 1 comment

As shown in the conversation at https://linkedin-randd.slack.com/archives/C04FMP0HB17/p1671222219329569, when a monitoring topic has just been created in a cluster, the AdminClient.describeTopics API can fail with UnknownTopicOrPartitionException, which crashes the whole process. The call sites below can trigger the exception (and there may be more):

https://github.com/linkedin/kafka-monitor/blob/7f99c095c2ceb2d09b0e490fa138a68fac849bba/src/main/java/com/linkedin/xinfra/monitor/services/MultiClusterTopicManagementService.java#L455

https://github.com/linkedin/kafka-monitor/blob/7f99c095c2ceb2d09b0e490fa138a68fac849bba/src/main/java/com/linkedin/xinfra/monitor/services/MultiClusterTopicManagementService.java#L338

We need to make sure that the logic calling the describeTopics API retries with appropriate backoff when the topic has only just been created.
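One possible shape for this is a small retry helper with exponential backoff, sketched below. This is only an illustration and not code from kafka-monitor: the class name `RetryWithBackoff`, its `call` method, and all parameter names are hypothetical, and the helper deliberately has no Kafka dependency so that the retry logic can be read on its own.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.function.Predicate;

// Hypothetical helper (not part of kafka-monitor): retries an action with
// exponential backoff while the failure is classified as retriable.
final class RetryWithBackoff {
    private RetryWithBackoff() {}

    static <T> T call(Callable<T> action,
                      Predicate<Throwable> isRetriable,
                      int maxAttempts,
                      long initialBackoffMs,
                      long maxBackoffMs) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long backoffMs = initialBackoffMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                // Future.get() wraps the real failure in an ExecutionException,
                // so classify the cause when one is present.
                Throwable cause =
                    (e instanceof ExecutionException && e.getCause() != null) ? e.getCause() : e;
                if (attempt >= maxAttempts || !isRetriable.test(cause)) {
                    throw e; // out of attempts, or not a failure worth retrying
                }
                Thread.sleep(backoffMs);
                // Double the wait each time, capped at maxBackoffMs.
                backoffMs = Math.min(backoffMs * 2, maxBackoffMs);
            }
        }
    }
}
```

A call site could then wrap the describe call roughly as follows. The attempt count and backoff values are placeholders, and `adminClient` and `topic` stand for whatever the call site already has in scope. On newer Kafka clients `allTopicNames()` would replace the deprecated `all()`.

```java
// Sketch only; assumes org.apache.kafka.clients.admin.* and
// org.apache.kafka.common.errors.UnknownTopicOrPartitionException are imported.
Map<String, TopicDescription> descriptions = RetryWithBackoff.call(
    () -> adminClient.describeTopics(Collections.singleton(topic)).all().get(),
    t -> t instanceof UnknownTopicOrPartitionException,
    5,      // maxAttempts (placeholder)
    500L,   // initialBackoffMs (placeholder)
    5000L); // maxBackoffMs (placeholder)
```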

gitlw Dec 19 '22 19:12

This is your first issue in the repository. Thank you for raising this issue.

github-actions[bot] Dec 19 '22 19:12