Scan brokerInActive if any error occurs.

echooymxq opened this issue 3 years ago • 7 comments

The current implementation for changing the BrokerRole periodically checks each broker's heartbeat: https://github.com/apache/rocketmq/blob/10e8e6b8ae565095cfa6c70fa07857e526a20d0d/controller/src/main/java/org/apache/rocketmq/controller/impl/DefaultBrokerHeartbeatManager.java#L77-L86

In the code above, `iterator.remove()` removes the inactive broker first, and then an asynchronous notification mechanism is used to elect a new master. If that election fails or times out (e.g. a write to DLedger fails), there is no retry mechanism. Will the broker then be left without a master indefinitely?
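To make the failure mode concrete, here is a minimal Java sketch of the pattern described above. It is a simplified, hypothetical model and not the code at the linked commit: `FireAndForgetScan`, `scanOnce`, and the `electMaster` hook are illustrative names, and the election is modeled as a `CompletableFuture` that may fail (standing in for, e.g., a DLedger write error).

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical, simplified model of the heartbeat scan discussed in this issue.
// It is NOT RocketMQ's actual implementation; all names here are illustrative.
class FireAndForgetScan {
    // brokerId -> timestamp (ms) of the last heartbeat received
    final Map<String, Long> lastHeartbeat = new ConcurrentHashMap<>();
    final long timeoutMillis;
    // Hypothetical election hook; the returned future may fail (e.g. a DLedger write error).
    final Function<String, CompletableFuture<Void>> electMaster;

    FireAndForgetScan(long timeoutMillis, Function<String, CompletableFuture<Void>> electMaster) {
        this.timeoutMillis = timeoutMillis;
        this.electMaster = electMaster;
    }

    void scanOnce(long nowMillis) {
        Iterator<Map.Entry<String, Long>> it = lastHeartbeat.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (nowMillis - entry.getValue() > timeoutMillis) {
                String brokerId = entry.getKey();
                // The broker is removed from the table first...
                it.remove();
                // ...and the election is fired asynchronously. Its result is ignored,
                // so if the future fails, nothing ever schedules another attempt.
                electMaster.apply(brokerId);
            }
        }
    }
}
```

Because the broker has already left `lastHeartbeat` by the time its election fails, later scans never see it again, which is the "no retry" gap raised in this issue.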

echooymxq · Sep 07 '22 01:09

cc @RongtongJin @hzh0425

echooymxq · Sep 07 '22 01:09

@echooymxq Good catch! If the disconnected broker is not successfully removed from the metadata, there is indeed no retry mechanism. IMO, we could add a scheduled task that checks the brokers and triggers master election when the conditions are met. Are you interested in submitting a PR for this?
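The scheduled-task idea could look roughly like the following Java sketch. It is a hypothetical illustration, not the fix that was eventually proposed: `ScheduledElectionChecker`, `checkOnce`, `onHeartbeat`, and the `electMaster` hook are invented names, and the election is modeled as a `CompletableFuture` that may fail. The key point is that a timed-out broker is forgotten only after its election succeeds; a failed attempt leaves it pending, so the next scheduled pass retries it.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

// Hypothetical sketch of a scheduled election check; not a RocketMQ API.
class ScheduledElectionChecker {
    // brokerId -> timestamp (ms) of the last heartbeat; the heartbeat only marks liveness.
    final Map<String, Long> lastHeartbeat = new ConcurrentHashMap<>();
    // Brokers that timed out and still need a successful master election.
    final Set<String> pendingElection = ConcurrentHashMap.newKeySet();
    // Guards against starting a second election while one is still running.
    private final Set<String> inFlight = ConcurrentHashMap.newKeySet();
    private final long timeoutMillis;
    private final Function<String, CompletableFuture<Void>> electMaster;
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    ScheduledElectionChecker(long timeoutMillis, Function<String, CompletableFuture<Void>> electMaster) {
        this.timeoutMillis = timeoutMillis;
        this.electMaster = electMaster;
    }

    void onHeartbeat(String brokerId, long nowMillis) {
        lastHeartbeat.put(brokerId, nowMillis);
    }

    // One pass: mark timed-out brokers, then (re)try every pending election.
    void checkOnce(long nowMillis) {
        lastHeartbeat.forEach((id, ts) -> {
            if (nowMillis - ts > timeoutMillis) {
                pendingElection.add(id);
            }
        });
        for (String id : pendingElection) {
            if (!inFlight.add(id)) {
                continue; // the previous attempt has not completed yet
            }
            CompletableFuture<Void> attempt;
            try {
                attempt = electMaster.apply(id);
            } catch (RuntimeException e) {
                inFlight.remove(id); // treat a synchronous throw as a failed attempt
                continue;
            }
            attempt.whenComplete((ignored, error) -> {
                if (error == null) {
                    // Forget the broker only after the election really succeeded.
                    pendingElection.remove(id);
                    lastHeartbeat.remove(id);
                }
                // On failure the broker stays pending and is retried on the next pass.
                inFlight.remove(id);
            });
        }
    }

    void start(long periodMillis) {
        scheduler.scheduleAtFixedRate(() -> checkOnce(System.currentTimeMillis()),
                periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }

    void shutdown() {
        scheduler.shutdownNow();
    }
}
```

In a running controller the pass would be driven by `start(periodMillis)`; `checkOnce` takes the current time as a parameter so that a single pass can be exercised deterministically.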

RongtongJin · Sep 07 '22 06:09

@RongtongJin I'd rather not rely on this approach of periodically checking the broker's heartbeat. I prefer a scheduled task that scans broker health, with the heartbeat only indicating whether the broker is active. I will submit a PR and try to resolve it.

echooymxq · Sep 07 '22 07:09

Hi @echooymxq,
How do you define whether a broker is healthy? What are the indicators? Could you share your idea?

mxsm · Sep 07 '22 15:09

A scheduled task looks like a good way to solve this problem. Other events may also be lost if their execution fails.

ni-ze · Sep 08 '22 03:09

@mxsm @ni-ze I have committed a PR; reviews and discussion are welcome.

echooymxq · Sep 13 '22 03:09

@echooymxq It would be better to submit a PR for review.

ni-ze · Sep 15 '22 02:09

This issue is stale because it has been open for 365 days with no activity. It will be closed in 3 days if no further activity occurs.

github-actions[bot] · Sep 16 '23 00:09

This issue was closed because it has been inactive for 3 days since being marked as stale.

github-actions[bot] · Sep 19 '23 00:09