rocketmq
Scan brokerInActive if any error occurs.
The current implementation for changing the BrokerRole detects the broker's heartbeat periodically: https://github.com/apache/rocketmq/blob/10e8e6b8ae565095cfa6c70fa07857e526a20d0d/controller/src/main/java/org/apache/rocketmq/controller/impl/DefaultBrokerHeartbeatManager.java#L77-L86
In the code above, iterator.remove() removes the broker first, and the master election is then triggered through an asynchronous notification. If anything fails or times out (e.g. a write to DLedger fails), there is no retry mechanism. Will the brokers then be left without a leader indefinitely?
cc @RongtongJin @hzh0425
@echooymxq Good catch! If the disconnected broker is not successfully removed from the metadata, there is indeed no retry mechanism. IMO, we can set up a scheduled task that checks and triggers the master election if the conditions are met. Are you interested in submitting a relevant PR?
@RongtongJin I would rather not rely on this approach of periodically detecting the broker's heartbeat. I prefer to use a scheduled task to scan broker health; the heartbeat only indicates whether the broker is active. I will submit a PR and try to resolve it.
Hi @echooymxq
How do you define whether a broker is healthy? What are the indicators? Can you share your idea?
A scheduled task looks like a good way to solve this problem. Other events may also be lost if their execution fails.
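To make the scheduled-scan idea concrete, here is a minimal sketch. It is not the RocketMQ implementation: `InactiveBrokerScanner`, `onHeartbeat`, `scanOnce`, and the `onInactive` callback are hypothetical names used only for illustration. The point it shows is the ordering: the broker entry is dropped only after the callback reports success, so a failed or timed-out election is retried on the next scan instead of being lost.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

// Hypothetical sketch; none of these names are RocketMQ APIs.
class InactiveBrokerScanner {
    // Broker address -> timestamp of the last heartbeat received.
    private final Map<String, Long> lastHeartbeatMillis = new ConcurrentHashMap<>();
    private final long timeoutMillis;
    // Assumed contract: returns true only if the inactive broker was fully
    // handled (for example, the election result was written successfully).
    private final Predicate<String> onInactive;

    InactiveBrokerScanner(long timeoutMillis, Predicate<String> onInactive) {
        this.timeoutMillis = timeoutMillis;
        this.onInactive = onInactive;
    }

    void onHeartbeat(String brokerAddr, long nowMillis) {
        lastHeartbeatMillis.put(brokerAddr, nowMillis);
    }

    // One scan pass. Returns how many brokers were removed in this pass.
    int scanOnce(long nowMillis) {
        int removed = 0;
        for (Map.Entry<String, Long> entry : lastHeartbeatMillis.entrySet()) {
            String brokerAddr = entry.getKey();
            Long seen = entry.getValue();
            if (nowMillis - seen <= timeoutMillis) {
                continue; // heartbeat is still fresh
            }
            boolean handled;
            try {
                handled = onInactive.test(brokerAddr);
            } catch (RuntimeException e) {
                handled = false; // treat an exception as a failure and retry later
            }
            // Remove only after success, and only if no newer heartbeat
            // replaced the timestamp we inspected.
            if (handled && lastHeartbeatMillis.remove(brokerAddr, seen)) {
                removed++;
            }
        }
        return removed;
    }

    int trackedCount() {
        return lastHeartbeatMillis.size();
    }
}
```

In a real controller, `scanOnce(System.currentTimeMillis())` could be driven by `ScheduledExecutorService.scheduleWithFixedDelay`. Because the callback may run more than once for the same broker, it would need to be idempotent.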
@mxsm @ni-ze I have submitted a PR; reviews and discussion are welcome.
@echooymxq It would be better to submit a PR for review.
This issue is stale because it has been open for 365 days with no activity. It will be closed in 3 days if no further activity occurs.
This issue was closed because it has been inactive for 3 days since being marked as stale.