centraldogma
Improve the robustness of old ZooKeeper log removal
Motivation:
OldLogRemover in ZooKeeperCommandExecutor currently catches a Throwable when deleting an old log or its log blocks. However, this approach has two issues:
- It doesn't handle an exception that's raised when reading the metadata of the old log.
- Throwable is way too wide an exception to catch. Catching a KeeperException whose code is NONODE will be enough.
  - Note that the failure will only transfer the leadership to another replica, rather than stopping the whole replication process.
Modifications:
- OldLogRemover now catches KeeperException whose code is NONODE only.
  - An attempt to read a missing log node's metadata is now handled properly.
- Added more detail to the log messages about missing nodes
- Split deleteLog() into deleteLog() and deleteLogBlock()
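
The modifications above can be illustrated with a minimal, self-contained sketch. It is not the actual Central Dogma code: KeeperException and Code below are simplified stand-ins for the classes in org.apache.zookeeper, and InMemoryStore, ignoreIfMissing() and the warnings list are hypothetical helpers introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class OldLogRemoverSketch {

    /** Simplified stand-in for KeeperException.Code; only two codes are modelled. */
    enum Code { NONODE, CONNECTIONLOSS }

    /** Simplified stand-in for org.apache.zookeeper.KeeperException. */
    static final class KeeperException extends Exception {
        private final Code code;

        KeeperException(Code code, String path) {
            super(code + ": " + path);
            this.code = code;
        }

        Code code() {
            return code;
        }
    }

    /** Hypothetical in-memory replacement for the ZooKeeper tree. */
    static final class InMemoryStore {
        final Set<String> nodes = new HashSet<>();
        final Map<String, List<String>> metadata = new HashMap<>();
        Code failWith; // when non-null, every operation fails with this code

        List<String> readMetadata(String logPath) throws KeeperException {
            check(logPath);
            return metadata.getOrDefault(logPath, new ArrayList<>());
        }

        void delete(String path) throws KeeperException {
            check(path);
            nodes.remove(path);
            metadata.remove(path);
        }

        private void check(String path) throws KeeperException {
            if (failWith != null) {
                throw new KeeperException(failWith, path);
            }
            if (!nodes.contains(path)) {
                throw new KeeperException(Code.NONODE, path);
            }
        }
    }

    /** Collected instead of logged so that the sketch stays dependency-free. */
    static final List<String> warnings = new ArrayList<>();

    static void deleteLog(InMemoryStore store, String logPath) throws KeeperException {
        List<String> blockPaths;
        try {
            // Reading the metadata can fail with NONODE as well, so it is guarded too.
            blockPaths = store.readMetadata(logPath);
        } catch (KeeperException e) {
            ignoreIfMissing(e, "log metadata", logPath);
            return;
        }
        for (String blockPath : blockPaths) {
            deleteLogBlock(store, blockPath);
        }
        try {
            store.delete(logPath);
        } catch (KeeperException e) {
            ignoreIfMissing(e, "log", logPath);
        }
    }

    static void deleteLogBlock(InMemoryStore store, String blockPath) throws KeeperException {
        try {
            store.delete(blockPath);
        } catch (KeeperException e) {
            ignoreIfMissing(e, "log block", blockPath);
        }
    }

    /** Swallows NONODE with a descriptive warning; rethrows every other failure. */
    private static void ignoreIfMissing(KeeperException e, String what, String path)
            throws KeeperException {
        if (e.code() != Code.NONODE) {
            throw e; // in the PR's terms, this is the case that transfers the leadership
        }
        warnings.add("Tried to remove a missing " + what + ": " + path);
    }
}
```

In this sketch a missing node only produces a warning, while any other error code propagates to the caller, which mirrors the narrowing from Throwable to a NONODE-only catch.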
Result:
- The leadership is not transferred anymore when OldLogRemover attempts to retrieve a missing log node's metadata, which is not really a critical issue.
  - Instead, the leadership will be transferred when an exception occurs not because of a missing node.
Throwable is way too wide an exception to catch. Catching a KeeperException whose code is NONODE will be enough. Note that the failure will only transfer the leadership to another replica, rather than stopping the whole replication process.
Question: Is there any chance that the replica that receives the leadership raises the same exception?