kafka
kafka copied to clipboard
[KAFKA-13369] Follower fetch protocol changes for tiered storage.
This PR implements the follower fetch protocol as mentioned in KIP-405.
Added a new version for ListOffsets
protocol to receive local log start offset on the leader replica. This is used by follower replicas to find the local log star offset on the leader.
Added a new version for FetchRequest
protocol to receive OffsetMovedToTieredStorageException error. This is part of the enhanced fetch protocol as described in KIP-405.
We introduced a new field locaLogStartOffset
to maintain the log start offset in the local logs. Existing logStartOffset will continue to be the log start offset of the effective log that includes the segments in remote storage.
When a follower receives OffsetMovedToTieredStorage, then it tries to build the required state from the leader and remote storage so that it can be ready to move to fetch state.
Introduced RemoteLogManager
which is responsible for
- initializing
RemoteStorageManager
andRemoteLogMetadataManager
instances. - receives any leader and follower replica events and partition stop events and act on them
- also provides APIs to fetch indexes, metadata about remote log segments.
Followup PRs will add more functionality like copying segments to tiered storage, retention checks to clean local and remote log segments. This will change the local log start offset and make sure the follower fetch protocol works fine for several cases.
You can look at the detailed protocol changes in KIP: https://cwiki.apache.org/confluence/display/KAFKA/KIP-405%3A+Kafka+Tiered+Storage#KIP405:KafkaTieredStorage-FollowerReplication
Authors: [email protected], [email protected], [email protected]
Committer Checklist (excluded from commit message)
- [ ] Verify design and implementation
- [ ] Verify test coverage and CI build status
- [ ] Verify documentation (including upgrade notes)
@satishd It'd be helpful if you could please update the PR description explaining the scope of the draft PR (in its current form) and what's remaining to be done.
@junrao @kowshik This PR is ready for review. Please take a look.
Thanks @junrao for the review. Please find inline replies, addressed most of them with latest commits.
@junrao : Thanks for the review. Please find inline replies, updated the PR addressing them with the latest commit.
@junrao Thanks for the review. Please find inline replies, updated the PR addressing them with the latest commits.
@junrao : Thanks for the review. Addressed your comments with the latest commits.
Rebased with the trunk to resolve the conflicts.
Thanks @showuon for your latest comments. Addressed them inline, let me know if you have any comments.
Thanks @divijvaidya for your comments, addressed them with inline comments.
Thanks @junrao for your comments. Addressed most of them, working on couple of the remaining comments to update the javadoc.
@junrao @kowshik @ccding @divijvaidya , do you want to have another review? Since branch 3.4 has created, and this PR blocks some following tiered storage feature development (ex: copying segments to tiered storage, retention checks to clean local and remote log segments), we might need to consider to merge it first and have follow-up PRs for complicated changes. WDYT? Thanks.
@showuon : I plan to take another look at the PR in the next few days.
Thanks @junrao for your review, addressed them with inline replies. I updated the PR with latest commits addressing the review comments.
@junrao : Filed https://issues.apache.org/jira/browse/KAFKA-14467 as you suggested.
Thanks @junrao for your updated review. Addressed them with inline comments and updated with the latest commits.
Thanks @junrao for the review, addressed it with the latest commit.
There are a few tests that are failed but they do not seem to be related to this PR.
Hey @satishd, I wanted to let you know about KAFKA-14470 as I think it affects some of the future KIP-405 PRs. Can we align these efforts so that we can get to the desired end state faster? For example, once the PRs that have been submitted are merged, we can move RemoteIndexCache
to the storage
module too.