KAFKA-19893: Reduce tiered storage redundancy with delayed upload (Topic-level feature) (KIP-1241)
Currently, Kafka uploads all non-active local log segments to remote storage even when they are still within the local retention period, so the same data is stored in both tiers. This wastes storage capacity (and cost) without providing an immediate benefit, since reads during the local retention window are served from local data.
However, some users and topics rely on remote storage for real-time analytics and need the latest data to be available there as soon as possible. (Even then, remote storage is only as up to date as possible; it never includes the newest data, because the active segment is not uploaded.) Therefore, this optimization is offered as an optional topic-level configuration rather than as the default behavior.
Here are some additional considerations:
- Local files are not deleted until they have been uploaded to remote storage, so this change is safe: segments cannot be cleaned up before they are uploaded.
- Because of remote storage latency, the local retention period is unlikely to be set very short. For example, in our production environment we keep 1 day of data locally alongside 3-7 days in remote storage, so there is still 1 day of redundancy.
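To make the two rules above concrete, here is a minimal sketch of the upload decision and the local-deletion safety check. It is not the code in this patch: `Segment`, `should_upload`, `can_delete_locally` and the `delay_upload` flag are hypothetical names, and the assumption that a delayed upload waits until the full local retention period has elapsed is mine.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    age_ms: int            # time since the newest record in the segment
    is_active: bool        # the active segment is never uploaded
    uploaded: bool = False


def should_upload(seg, local_retention_ms, delay_upload):
    """Decide whether a segment is eligible for upload to remote storage.

    `delay_upload` is a hypothetical per-topic flag standing in for the
    optional topic-level configuration described above.
    """
    if seg.is_active or seg.uploaded:
        return False
    if not delay_upload:
        # Current behavior: upload as soon as the segment is no longer active.
        return True
    # Assumed delayed rule: wait until the local retention period has elapsed.
    return seg.age_ms >= local_retention_ms


def can_delete_locally(seg, local_retention_ms):
    """Safety rule: a local segment is only deletable once it is uploaded."""
    return seg.uploaded and seg.age_ms >= local_retention_ms
```

With this shape, an expired segment that has not been uploaded yet stays on local disk, which is the safety property described in the first bullet.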
Example for the goal:
Test result:
[Precondition]
Create one topic with remote storage enabled in Kafka (3 brokers + 3 controllers)
local storage time: 20 minutes
remote storage time: 40 minutes
partitions: 3
segment.bytes: 10M
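For anyone reproducing the setup, the preconditions could be expressed as topic configs roughly as follows. This is a sketch under assumptions: the topic name `tiered-test` and the bootstrap address are placeholders, "remote storage time" is assumed to map to the total `retention.ms`, and "10M" is assumed to mean 10 MiB. The config keys and `kafka-topics.sh` flags are the standard ones.

```python
MINUTE_MS = 60 * 1000

# Topic-level configs for the test topic (values are strings, as Kafka expects).
topic_configs = {
    "remote.storage.enable": "true",
    "local.retention.ms": str(20 * MINUTE_MS),
    # Assumption: "remote storage time: 40 minutes" is the total retention.
    "retention.ms": str(40 * MINUTE_MS),
    # Assumption: "10M" means 10 MiB.
    "segment.bytes": str(10 * 1024 * 1024),
}


def create_topic_args(topic, partitions, configs, bootstrap="localhost:9092"):
    """Build the kafka-topics.sh argument list for creating the topic."""
    args = [
        "kafka-topics.sh",
        "--bootstrap-server", bootstrap,
        "--create",
        "--topic", topic,
        "--partitions", str(partitions),
    ]
    for key, value in configs.items():
        args += ["--config", f"{key}={value}"]
    return args


# "tiered-test" is a placeholder topic name.
print(" ".join(create_topic_args("tiered-test", 3, topic_configs)))
```

The config that enables delayed upload itself is deliberately left out, since its name is defined by the KIP and not stated in this description.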
[Steps]
- Deploy this code patch into one broker only and restart the broker
- Keep sending a lot of messages to the topic
- Check the disk usage on both local and remote storage at two points in time: just before the 20-minute mark and after 1 hour.
[Result]
Before the 20-minute mark:
- Only 2 partitions have uploaded local segments to remote storage.
After 1 hour:
- The remote storage size for one partition (on the broker with the code change) is much smaller than the other two.
- The sizes of the local disks are similar.
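As a sanity check on these observations, here is a simplified model of what each tier should hold over time. It is an idealization, not a measurement: it assumes a constant produce rate, ignores the active segment and segment granularity, and assumes a delayed upload happens exactly when a segment leaves the local retention window. Sizes are expressed in "minutes of data", and `tier_sizes` is a hypothetical helper.

```python
def tier_sizes(t_min, local_min, total_min, delayed):
    """Return (local, remote) sizes in minutes of data at time t_min.

    local_min: local retention; total_min: total (remote) retention.
    """
    retained = min(t_min, total_min)   # data still within total retention
    local = min(t_min, local_min)      # both modes keep the local window
    if delayed:
        # Only data that has aged out of the local window is in remote storage.
        remote = max(0, retained - local_min)
    else:
        # Eager upload: everything retained is also in remote storage.
        remote = retained
    return local, remote


for t in (19, 60):
    eager = tier_sizes(t, local_min=20, total_min=40, delayed=False)
    lazy = tier_sizes(t, local_min=20, total_min=40, delayed=True)
    print(f"t={t}min eager(local, remote)={eager} delayed(local, remote)={lazy}")
```

Under these assumptions, the delayed partition has nothing in remote storage before the 20-minute mark, and after 1 hour it holds 20 minutes of data remotely versus 40 for the eager partitions, while local usage is the same in both modes. That is the same direction as the measured result, though the model is not meant to predict the exact sizes.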
A label of 'needs-attention' was automatically added to this PR in order to raise the
attention of the committers. Once this issue has been triaged, the triage label
should be removed to prevent this automation from happening again.
Hi, @kamalcph Sorry to bother you. I know you’ve been deeply involved in the remote storage area, and I was wondering if you might be interested — when you have some free time — in taking a look at this cost-saving topic and providing some guidance. Thank you very much!
cc @kamalcph here because the community mailing list doesn't allow image attachments. We can discuss the content of the KIP over email. Thanks
@jiafu1115
Segments that have already been uploaded are eligible for deletion from the broker. So, when remote storage is down, those segments can be deleted as per the local retention settings, and new segments can occupy that space. This gives the admin more time to act when remote storage is down for a long time.
@kamalcph I think I understand what you mean now. I’ve updated the picture above. Could you help double-check that we’ve reached the same understanding? The drawback of this KIP is that, during a long remote storage outage, the broker will occupy more local disk, so the admin may need one extra disk expansion. The maximum extra usage is the redundant part we are saving. After the outage is resolved, disk usage returns to where it started. Right?
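To check that understanding, here is a rough model of local disk usage while remote storage is unreachable. It is only a sketch: it assumes a constant produce rate, that un-uploaded segments are never deleted locally, and that with delayed upload nothing younger than the local retention period had been uploaded when the outage began. Sizes are in "minutes of data", and the function names are hypothetical.

```python
def local_disk_during_outage(outage_min, local_min, delayed):
    """Local usage after remote storage has been down for outage_min."""
    if delayed:
        # The whole local window was still un-uploaded when the outage began,
        # and everything produced since then must also stay on local disk.
        return local_min + outage_min
    # Eager upload: data rolled before the outage is already remote, so the
    # broker keeps the local window or the outage backlog, whichever is longer.
    return max(local_min, outage_min)


def extra_disk(outage_min, local_min):
    """Additional local usage caused by delayed upload during an outage."""
    return (local_disk_during_outage(outage_min, local_min, delayed=True)
            - local_disk_during_outage(outage_min, local_min, delayed=False))
```

In this model the extra usage works out to `min(local_min, outage_min)`, so it never exceeds one local retention window, which is the redundant part that delayed upload saves in the first place.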