tikv
raftstore: remove stale ranges by DeleteByKeys rather than ingesting. (#18040)
This is an automated cherry-pick of #18040
What is changed and how it works?
Issue Number: Close #18107, Ref https://github.com/tikv/tikv/issues/18042
This PR contains the following scaling optimizations, which mitigate the latency impact caused by ingesting unnecessary SST files:
- Clear stale ranges directly with DeleteByKeys while balancing regions.
- Skip clearing the data of stores that are going offline during scale-in, because that data is cleared automatically once the corresponding node goes offline.
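The first change can be pictured with a small in-memory model. This is only an illustrative sketch, not TiKV's implementation: a `BTreeMap` stands in for the storage engine, and the `Engine` alias and the `delete_stale_range_by_keys` function are hypothetical names. The idea it shows is removing every key in a stale range `[start, end)` with individual point deletes, rather than writing an SST file and ingesting it.

```rust
use std::collections::BTreeMap;
use std::ops::Bound;

/// Hypothetical stand-in for a KV engine: an ordered map of byte keys to values.
type Engine = BTreeMap<Vec<u8>, Vec<u8>>;

/// Hypothetical helper: clears the stale range [start, end) by issuing one
/// point delete per key found in the range (the "DeleteByKeys" idea),
/// instead of building and ingesting an SST file. Returns the number of keys deleted.
fn delete_stale_range_by_keys(engine: &mut Engine, start: &[u8], end: &[u8]) -> usize {
    // An empty or inverted range has nothing to clear (and would make
    // BTreeMap::range panic when start > end).
    if start >= end {
        return 0;
    }
    // Collect the keys first, because the map cannot be mutated while a
    // range iterator still borrows it.
    let stale: Vec<Vec<u8>> = engine
        .range::<[u8], _>((Bound::Included(start), Bound::Excluded(end)))
        .map(|(k, _)| k.clone())
        .collect();
    // Delete each stale key individually.
    for key in &stale {
        engine.remove(key);
    }
    stale.len()
}
```

For example, with keys `a`, `b`, `c`, `d` in the engine, clearing the range `[b, d)` deletes `b` and `c` and leaves `a` and `d`. In a real LSM-tree engine such as RocksDB, point deletes go through the normal write path as tombstones that compaction later removes.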
What's Changed:
Related changes
- [ ] PR to update pingcap/docs/pingcap/docs-cn:
- [ ] Need to cherry-pick to the release branch
Check List
Tests
- [x] Unit test
- [ ] Integration test
- [ ] Manual test (add detailed scripts or steps below)
- [ ] No code
Taking the gRPC message duration in the TiKV metrics panel as an example, the following comparison shows the reduction in long-tail latency:
| Commit | gRPC message duration |
|---|---|
| master | |
| This PR | |
The end-to-end long-tail latency results below also show the improvement:
| Commit | Scale-out (before -> during scaling) | Scale-in (before -> during scaling) |
|---|---|---|
| master | P99: 7.2 ms -> ~11.3 ms; P999: 62 ms -> ~89.2 ms | P99: 7.2 ms -> ~16.1 ms; P999: 62 ms -> ~100 ms |
| This PR | P99: 7.2 ms -> ~10.6 ms; P999: 62 ms -> ~76.9 ms | P99: 7.2 ms -> ~15.3 ms; P999: 62 ms -> ~96 ms |
Side effects
- [ ] Performance regression: Consumes more CPU
- [ ] Performance regression: Consumes more Memory
- [ ] Breaking backward compatibility
Release note
Optimize the clearing of stale ranges by using DeleteByKeys, mitigating the impact on latency.
@hhwyt This PR has conflicts, so I have put it on hold.
Please resolve them or ask others to resolve them, then comment /unhold to remove the hold label.
@hhwyt: adding LGTM is restricted to approvers and reviewers in OWNERS files.
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: hbisheng, hhwyt, LykxSassinator
- ~~OWNERS~~ [LykxSassinator]
[LGTM Timeline notifier]
Timeline:
2025-07-09 09:31:29 UTC: agreed by LykxSassinator.
2025-07-09 09:37:47 UTC: agreed by hbisheng.
/unhold