enhance: improve WAL retention strategy

Open tinswzy opened this issue 1 month ago • 35 comments

issue: #44369
woodpecker related issue: #59

Refactor the WAL retention logic in Milvus StreamingNode:

  • Remove the simple sampling-based truncation mechanism.
  • Truncate WAL data directly after flush.
  • Delegate retention control to the underlying message queue (MQ) implementation.

tinswzy avatar Nov 06 '25 07:11 tinswzy

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: tinswzy

To complete the pull request process, please assign zhengbuqian after the PR has been reviewed. You can assign the PR to them by writing /assign @zhengbuqian in a comment when ready.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
Approvers can cancel approval by writing /approve cancel in a comment.

sre-ci-robot avatar Nov 06 '25 07:11 sre-ci-robot

@tinswzy cpu-e2e job failed, comment /run-cpu-e2e can trigger the job again.

mergify[bot] avatar Nov 06 '25 07:11 mergify[bot]

[ci-v2-notice] Notice: We are gradually rolling out the new ci-v2 system.

  • Legacy CI jobs remain unaffected, you can just ignore ci-v2 if you don't want to run it.
  • Additional "ci-v2/*" checkers will run for this PR to ensure the new ci-v2 system is working as expected.
  • For tests that exist in both v1 and v2, passing in either system is considered PASS.

To rerun ci-v2 checks, comment with:

  • /ci-rerun-code-check // for ci-v2/code-check
  • /ci-rerun-build // for ci-v2/build
  • /ci-rerun-ut-integration // for ci-v2/ut-integration
  • /ci-rerun-ut-go // for ci-v2/ut-go
  • /ci-rerun-ut-cpp // for ci-v2/ut-cpp
  • /ci-rerun-ut // for all ci-v2/ut-integration, ci-v2/ut-go, ci-v2/ut-cpp
  • /ci-rerun-e2e-arm // for ci-v2/e2e-arm

If you have any questions or requests, please contact @zhikunyao.

sre-ci-robot avatar Nov 06 '25 08:11 sre-ci-robot

@tinswzy cpu-e2e job failed, comment /run-cpu-e2e can trigger the job again.

mergify[bot] avatar Nov 06 '25 09:11 mergify[bot]

@tinswzy go-sdk check failed, comment rerun go-sdk can trigger the job again.

mergify[bot] avatar Nov 06 '25 09:11 mergify[bot]

Codecov Report

:x: Patch coverage is 76.19048% with 5 lines in your changes missing coverage. Please review.
:white_check_mark: Project coverage is 76.48%. Comparing base (96d0e78) to head (9bc7981).
:warning: Report is 3 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| pkg/streaming/walimpls/impls/pulsar/opener.go | 33.33% | 1 Missing and 1 partial :warning: |
| pkg/streaming/walimpls/impls/wp/builder.go | 0.00% | 2 Missing :warning: |
| pkg/streaming/walimpls/impls/pulsar/wal.go | 75.00% | 1 Missing :warning: |

:x: Your project check has failed because the head coverage (76.48%) is below the target coverage (77.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files

@@            Coverage Diff             @@
##           master   #45350      +/-   ##
==========================================
- Coverage   76.55%   76.48%   -0.08%     
==========================================
  Files        1875     1874       -1     
  Lines      291983   291833     -150     
==========================================
- Hits       223531   223197     -334     
- Misses      61073    61224     +151     
- Partials     7379     7412      +33     

| Components | Coverage Δ | |
|---|---|---|
| Client | 78.17% <ø> (ø) | |
| Core | 83.18% <ø> (ø) | |
| Go | 74.58% <76.19%> (-0.12%) | :arrow_down: |

| Files with missing lines | Coverage Δ | |
|---|---|---|
| ...ternal/streamingnode/server/wal/recovery/config.go | 91.66% <ø> (+14.92%) | :arrow_up: |
| ...ernal/streamingnode/server/wal/recovery/metrics.go | 100.00% <ø> (ø) | |
| ...de/server/wal/recovery/recovery_background_task.go | 93.05% <ø> (-3.08%) | :arrow_down: |
| ...gnode/server/wal/recovery/recovery_storage_impl.go | 83.42% <100.00%> (-3.98%) | :arrow_down: |
| pkg/metrics/streaming_service_metrics.go | 100.00% <ø> (ø) | |
| pkg/util/paramtable/component_param.go | 97.60% <ø> (-0.02%) | :arrow_down: |
| pkg/util/paramtable/service_param.go | 97.72% <100.00%> (+0.01%) | :arrow_up: |
| pkg/streaming/walimpls/impls/pulsar/wal.go | 50.42% <75.00%> (-0.82%) | :arrow_down: |
| pkg/streaming/walimpls/impls/pulsar/opener.go | 75.00% <33.33%> (+20.71%) | :arrow_up: |
| pkg/streaming/walimpls/impls/wp/builder.go | 5.10% <0.00%> (-0.08%) | :arrow_down: |

... and 36 files with indirect coverage changes

codecov[bot] avatar Nov 06 '25 11:11 codecov[bot]

/run-cpu-e2e

tinswzy avatar Nov 07 '25 01:11 tinswzy

rerun go-sdk

tinswzy avatar Nov 07 '25 01:11 tinswzy

/ci-rerun-ut

tinswzy avatar Nov 07 '25 01:11 tinswzy

@tinswzy cpu-e2e job failed, comment /run-cpu-e2e can trigger the job again.

mergify[bot] avatar Nov 07 '25 02:11 mergify[bot]

@tinswzy go-sdk check failed, comment rerun go-sdk can trigger the job again.

mergify[bot] avatar Nov 07 '25 02:11 mergify[bot]

@tinswzy go-sdk check failed, comment rerun go-sdk can trigger the job again.

mergify[bot] avatar Nov 12 '25 02:11 mergify[bot]

@tinswzy cpu-e2e job failed, comment /run-cpu-e2e can trigger the job again.

mergify[bot] avatar Nov 12 '25 02:11 mergify[bot]

@tinswzy cpu-e2e job failed, comment /run-cpu-e2e can trigger the job again.

mergify[bot] avatar Nov 12 '25 04:11 mergify[bot]

/run-cpu-e2e

tinswzy avatar Nov 12 '25 06:11 tinswzy

/ci-rerun-ut

tinswzy avatar Nov 12 '25 06:11 tinswzy

/ci-rerun-code-check

tinswzy avatar Nov 12 '25 06:11 tinswzy

/ci-rerun-ut-go

tinswzy avatar Nov 12 '25 06:11 tinswzy

/ci-rerun-ut-integration

tinswzy avatar Nov 12 '25 06:11 tinswzy

/ci-rerun-ut-go

tinswzy avatar Nov 12 '25 09:11 tinswzy

/ci-rerun-ut-cpp

tinswzy avatar Nov 12 '25 09:11 tinswzy

/ci-rerun-code-check

tinswzy avatar Nov 12 '25 09:11 tinswzy

@tinswzy cpu-e2e job failed, comment /run-cpu-e2e can trigger the job again.

mergify[bot] avatar Nov 12 '25 11:11 mergify[bot]

@tinswzy go-sdk check failed, comment rerun go-sdk can trigger the job again.

mergify[bot] avatar Nov 12 '25 11:11 mergify[bot]

@tinswzy go-sdk check failed, comment rerun go-sdk can trigger the job again.

mergify[bot] avatar Nov 12 '25 13:11 mergify[bot]

@tinswzy cpu-e2e job failed, comment /run-cpu-e2e can trigger the job again.

mergify[bot] avatar Nov 12 '25 13:11 mergify[bot]

/ci-rerun-ut-go

tinswzy avatar Nov 14 '25 10:11 tinswzy

@tinswzy go-sdk check failed, comment rerun go-sdk can trigger the job again.

mergify[bot] avatar Nov 17 '25 10:11 mergify[bot]

@tinswzy cpu-e2e job failed, comment /run-cpu-e2e can trigger the job again.

mergify[bot] avatar Nov 17 '25 11:11 mergify[bot]

/lgtm

chyezh avatar Nov 17 '25 11:11 chyezh