milvus
fix: resolve SessionWatcher goroutine leak and unstable UT in querycoordv2
Related to #44620
Related to the unstable unit test `TestServer/TestNodeUp` in `internal/querycoordv2`.
Introduce a SessionWatcher interface to fix the race condition and goroutine leak that made the unit test TestServer/TestNodeUp unstable.
Changes:
- Add SessionWatcher interface with EventChannel() and Stop() methods
- Refactor WatchServices() to return SessionWatcher instead of raw channel
- Fix cleanup order in QueryCoordV2: stop watcher before session
- Update DataCoord, ConnectionManager to use SessionWatcher
- Add MockSessionWatcher for testing
Fixes race condition between session context cancellation and internal loop exit. Eliminates goroutine leak by providing explicit lifecycle management.
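The lifecycle described above can be illustrated with a minimal, self-contained sketch. Only the `SessionWatcher` name and its `EventChannel()` and `Stop()` methods come from this PR description; `SessionEvent`, `chanWatcher`, and `newChanWatcher` are hypothetical names invented for the illustration, and the real implementation in `internal/util/sessionutil` may differ in its types and signatures.

```go
package main

import (
	"fmt"
	"sync"
)

// SessionEvent is a hypothetical stand-in for the real session event type.
type SessionEvent struct {
	NodeID int64
}

// SessionWatcher has the two methods named in the PR description.
// The exact signatures here are an assumption.
type SessionWatcher interface {
	EventChannel() <-chan SessionEvent
	Stop()
}

// chanWatcher is an illustrative implementation, not the Milvus one.
type chanWatcher struct {
	events   chan SessionEvent
	closeCh  chan struct{}
	stopOnce sync.Once
	wg       sync.WaitGroup
}

// newChanWatcher starts a forwarding goroutine that the watcher owns.
func newChanWatcher(source <-chan SessionEvent) *chanWatcher {
	w := &chanWatcher{
		events:  make(chan SessionEvent),
		closeCh: make(chan struct{}),
	}
	w.wg.Add(1)
	go w.loop(source)
	return w
}

// loop forwards events until the source closes or Stop is called.
func (w *chanWatcher) loop(source <-chan SessionEvent) {
	defer w.wg.Done()
	defer close(w.events)
	for {
		select {
		case <-w.closeCh:
			return
		case ev, ok := <-source:
			if !ok {
				return
			}
			// Never block forever on a consumer that has gone away.
			select {
			case w.events <- ev:
			case <-w.closeCh:
				return
			}
		}
	}
}

func (w *chanWatcher) EventChannel() <-chan SessionEvent { return w.events }

// Stop is idempotent and returns only after the goroutine has exited,
// so the caller knows nothing is leaked.
func (w *chanWatcher) Stop() {
	w.stopOnce.Do(func() { close(w.closeCh) })
	w.wg.Wait()
}

func main() {
	source := make(chan SessionEvent, 1)
	var w SessionWatcher = newChanWatcher(source)

	source <- SessionEvent{NodeID: 1}
	ev := <-w.EventChannel()
	fmt.Println("node up:", ev.NodeID)

	// Stop the watcher first; only then tear down whatever feeds it.
	w.Stop()
	w.Stop() // a second call is a no-op
	_, open := <-w.EventChannel()
	fmt.Println("channel open after Stop:", open)
}
```

Because `Stop()` waits for the forwarding goroutine, a caller that stops the watcher before stopping the session has a clear point after which no watcher goroutine remains. This follows the cleanup order listed for QueryCoordV2 above, though the sketch does not model the session itself.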
[ci-v2-notice] Notice: We are gradually rolling out the new ci-v2 system.
- Legacy CI jobs remain unaffected, you can just ignore ci-v2 if you don't want to run it.
- Additional "ci-v2/*" checkers will run for this PR to ensure the new ci-v2 system is working as expected.
- For tests that exist in both v1 and v2, passing in either system is considered PASS.
To rerun ci-v2 checks, comment with:
- /ci-rerun-code-check // for ci-v2/code-check
- /ci-rerun-build // for ci-v2/build
- /ci-rerun-ut-integration // for ci-v2/ut-integration
- /ci-rerun-ut-go // for ci-v2/ut-go
- /ci-rerun-ut-cpp // for ci-v2/ut-cpp
- /ci-rerun-ut // for all ci-v2/ut-integration, ci-v2/ut-go, ci-v2/ut-cpp
- /ci-rerun-e2e-arm // for ci-v2/e2e-arm
If you have any questions or requests, please contact @zhikunyao.
/ci-rerun-ut-go
@congqixia cpu-e2e job failed, comment /run-cpu-e2e can trigger the job again.
/run-cpu-e2e
Codecov Report
:x: Patch coverage is 95.65217% with 2 lines in your changes missing coverage. Please review.
:white_check_mark: Project coverage is 76.50%. Comparing base (caed0fe) to head (f6a9728).
:warning: Report is 31 commits behind head on master.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| cmd/tools/migration/migration/runner.go | 0.00% | 2 Missing :warning: |
Additional details and impacted files
| Coverage Diff | master | #45627 | +/- |
|---|---|---|---|
| Coverage | 83.18% | 76.50% | -6.68% |
| Files | 521 | 1875 | +1354 |
| Lines | 81313 | 292178 | +210865 |
| Hits | 67642 | 223539 | +155897 |
| Misses | 13671 | 61240 | +47569 |
| Partials | 0 | 7399 | +7399 |
| Components | Coverage Δ | |
|---|---|---|
| Client | 78.17% <ø> (∅) | |
| Core | 83.19% <98.38%> (+0.01%) | :arrow_up: |
| Go | 74.62% <95.52%> (∅) | |
| Files with missing lines | Coverage Δ | |
|---|---|---|
| internal/datacoord/server.go | 68.00% <100.00%> (ø) | |
| internal/distributed/connection_manager.go | 71.27% <100.00%> (ø) | |
| internal/querycoordv2/server.go | 76.03% <100.00%> (ø) | |
| internal/util/sessionutil/session_util.go | 75.59% <100.00%> (ø) | |
| cmd/tools/migration/migration/runner.go | 0.00% <0.00%> (ø) | |
/ci-rerun-ut-integration
@congqixia cpu-e2e job failed, comment /run-cpu-e2e can trigger the job again.
/run-cpu-e2e
/ci-rerun-ut-integration
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: congqixia, liliu-z
The full list of commands accepted by this bot can be found here.
The pull request process is described here
- ~~OWNERS~~ [congqixia,liliu-z]
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment