grpc-go
grpc: Hold mutex while calling resetTransport to prevent concurrent connection attempts
This change ensures that the caller of resetTransport() keeps holding the mutex instead of releasing it and having resetTransport() re-acquire it. As a result, no concurrent connection attempt can start between the point where the caller validates the state and the point where it calls resetTransport(). A minimal sketch of the pattern is included below.
See https://github.com/grpc/grpc-go/issues/7365#issuecomment-2208258873 for more details.
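The sketch below illustrates the locking pattern described above. It uses hypothetical names (conn, mu, state, resetTransportLocked) rather than the actual grpc-go internals; the point is only that the validation and the start of the connection attempt happen under one continuous hold of the mutex, so a second goroutine cannot slip in between them.

```go
package sketch

import "sync"

// conn stands in for the real connection object; the field names are
// illustrative, not grpc-go's actual fields.
type conn struct {
	mu    sync.Mutex
	state string // e.g. "idle", "connecting", "ready"
}

// maybeConnect validates the state and starts a connection attempt while
// holding c.mu the whole time. Previously the lock would be released here
// and re-acquired inside the reset function, leaving a window in which
// another goroutine could also decide to connect.
func (c *conn) maybeConnect() {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.state != "idle" {
		return // another goroutine is already connecting or connected
	}
	c.state = "connecting"
	c.resetTransportLocked() // must be called with c.mu held
}

// resetTransportLocked assumes the caller holds c.mu, so at most one
// transport is created per state transition.
func (c *conn) resetTransportLocked() {
	// ... create exactly one new transport ...
}
```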
Tested
Verified that Test/AuthorityRevive no longer flakes over 100,000 attempts with this change.
Fixes: https://github.com/grpc/grpc-go/issues/7365
RELEASE NOTES:
- Fix a race condition that could lead to multiple transports being created in parallel. Only one transport is closed when the subConn is updated; the other transports are orphaned and may only get closed if the server returns an error.
Codecov Report
All modified and coverable lines are covered by tests ✅
Project coverage is 81.35%. Comparing base (daab563) to head (76ef33f).
Additional details and impacted files
@@ Coverage Diff @@
## master #7390 +/- ##
==========================================
- Coverage 81.51% 81.35% -0.17%
==========================================
Files 348 348
Lines 26744 26741 -3
==========================================
- Hits 21801 21754 -47
- Misses 3764 3793 +29
- Partials 1179 1194 +15
| Files | Coverage Δ | |
|---|---|---|
| clientconn.go | 92.40% <100.00%> (-0.99%) | ⬇️ |
Release notes need to be prefixed with the package name, in this case balancer:?
This is a change in the outermost grpc package. Added the package name in the release notes.
Is the Test/ServerSideXDS_WithValidAndInvalidSecurityConfiguration failure related to this change or just a flake?
It's a flake; there's an existing issue for this, and I've commented on it about this failure: https://github.com/grpc/grpc-go/issues/6914
RELEASE NOTES:
- Fix race condition that could lead to multiple transports being created in parallel.
What is the user-visible symptom here? A memory leak? Or just an extra connection that was attempted, but that will quickly go away on its own anyway? (Or will the extra connection stick around until the channel is closed?)
What I'm seeing in the test is that only one transport (ac.transport) is closed when the subConn is updated, and the other transports are orphaned. The orphaned transports get closed when the server is shut down at the end of the test. Updated the release notes.