[client] add reset for management backoff
Describe your changes
Reset client management grpc client backoff after successful connected to management API.
Current Situation: If the connection duration exceeds MaxElapsedTime, when the connection is interrupted, the backoff fails immediately due to timeout and does not actually perform a retry.
Issue ticket number and link
Stack
Checklist
- [x] Is it a bug fix
- [ ] Is a typo/documentation fix
- [ ] Is a feature enhancement
- [ ] It is a refactor
- [ ] Created tests that fail without the change (if possible)
By submitting this pull request, you confirm that you have read and agree to the terms of the Contributor License Agreement.
Documentation
Select exactly one:
- [ ] I added/updated documentation for this change
- [ ] Documentation is not needed for this change (explain why)
Docs PR URL (required if "docs added" is checked)
Paste the PR link from https://github.com/netbirdio/docs here:
https://github.com/netbirdio/docs/pull/__
Summary by CodeRabbit
Release Notes
-
Bug Fixes
- Enhanced client connection stability and resilience by improving the consistency of retry backoff behavior across all reconnection attempts and stream operations. The backoff strategy state is now properly maintained and reset during successful stream establishment, ensuring more predictable and efficient connection recovery during network disruptions and service interruptions.
✏️ Tip: You can customize this high-level summary in your review settings.
Walkthrough
Introduces a shared backoff instance in GrpcClient.Sync, passes it into handleStream, updates handleStream signature, and resets the backoff on successful stream setup and after receiving events to maintain consistent retry state across stream retries.
Changes
| Cohort / File(s) | Summary |
|---|---|
Grpc client backoff propagation shared/management/client/grpc.go |
Adds a local backOff in GrpcClient.Sync; uses it for the initial retry loop instead of creating fresh instances; passes the backoff into handleStream (signature changed to accept backoff.BackOff); resets the backoff on successful stream establishment and after processing events; replaces prior Retry usage to leverage the shared backoff. |
Estimated code review effort
🎯 2 (Simple) | ⏱️ ~8 minutes
- Single-file change with focused propagation of a backoff instance.
- Pay attention to: updated
handleStreamsignature usages, correct backoff resets timing (on stream setup and after event handling), and unchanged error/control-flow semantics.
Poem
🐇 I carried one backoff down the stream,
Not born anew with every retry's gleam.
We hop, we wait, then hop again—so sly,
A rabbit's rhythm beneath the sky. ✨
Pre-merge checks and finishing touches
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately describes the main change: adding a backoff reset mechanism for the management gRPC client to fix timeout issues. |
| Description check | ✅ Passed | The description includes the required change explanation and bug fix checklist, but omits issue ticket number and documentation decision sections. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
✨ Finishing touches
- [ ] 📝 Generate docstrings
🧪 Generate unit tests (beta)
- [ ] Create PR with unit tests
- [ ] Post copyable unit tests in a comment
📜 Recent review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📥 Commits
Reviewing files that changed from the base of the PR and between 1299d5603c5a00a0eeaf5ea26d7e9e3ee614cc03 and b6500b0bb6441b898b47cf03d74186595bd1efb6.
📒 Files selected for processing (1)
-
shared/management/client/grpc.go(4 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
shared/management/client/grpc.go (1)
util/retry.go (1)
Retry(10-28)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
- GitHub Check: Client / Unit (386)
- GitHub Check: Management / Unit (amd64, mysql)
- GitHub Check: Relay / Unit (386)
- GitHub Check: Management / Unit (amd64, sqlite)
- GitHub Check: Client / Unit (amd64)
- GitHub Check: Relay / Unit (amd64, -race)
- GitHub Check: Management / Unit (amd64, postgres)
- GitHub Check: Client (Docker) / Unit
- GitHub Check: Client / Unit
- GitHub Check: JS / Lint
- GitHub Check: Darwin
- GitHub Check: Linux
- GitHub Check: Windows
- GitHub Check: release_ui
- GitHub Check: release
- GitHub Check: release_ui_darwin
- GitHub Check: Android / Build
- GitHub Check: Client / Unit
- GitHub Check: iOS / Build
- GitHub Check: Client / Unit
🔇 Additional comments (6)
shared/management/client/grpc.go (6)
114-115: LGTM! Clean backoff initialization.Creating and resetting the backoff instance here ensures consistent retry state across stream retries within a single
Synccall. This is the foundation for preventing immediate timeout when connection duration exceeds MaxElapsedTime.
134-134: LGTM! Correct propagation of shared backoff.Passing the backoff instance to
handleStreamenables it to reset the retry state after successful operations, which is essential for the fix.
137-137: LGTM! Consistent use of shared backoff.Using the shared backoff instance in the retry loop ensures consistent retry behavior across attempts.
145-146: Signature change is appropriate and non-breaking.The
handleStreammethod signature now includes thebackOff backoff.BackOffparameter to receive the shared backoff instance. Since this is a private method, and the only caller at line 134 has been correctly updated to pass the new parameter, this change poses no breaking changes.
160-160: Verify intended backoff reset after successful stream connection.Line 160 is marked as changed but appears blank. According to the AI summary and PR objectives, the backoff should be reset after successful stream establishment. While the reset at line 165 handles the retry case, a reset here would provide clearer semantics: "successful connection = reset retry state."
If a reset is intended here, apply this diff:
log.Infof("connected to the Management Service stream") + backOff.Reset() c.notifyConnected()Likely an incorrect or invalid review comment.
164-165: LGTM! This is the key fix for the MaxElapsedTime issue.Resetting the backoff after
receiveEventsreturns ensures that even if the stream was active for longer than MaxElapsedTime (3 months), subsequent connection failures will start with a fresh retry budget rather than timing out immediately. The placement before error checking is correct since:
- Successful reception followed by any error should reset retry state
- Permanent errors will exit the retry loop regardless
- Retryable errors benefit from fresh backoff
The
Reset()method clears the elapsed-time timer, restarting the retry window from zero on each new connection attempt.
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Comment @coderabbitai help to get the list of available commands and usage tips.
Quality Gate passed
Issues
0 New issues
0 Accepted issues
Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code