netbird icon indicating copy to clipboard operation
netbird copied to clipboard

[relay-server] Add health-check agent recognition to avoid error logs

Open pappz opened this issue 3 weeks ago • 2 comments

Health-check connections now send a properly formatted auth message with a well-known peer ID instead of immediately closing. The server recognizes this peer ID and handles the connection gracefully with a debug log instead of error logs.

Changes:

  • Add relay/healthcheck/peerid package with health-check peer ID and dummy auth token
  • Update health-check client to send auth message before closing
  • Return peer ID from handshake even on validation failure
  • Check for health-check agent in relay Accept() to avoid error logs

Describe your changes

Issue ticket number and link

Stack

Checklist

  • [x] Is it a bug fix
  • [ ] Is a typo/documentation fix
  • [ ] Is a feature enhancement
  • [ ] It is a refactor
  • [ ] Created tests that fail without the change (if possible)

By submitting this pull request, you confirm that you have read and agree to the terms of the Contributor License Agreement.

Documentation

Select exactly one:

  • [ ] I added/updated documentation for this change
  • [x] Documentation is not needed for this change (explain why)

Docs PR URL (required if "docs added" is checked)

Paste the PR link from https://github.com/netbirdio/docs here:

https://github.com/netbirdio/docs/pull/__

Summary by CodeRabbit

  • New Features

    • Implemented specialized health check functionality for relay connections with dedicated identification and authentication mechanisms.
    • Enhanced WebSocket connection initialization with improved authentication message handling.
  • Bug Fixes

    • Improved relay connection handshake error handling to preserve and return more accurate peer identification diagnostics during failures.

✏️ Tip: You can customize this high-level summary in your review settings.

pappz avatar Dec 05 '25 11:12 pappz

Walkthrough

This PR introduces health check support for the relay server by adding peer identification, authentication tokens, and conditional error logging for health check connections across the WebSocket and handshake layers.

Changes

Cohort / File(s) Summary
Health Check Infrastructure
relay/healthcheck/peerid/peerid.go
New module introducing HealthCheckPeerID (precomputed PeerID hash), DummyAuthToken (structurally valid auth token), createDummyToken helper (HMAC-SHA256 token with zeroed signature and "healthcheck" payload), and IsHealthCheck() for peer identification.
Health Check Integration
relay/healthcheck/ws.go, relay/server/handshake.go, relay/server/relay.go
Modified WebSocket initialization to send authentication message immediately after connection; updated handshake return signatures to include peerID on errors; added health-check conditional logging in relay accept logic to distinguish health-check failures from regular errors.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Key areas requiring attention:
    • Verify HealthCheckPeerID computation correctness and immutability
    • Confirm DummyAuthToken is structurally valid but not cryptographically verified
    • Ensure auth message marshaling/writing errors are handled without leaving dangling connections
    • Validate that modified handshake return signatures maintain backward compatibility with error handling paths
    • Check that health-check conditional in relay.Accept doesn't mask legitimate errors

Suggested reviewers

  • mlsmaycon

Poem

🐰 A health check hops through the relay gate, With dummy tokens to authenticate, PeerIDs precomputed, swift and true, WebSockets greet them with messages anew, Debug logs whisper when health checks arrive!

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately and concisely summarizes the main change: adding health-check agent recognition to prevent error logs in the relay server.
Description check ✅ Passed The description follows the template structure with completed sections: changes summary, bug fix checklist selection, and documentation rationale. All critical sections are filled.
✨ Finishing touches
  • [ ] 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • [ ] Create PR with unit tests
  • [ ] Post copyable unit tests in a comment
  • [ ] Commit unit tests in branch fix/handle-healthcheck-errs

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

coderabbitai[bot] avatar Dec 05 '25 11:12 coderabbitai[bot]