netbird icon indicating copy to clipboard operation
netbird copied to clipboard

[client] auto-update

Open pappz opened this issue 2 months ago • 4 comments

Describe your changes

  • Only execute the update when there is human interaction with the "Connect" button in the UI.
  • Run the installation process from a detached binary

Issue ticket number and link

Stack

Checklist

  • [x] Is it a bug fix
  • [ ] Is a typo/documentation fix
  • [x] Is a feature enhancement
  • [x] It is a refactor
  • [ ] Created tests that fail without the change (if possible)

By submitting this pull request, you confirm that you have read and agree to the terms of the Contributor License Agreement.

Documentation

Select exactly one:

  • [ ] I added/updated documentation for this change
  • [x] Documentation is not needed for this change (explain why)

Docs PR URL (required if "docs added" is checked)

Paste the PR link from https://github.com/netbirdio/docs here:

https://github.com/netbirdio/docs/pull/__

Summary by CodeRabbit

  • New Features

    • Background auto-update system with per-account/version controls, in-app update progress UI and tray/menu triggers
    • Cross‑platform updater/installer with installer result reporting exposed via the daemon/UI
    • New signer CLI for creating, bundling, signing, and verifying artifact keys and revocation lists
  • Documentation

    • Added docs for updater/installer and signing/verification workflows; embedded public keys for verification
  • Other

    • Debug bundles now include updater logs; various updater/downloader/verification utilities and tests added

✏️ Tip: You can customize this high-level summary in your review settings.

pappz avatar Oct 31 '25 12:10 pappz

Walkthrough

Threads a doInitialAutoUpdate flag through client/server/connect/engine, adds an updatemanager with installer/downloader/result plumbing, implements reposign signing/revocation and CLI tooling, extends UI/proto/API for installer-result polling and auto-update settings, and adds platform-specific installer logic and tests.

Changes

Cohort / File(s) Summary
Connect & Engine wiring
client/internal/connect.go, client/internal/engine.go, client/internal/engine_test.go
Add doInitialAutoUpdate flag to ConnectClient; change NewConnectClient and NewEngine signatures to accept the flag/stateManager; wire stateManager and updatemanager into engine lifecycle and initial auto-update handling.
Client call sites & embedding
client/android/client.go, client/embed/embed.go, client/ios/NetBirdSDK/client.go, client/cmd/up.go, client/cmd/root.go
Update calls to NewConnectClient(..., false); add CLI pre-check for updater binary; add update command scaffolding and platform-specific update handling.
Server & RPC
client/server/server.go, client/server/server_test.go, client/server/updateresult.go, client/proto/daemon.proto
Thread doInitialAutoUpdate through server connect flows; add UpRequest.autoUpdate and GetInstallerResult RPC; implement server-side installer result handler using ResultHandler.
Auto-update manager & installer
client/internal/updatemanager/*, client/internal/updatemanager/installer/*, client/internal/updatemanager/downloader/*
New updatemanager package: Manager, UpdateState, lifecycle API (Start/Stop/SetVersion/CheckUpdateSuccess), platform-specific Installer implementations and unsupported stubs, downloader utilities, installer result handling (ResultHandler), procattr helpers, and docs.
Reposign (signing & revocation)
client/internal/updatemanager/reposign/*
New reposign package: key types (KeyID, PublicKey/PrivateKey), RootKey/ArtifactKey, signature format, artifact signing/validation, revocation lists, bundling/verification, embedded certs, ArtifactVerify, and extensive tests.
CLI signer & update commands
client/cmd/signer/*, client/cmd/signer/main.go, client/cmd/update.go, client/cmd/update_supported.go
Add signer CLI (create/bundle/verify artifact keys, revocation commands) and update command wiring with platform detection, logging, and flags.
UI integration
client/ui/client_ui.go, client/ui/event_handler.go, client/ui/profile.go, client/ui/quickactions.go, client/ui/update.go, client/ui/update_*.go
UI wiring for updates: menuUpClick gains boolean arg, runSelfCommand becomes variadic, update progress window with installer-result polling and error mapping, new UI fields/flags, event handlers, and platform-specific UI-kill helpers.
Version & update helper
version/update.go, version/update_test.go
Add NewUpdateAndStart, StartFetcher, LatestVersion() and nil-safe checks; tests updated to use NewUpdateAndStart.
Management API & types
management/*, shared/management/proto/management.proto, shared/management/http/api/*, shared/management/http/api/types.gen.go, management/server/types/settings.go
Add AutoUpdateSettings/AutoUpdateVersion across proto, API and types; validate "latest"/"disabled"/semver; emit account event on version change; update handlers and tests.
Debug, scripts & misc
client/internal/debug/debug.go, client/proto/generate.sh, client/internal/profilemanager/config.go, client/netbird.wxs
Include updater logs in debug bundles, restore final cd in proto generator, improve Darwin config dir handling and permissions, and adjust MSI WXS behavior.
Tests & other updates
various *_test.go files
Add/update tests covering reposign, updatemanager, downloader, installer result handling, engine/server signature changes, and update-related behavior.
Embedding & build tags
client/internal/updatemanager/reposign/embed_*.go, client/cmd/update*.go, client/internal/updatemanager/installer/*_nowindows.go, client/internal/updatemanager/installer/*_darwin.go, client/internal/updatemanager/installer/*_windows.go
Build-tagged embedded certs and platform-specific installer files for Windows/Darwin/non-Windows logic.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant UI
    participant Service as serviceClient
    participant Daemon
    participant Connect as ConnectClient
    participant Engine
    participant UpdateMgr as UpdateManager
    participant Installer
    participant State as StateManager

    UI->>Service: menuUpClick(ctx, wannaAutoUpdate)
    Service->>Daemon: Up(AutoUpdate flag)
    Daemon->>Daemon: connectWithRetryRuns(doInitialAutoUpdate)
    Daemon->>Connect: NewConnectClient(..., doInitialAutoUpdate)
    Connect->>Engine: NewEngine(..., stateManager)
    Engine->>UpdateMgr: NewManager(statusRecorder, stateManager)
    alt doInitialAutoUpdate == true
        UpdateMgr->>Installer: RunInstallation(targetVersion)
        Installer->>OS: start detached updater process
        OS-->>Installer: writes result.json via ResultHandler
        Daemon->>Installer: ResultHandler.Watch -> result
        Daemon->>Service: installer result returned
        Service->>UI: update progress window / close on success
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Focus areas for deeper review:
    • reposign cryptography (PEM formats, KeyID derivation, signature/timestamp semantics, revocation validation).
    • Installer platform implementations (detached process attributes, Setup flows, atomic result writes, cleanup).
    • Engine/connect lifecycle and concurrency around doInitialAutoUpdate propagation.
    • ResultHandler.Watch logic and fsnotify usage/error handling.
    • UI update progress polling, timeouts, and error mapping.

Possibly related PRs

  • netbirdio/netbird#4717 — overlaps UI quick-actions/menuUpClick and runSelfCommand signature changes.
  • netbirdio/netbird#4592 — modifies NewConnectClient/engine wiring and related call sites.

Suggested reviewers

  • mlsmaycon
  • crn4

Poem

🐰 I hopped through keys and installer tracks,

Bundled pubkeys, signed timestamps, and packed stacks,
Dots on the UI while a watcher hums,
State files whisper as the updater runs,
Carrot-coded changes — hop, merge, and back!

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 21.91% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title '[client] auto-update' is concise and directly describes the main feature being added to the client codebase.
Description check ✅ Passed The description covers key changes and checks most required template sections, but lacks an issue ticket number/link and contains a placeholder docs PR URL.
✨ Finishing touches
  • [ ] 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • [ ] Create PR with unit tests
  • [ ] Post copyable unit tests in a comment
  • [ ] Commit unit tests in branch feat/auto-upgrade-fork

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

coderabbitai[bot] avatar Nov 07 '25 15:11 coderabbitai[bot]

@coderabbitai review this PR

pappz avatar Nov 12 '25 16:11 pappz

[!TIP] For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash

# Get the list of files changed in the PR
echo "=== Files changed in PR #4732 ==="
gh pr view 4732 --repo netbirdio/netbird --json files --jq '.files[] | .path'

echo -e "\n=== PR diff ==="
# Get the full diff of the PR
gh pr diff 4732 --repo netbirdio/netbird

Length of output: 264151


@pappz I'll review this PR for you. This is a substantial feature addition implementing client auto-update functionality. Let me analyze the key aspects:

Architecture Overview

The PR implements a two-process architecture for self-updates:

  1. Service Process: Initiates updates via RunInstallation()
  2. Updater Process: Detached child that performs actual installation and restarts services

This separation is well-designed to handle the limitation that a service cannot update itself while running.

Key Strengths

Strong Cryptographic Foundation: The reposign package implements a robust two-tier key hierarchy (root keys → artifact keys) with Ed25519 signatures and BLAKE2s hashing

Security Features:

  • Timestamp binding prevents replay attacks
  • Key expiration and revocation support
  • Signature verification before installation
  • Multiple root key support for key rotation

Good Documentation: Comprehensive package documentation in doc.go files

Result Communication: Clean IPC mechanism via JSON files between updater and service

Concerns & Recommendations

1. Security: Missing Signature Verification Before Execution

The RunInstallation() downloads and verifies signatures, but I don't see explicit verification that the downloaded installer matches the signature before the updater binary is copied and launched. The verification happens in Setup() which runs in the updater process.

Recommendation: Move artifact verification to happen in the service process before copying the updater binary.

2. Error Handling: Silent Failures

In client/cmd/root.go:

func Execute() error {
	if isUpdateBinary() {
		return updateCmd.Execute()
	}
	return rootCmd.Execute()
}

If updateCmd.Execute() fails, the error is returned but there's no logging. Users won't know why the update failed.

Recommendation: Add logging before returning errors from update execution.

3. Resource Leak: Missing Cleanup

In installer_run_darwin.go, the startUIAsUser() function:

launchCmd := exec.Command("launchctl", "asuser", userInfo.Uid, "open", "-a", uiBinary)
if err := launchCmd.Start(); err != nil {
    return fmt.Errorf("failed to start UI process: %w", err)
}
if err := launchCmd.Process.Release(); err != nil {
    log.Warnf("failed to release UI process: %v", err)
}

The command could fail after Start() but before Release(), leaving the process handle unreleased.

Recommendation: Use defer for cleanup:

if err := launchCmd.Start(); err != nil {
    return fmt.Errorf("failed to start UI process: %w", err)
}
defer func() {
    if err := launchCmd.Process.Release(); err != nil {
        log.Warnf("failed to release UI process: %v", err)
    }
}()

4. Race Condition: Update State Management

In manager.go, the update state is persisted but there's a window between checking the result and deleting the state where a crash could leave stale state:

func (m *Manager) CheckUpdateSuccess(ctx context.Context) {
    reason := m.lastResultErrReason()
    if reason != "" {
        // Publish error event
    }
    
    updateState, err := m.loadAndDeleteUpdateState(ctx)
    // If process crashes here, state is deleted but success event not published

Recommendation: Delete state only after successfully publishing events.

5. Platform-Specific: macOS Homebrew Handling

The Homebrew update path sends SIGTERM to the current process which may not guarantee a clean shutdown if there are active connections:

err = process.Signal(syscall.SIGTERM)

Recommendation: Consider implementing graceful shutdown with a timeout, similar to how Docker handles SIGTERM + SIGKILL.

6. Testing: Limited Integration Tests

I see unit tests for the signing system but no integration tests for the full update flow. The update process involves multiple steps across processes.

Recommendation: Add integration tests that verify:

  • Full update cycle from check → download → install → verify
  • Failure recovery scenarios
  • Service restart behavior

7. Observability: Missing Metrics

There's no telemetry for update operations. Consider tracking:

  • Update attempts vs. successes
  • Time to complete updates
  • Failure reasons
  • Version distributions

8. Documentation: Installation Flow Unclear

The installer_common.go comment says:

// updateCmd.SysProcAttr = &syscall.SysProcAttr{
//     CreationFlags: syscall.CREATE_NEW_PROCESS_GROUP | 0x00000008,
// }

This commented code suggests incomplete Windows process detachment. Is this intentional?

Recommendation: Either implement properly or document why it's not needed.

Minor Issues

  • Hardcoded paths: /var/lib/netbird/tmp-install should be configurable
  • Magic numbers: 5 * time.Minute rate limiting could be a constant
  • Error messages: Some errors don't include enough context for debugging
  • Missing null checks: updateManager is checked for nil in some places but not others

Verdict

This is well-architected code with strong security foundations, but it needs refinement in error handling, resource management, and testing before

merging. The cryptographic implementation is solid, but the update orchestration layer needs hardening.

Suggested Priority:

  1. 🔴 High: Fix signature verification timing (Security)
  2. 🔴 High: Add integration tests (Reliability)
  3. 🟡 Medium: Improve error handling and logging
  4. 🟡 Medium: Fix resource leaks
  5. 🟢 Low: Add observability metrics

Would you like me to focus on any specific area in more detail?

coderabbitai[bot] avatar Nov 12 '25 16:11 coderabbitai[bot]