Add peer-to-peer metrics tracker

Open djeebus opened this issue 3 months ago • 2 comments

This consolidates metrics into a single struct that does a few things:

  • exports this process's metrics to a file called {pid}.json
  • watches for files written by other processes and reads their metrics
  • checks full host metrics when handling incoming requests
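For readers who want a concrete picture of the file-based exchange, here is a minimal sketch, not the code in this PR. The `Allocations` fields, `writeSelf`, and `aggregate` are hypothetical names, and a plain directory scan stands in for the fsnotify watcher so the example needs only the standard library.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// Allocations is a hypothetical per-process payload; the real field set
// in the PR may differ.
type Allocations struct {
	VCPUs     int64 `json:"vcpus"`
	MemoryMiB int64 `json:"memory_mib"`
	Sandboxes int64 `json:"sandboxes"`
}

// writeSelf stores one process's allocations as {pid}.json inside dir.
// It writes to a temporary file first and renames it, so a reader does
// not observe a half-written JSON document.
func writeSelf(dir string, pid int, a Allocations) error {
	if err := os.MkdirAll(dir, 0o755); err != nil {
		return err
	}
	data, err := json.Marshal(a)
	if err != nil {
		return err
	}
	final := filepath.Join(dir, fmt.Sprintf("%d.json", pid))
	tmp := final + ".tmp"
	if err := os.WriteFile(tmp, data, 0o644); err != nil {
		return err
	}
	return os.Rename(tmp, final)
}

// aggregate sums the allocations found in every *.json file in dir.
// Unreadable or malformed files are skipped rather than treated as fatal,
// since a peer may have exited or be mid-write.
func aggregate(dir string) (Allocations, error) {
	var total Allocations
	entries, err := os.ReadDir(dir)
	if err != nil {
		return total, err
	}
	for _, e := range entries {
		if e.IsDir() || !strings.HasSuffix(e.Name(), ".json") {
			continue
		}
		data, err := os.ReadFile(filepath.Join(dir, e.Name()))
		if err != nil {
			continue
		}
		var a Allocations
		if err := json.Unmarshal(data, &a); err != nil {
			continue
		}
		total.VCPUs += a.VCPUs
		total.MemoryMiB += a.MemoryMiB
		total.Sandboxes += a.Sandboxes
	}
	return total, nil
}

func main() {
	dir, err := os.MkdirTemp("", "sharedstate")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	// This process plus one simulated peer.
	self := os.Getpid()
	if err := writeSelf(dir, self, Allocations{VCPUs: 4, MemoryMiB: 2048, Sandboxes: 2}); err != nil {
		panic(err)
	}
	if err := writeSelf(dir, self+1, Allocations{VCPUs: 2, MemoryMiB: 1024, Sandboxes: 1}); err != nil {
		panic(err)
	}

	total, err := aggregate(dir)
	if err != nil {
		panic(err)
	}
	fmt.Printf("host totals: %+v\n", total)
}
```

A real implementation would also need to decide when a peer's file is stale (for example, after the owning process exits); this sketch does not address that.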

It also adds server.Limiter, which enforces limits on how many sandboxes may be starting or running at once.
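A rough sketch of how such a limiter could work, under stated assumptions: `newLimiter`, `Acquire`, and both error values are hypothetical and not the actual server.Limiter API, and the running count is assumed to come from a callback (such as a host-wide total) that excludes sandboxes still starting.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Hypothetical error values; the PR may report limits differently.
var (
	ErrTooManyStarting = errors.New("too many sandboxes starting")
	ErrTooManyRunning  = errors.New("too many sandboxes running")
)

// Limiter caps concurrent sandbox starts and total running sandboxes.
type Limiter struct {
	mu           sync.Mutex
	maxStarting  int
	maxRunning   int
	starting     int
	runningCount func() int // assumed to exclude sandboxes still starting
}

// newLimiter is an illustrative constructor, not the signature from the PR.
func newLimiter(maxStarting, maxRunning int, runningCount func() int) *Limiter {
	return &Limiter{
		maxStarting:  maxStarting,
		maxRunning:   maxRunning,
		runningCount: runningCount,
	}
}

// Acquire reserves a "starting" slot. The caller invokes release once the
// sandbox has either started running or failed to start.
func (l *Limiter) Acquire() (release func(), err error) {
	l.mu.Lock()
	defer l.mu.Unlock()

	// Count in-flight starts toward the running cap so that a burst of
	// concurrent starts cannot overshoot it.
	if l.runningCount()+l.starting >= l.maxRunning {
		return nil, ErrTooManyRunning
	}
	if l.starting >= l.maxStarting {
		return nil, ErrTooManyStarting
	}
	l.starting++

	var once sync.Once
	return func() {
		// sync.Once makes a repeated release call harmless.
		once.Do(func() {
			l.mu.Lock()
			l.starting--
			l.mu.Unlock()
		})
	}, nil
}

func main() {
	running := 0
	l := newLimiter(1, 2, func() int { return running })

	release, err := l.Acquire()
	fmt.Println("first start:", err)

	_, err = l.Acquire()
	fmt.Println("second start while first is in flight:", err)

	release()
	running = 2 // pretend two sandboxes are now running on the host
	_, err = l.Acquire()
	fmt.Println("start at running cap:", err)
}
```

Returning an error instead of blocking, as a semaphore would, lets the request handler reject the start immediately and leave the retry decision to the caller.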


[!NOTE] Introduces a shared-state manager that aggregates sandbox allocations across processes via PID JSON files and integrates a limiter to cap starting/running sandboxes, wiring both into server and service info paths.

  • Shared State (peer-to-peer metrics):
    • Add internal/sharedstate with Manager that writes self allocations to {pid}.json, watches directory via fsnotify, and aggregates Allocations across processes.
    • Integrate with sandboxes map via Subscribe; expose TotalAllocated() and TotalRunningCount().
    • Add tests in internal/sharedstate/tracker_test.go.
  • Sandbox start limiting:
    • Add server.Limiter to enforce max running (via featureflags.MaxSandboxesPerNode) and max starting per node; replace semaphore logic in server with sandboxLimiter and error handling.
  • Service info metrics:
    • internal/service/service_info.go: switch from iterating local sandboxes to using sharedstate.Manager for CPU/memory/disk and running counts.
  • Config:
    • Add SHARED_STATE_DIRECTORY, SHARED_STATE_WRITE_INTERVAL, MAX_STARTING_INSTANCES to cfg.Config.
  • Wiring:
    • main.go: instantiate and run sharedstate.Manager; pass to InfoService and server.New; create server.NewLimiter.
    • Minor: rename Google Storage limiter variable for clarity.
  • Dependencies:
    • Add github.com/fsnotify/fsnotify to go.mod/go.sum.

Written by Cursor Bugbot for commit 835ee09fd8a776787cd71b2d053c1197c9c4d975. This will update automatically on new commits.

djeebus, Oct 10 '25 00:10

I'm going to pull a piece out to its own PR, and reopen when that's merged.

djeebus, Oct 10 '25 15:10