Draft Pull Request: Feature/Container Metrics
This pull request introduces the initial backend implementation for real-time container metrics, as discussed in issue #26. This is a draft PR to showcase the progress and discuss the current implementation before proceeding with the frontend and further backend refinements.
What's Done:
- Container metrics polling
- Thread-safe ring buffer
- Last-accessed time check
Discuss
Client disconnect context canceling
On client disconnect, I have implemented context canceling for the Master node. However, there is an edge case: if a first client starts metrics polling, a second client views that container's plot and then navigates to another container, metrics will stop accumulating once the first client disconnects, and will only resume when the second client accesses the container again. Because of this, we might prefer not to stop polling on client disconnect.
I'm looking forward to your feedback on the current implementation.
Client logic
Metrics polling is initiated when the Stats tab is clicked:

```js
if (key == 'Stats') {
  cmds._init_metrics_polling();
}
```
Handle metrics response
If metrics are received but the stats inspector has not arrived yet, wait:

```js
if ('Metrics' in notification.Content) {
  // container.inspect.stats is returned after container.metrics
  if (state.inspector.content.length == 0) {
    state.isLoading = false;
    // restart polling to make the first request as soon as possible
    cmds._cancel_metrics_polling();
    cmds._init_metrics_polling();
    // do not process until container.inspect.stats is loaded
    break;
  }
```
After that, metrics are handled as usual and stored inside the inspector. Whenever the user clicks on Stats, the client receives the fully accumulated metrics from the beginning.
- I wasn't sure what the correct formatting was, so I formatted with this:
  `npx prettier --write client/assets/js/isaiah.js --tab-width 2 --single-quote --trailing-comma es5 --arrow-parens always`
  However, I think something like this might also be the case:
  `npx prettier --write client/assets/js/isaiah.js --tab-width 2 --single-quote --trailing-comma none --arrow-parens always`
  I'm sorry for the huge diff 🙏
- I have tested the feature with agents, stopping containers, restarting containers, and reloading with <R>
- Update plot colors according to theme changes
Todo
- Should we add buffering on the client, so the infinite metrics stream doesn't overflow it?
- Add an environment variable to control the frequency of metrics polling, on both the client and the server
Hey Will! 👋
I just wanted to follow up on this PR and summarise what has been done.
Client-side logic
- Metrics polling is initiated when the user clicks on the Stats tab:
  `if (key == 'Stats') { cmds._init_metrics_polling(); }`
- When the client receives a `Metrics` notification, it checks if `container.inspect.stats` has been loaded:
  - If not yet loaded (i.e., metrics arrived first), polling is restarted to sync the first data batch as soon as possible.
  - Once inspector data is ready, metrics are processed and stored as part of the `inspector` state.
- This ensures the user always gets a fully accumulated metrics history when switching to the Stats tab.
- Implemented polling cancellation and restart logic to prevent overlapping requests.
- Plot colors now dynamically follow theme changes for a consistent UI experience.
Backend implementation
Architecture
I introduced two new components:

- `ContainerStatsManager` — manages per-container metrics collection.
- `RingBuffer[T]` — a generic, thread-safe circular buffer for efficient metric storage without memory growth.
Each container’s metrics are stored in a bounded ring buffer (size = 3000), overwriting old data automatically to prevent leaks or unbounded memory usage.
Concurrency and safety
All state-modifying operations in ContainerStatsManager and RingBuffer are guarded with RWMutex locks.
Each container can be polled independently in its own goroutine, linked to a session-wide context.Context, so that when the session ends, all related pollers stop cleanly.
Polling workflow
- When the client sends the `container.metrics` command, the server:
  1. Validates arguments and checks container state via `ContainerInspect`.
  2. Updates the container's `lastAccessed` timestamp.
  3. If polling isn't active, starts a new goroutine via `PollMetrics()`.
  4. Returns metrics accumulated since the last `From` index.
- The poller itself:
  - Fetches data with `client.ContainerStatsOneShot()`.
  - Computes CPU% and memory% using deltas between current and previous stats.
  - Appends each new `MetricPoint` to the container's ring buffer.
  - Runs every 3 seconds and stops automatically if:
    - The container has been idle for >30 minutes, or
    - The session's context is canceled.
Data structure
```go
type MetricPoint struct {
	CpuMetric float64 `json:"cpu"`
	MemMetric float64 `json:"mem"`
	Timestamp int64   `json:"timestamp"`
}
```
These are stored per container in a bounded buffer created via `ringbuf.NewRingBuffer`.
Server command addition
Added a new case to the command handler:

```go
case "container.metrics":
```
It handles request parsing, container state checking, poller initialization, and sending a notification with:
```json
{
  "Metrics": [...],
  "From": <next index>,
  "IsRunning": true
}
```
Errors or inactive containers return an empty metrics array and "IsRunning": false.
Testing
I’ve tested with:
- Multiple agents and hosts
- Container stop/restart cycles
- Page reloads
Todo / Open questions
- Should we add client-side buffering to prevent overflow in very long-running sessions?
- Should we add an env var to control metrics polling frequency (client & server)?
Would be great if you could take a look and maybe test it a bit — I’d really appreciate your feedback.
Thanks!
Alan