Draft Pull Request: Feature/Container Metrics
This pull request introduces the initial backend implementation for real-time container metrics, as discussed in issue #26. This is a draft PR to showcase the progress and discuss the current implementation before proceeding with the frontend and further backend refinements.
What's Done:
- Container metrics polling
- Thread-safe ring buffer
- Last-accessed time check
Discuss
Client disconnect context canceling
On client disconnect, I have implemented context canceling for the Master node. However, there is an edge case: if a first client starts metrics polling, a second client views that container's plot and then navigates to another container, metrics will stop accumulating once the first client disconnects, and will only resume when the second client accesses the container again. Because of this, we might prefer not to stop polling on client disconnect.
I'm looking forward to your feedback on the current implementation.
Client logic
Metrics polling is initiated when the Stats tab is clicked:

```js
if (key == 'Stats') {
  cmds._init_metrics_polling();
}
```
Handle metrics response
If metrics are received but the stats inspector has not arrived yet, wait:

```js
if ('Metrics' in notification.Content) {
  // container.inspect.stats is returned after container.metrics
  if (state.inspector.content.length == 0) {
    state.isLoading = false;
    // restart polling to make the first request as soon as possible
    cmds._cancel_metrics_polling();
    cmds._init_metrics_polling();
    // do not process until container.inspect.stats is loaded
    break;
  }
```
After that, metrics are handled as usual and stored inside the inspector. Whenever the user clicks on Stats, the client receives the fully accumulated metrics from the beginning.
- I wasn't sure what the correct formatting was, so I formatted with this:
  `npx prettier --write client/assets/js/isaiah.js --tab-width 2 --single-quote --trailing-comma es5 --arrow-parens always`
  However, I think something like this might also be the case:
  `npx prettier --write client/assets/js/isaiah.js --tab-width 2 --single-quote --trailing-comma none --arrow-parens always`
  I'm sorry for the huge diff 🙏
- I have tested the feature with agents, stopping containers, restarting containers, and reloading with <R>
- Update plot colors according to theme changes
Todo
- Should we add buffering on the client, so the infinite metrics stream doesn't overflow it?
- Add an environment variable to control the frequency of metrics polling, on both the client and the server
Hey Will! 👋
I just wanted to follow up on this PR and summarise what has been done.
Client-side logic
- Metrics polling is initiated when the user clicks on the Stats tab:
  `if (key == 'Stats') { cmds._init_metrics_polling(); }`
- When the client receives a `Metrics` notification, it checks if `container.inspect.stats` has been loaded:
  - If not yet loaded (i.e., metrics arrived first), polling is restarted to sync the first data batch as soon as possible.
  - Once inspector data is ready, metrics are processed and stored as part of the `inspector` state.
- This ensures the user always gets a fully accumulated metrics history when switching to the Stats tab.
- Implemented polling cancellation and restart logic to prevent overlapping requests.
- Plot colors now dynamically follow theme changes for a consistent UI experience.
Backend implementation
Architecture
I introduced two new components:

- `ContainerStatsManager` — manages per-container metrics collection.
- `RingBuffer[T]` — a generic, thread-safe circular buffer for efficient metric storage without memory growth.
Each container’s metrics are stored in a bounded ring buffer (size = 3000), overwriting old data automatically to prevent leaks or unbounded memory usage.
Concurrency and safety
All state-modifying operations in ContainerStatsManager and RingBuffer are guarded with RWMutex locks.
Each container can be polled independently in its own goroutine, linked to a session-wide context.Context, so that when the session ends, all related pollers stop cleanly.
Polling workflow
- When the client sends the `container.metrics` command, the server:
  1. Validates arguments and checks container state via `ContainerInspect`.
  2. Updates the container's `lastAccessed` timestamp.
  3. If polling isn't active, starts a new goroutine via `PollMetrics()`.
  4. Returns metrics accumulated since the last `From` index.
- The poller itself:
  - Fetches data with `client.ContainerStatsOneShot()`.
  - Computes CPU% and memory% using deltas between current and previous stats.
  - Appends each new `MetricPoint` to the container's ring buffer.
  - Runs every 3 seconds and stops automatically if:
    - The container has been idle for >30 minutes, or
    - The session's context is canceled.
Data structure
```go
type MetricPoint struct {
	CpuMetric float64 `json:"cpu"`
	MemMetric float64 `json:"mem"`
	Timestamp int64   `json:"timestamp"`
}
```
These are stored per container in a bounded buffer created via `ringbuf.NewRingBuffer`.
Server command addition
Added a new case to the command handler:

```go
case "container.metrics":
```
It handles request parsing, container state checking, poller initialization, and sending a notification with:
```json
{
  "Metrics": [...],
  "From": <next index>,
  "IsRunning": true
}
```
Errors or inactive containers return an empty metrics array and "IsRunning": false.
Testing
I’ve tested with:
- Multiple agents and hosts
- Container stop/restart cycles
- Page reloads
Todo / Open questions
- Should we add client-side buffering to prevent overflow in very long-running sessions?
- Should we add an env var to control metrics polling frequency (client & server)?
Would be great if you could take a look and maybe test it a bit — I’d really appreciate your feedback.
Thanks!
Alan