Sysinfo: add support for AMD and NVIDIA GPUs
I added support for AMD and NVIDIA GPUs using platform-specific commands. Tested only on Linux; the Windows and macOS implementations require appropriate review.
Walkthrough
Adds end-to-end GPU monitoring:
- Docs: new GPU_MONITORING.md describes cross-platform GPU metric collection.
- Backend (pkg/wshrpc/wshremote/sysinfo.go): introduces GpuData and many helper functions to detect the platform and collect GPU metrics via nvidia-smi/rocm-smi on Linux, system_profiler/iostat/vm_stat/sysctl on macOS, and PowerShell/WMI/perf counters on Windows; aggregates per-GPU and average GPU metrics into the existing metrics map and calls getGpuData from generateSingleServerData.
- Frontend: adds GPU plot types, per-GPU and per-core CPU metadata, and a GPU color CSS variable.
- Tests: platform detection and GPU parsing.
- Minor cleanup: removed unused byte-count vars in write calls; added the TimeSeries_Gpu constant.
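As an illustration of the Linux nvidia-smi path described above, here is a minimal sketch of parsing one line of `nvidia-smi --query-gpu=utilization.gpu,memory.used,memory.total --format=csv,noheader,nounits` output. The `GpuSample` struct and function names are hypothetical stand-ins, not the actual GpuData code in sysinfo.go:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// GpuSample is an illustrative struct, not the PR's actual GpuData type.
type GpuSample struct {
	UtilPct  float64 // utilization.gpu, percent
	MemUsed  float64 // memory.used, MiB
	MemTotal float64 // memory.total, MiB
}

// parseNvidiaSmiCSV parses one CSV line such as "42, 1024, 8192".
func parseNvidiaSmiCSV(line string) (GpuSample, error) {
	fields := strings.Split(line, ",")
	if len(fields) != 3 {
		return GpuSample{}, fmt.Errorf("expected 3 fields, got %d", len(fields))
	}
	vals := make([]float64, 3)
	for i, f := range fields {
		v, err := strconv.ParseFloat(strings.TrimSpace(f), 64)
		if err != nil {
			return GpuSample{}, err
		}
		vals[i] = v
	}
	return GpuSample{UtilPct: vals[0], MemUsed: vals[1], MemTotal: vals[2]}, nil
}

func main() {
	s, err := parseNvidiaSmiCSV("42, 1024, 8192")
	if err != nil {
		panic(err)
	}
	fmt.Printf("util=%.0f%% mem=%.0f/%.0f MiB\n", s.UtilPct, s.MemUsed, s.MemTotal)
}
```

With nounits the fields are plain numbers, which keeps the parser trivial; per-GPU lines can then be averaged into the aggregate metric the walkthrough mentions.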
Estimated code review effort
🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks and finishing touches
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title Check | ✅ Passed | The title clearly and concisely summarizes the main change by highlighting the addition of AMD and NVIDIA GPU support in the sysinfo functionality, matching the core content of the changeset without including unnecessary detail. |
| Description Check | ✅ Passed | The description directly relates to the changeset by explaining the addition of GPU support via platform‐specific commands and noting the testing scope, providing relevant context without veering off-topic. |
| Docstring Coverage | ✅ Passed | No functions found in the changes. Docstring coverage check skipped. |
✨ Finishing touches
- [ ] 📝 Generate docstrings
🧪 Generate unit tests (beta)
- [ ] Create PR with unit tests
- [ ] Post copyable unit tests in a comment
📜 Recent review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📥 Commits
Reviewing files that changed from the base of the PR and between 429a80670193ada27a270df063279d992295ee41 and 695f99fca5688cb1865a625db14f6144a21a909d.
📒 Files selected for processing (1)
- pkg/wshrpc/wshrpctypes.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- pkg/wshrpc/wshrpctypes.go
@sarumaj sorry about the long delay here. I was on a very large branch for new AI features and didn't want to pull other PRs until I got that stabilized.
I'm running on a Mac and definitely having some issues; I keep hitting the 5s timeout.
Did a bit of digging, and the Mac route is likely going to require powermetrics and sudo :/
I need to mull this over a bit. I'm considering two options: either ship the Linux impl and add some code to disable the UI for the other platforms, or try to get the Mac version working too with a sudo prompt and the new powermetrics code.
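The first option (ship Linux, disable the UI elsewhere) can be gated with a small platform check on the backend. This is just a sketch of the idea, assuming a hypothetical helper name, not code from the PR:

```go
package main

import (
	"fmt"
	"runtime"
)

// gpuMonitoringSupported reports whether GPU metrics are available on this
// platform. For now only Linux is supported: the macOS path would need
// powermetrics (and sudo), and the Windows path is still unreviewed.
func gpuMonitoringSupported(goos string) bool {
	return goos == "linux"
}

func main() {
	fmt.Println(gpuMonitoringSupported(runtime.GOOS))
}
```

The frontend could then hide the GPU plot type whenever the backend reports no GPU support, rather than showing an empty meter that times out.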
intel_gpu_top and ollama API
A couple data points that might help the sysinfo meter.
intel_gpu_top
I'd really like the GPU meter to work, especially for the local AI scenarios. I noticed you haven't added intel_gpu_top yet; it accepts -J (JSON output) and -s 1234 (sample period in milliseconds). Unfortunately the JSON it produces leaves a lot to be desired. The CSV option, combined with something like the Zig-based csv2json tool, works better, though you might need to adapt it a bit since it did not like reading from stdin.
```json
{"Freq MHz req": "299.547183", "Freq MHz act": "127.110482", "IRQ /s": "200.026573", "RC6 %": "33.710977", "Power W gpu": "0.403607", "Power W pkg": "6.170362", "RCS %": "0.000000", "RCS se": "0.000000", "RCS wa": "0.000000", "BCS %": "0.000000", "BCS se": "0.000000", "BCS wa": "0.000000", "VCS %": "0.000000", "VCS se": "0.000000", "VCS wa": "0.000000", "VECS %": "0.000000", "VECS se": "0.000000", "VECS wa": "0.000000"}
{"Freq MHz req": "300.485490", "Freq MHz act": "113.297808", "IRQ /s": "213.788037", "RC6 %": "35.317579", "Power W gpu": "0.361332", "Power W pkg": "4.921782", "RCS %": "0.000000", "RCS se": "0.000000", "RCS wa": "0.000000", "BCS %": "0.000000", "BCS se": "0.000000", "BCS wa": "0.000000", "VCS %": "0.000000", "VCS se": "0.000000", "VCS wa": "0.000000", "VECS %": "0.000000", "VECS se": "0.000000", "VECS wa": "0.000000"}
{"Freq MHz req": "438.640253", "Freq MHz act": "158.343230", "IRQ /s": "247.841578", "RC6 %": "35.365278", "Power W gpu": "0.546255", "Power W pkg": "6.213796", "RCS %": "0.000000", "RCS se": "0.000000", "RCS wa": "0.000000", "BCS %": "0.000000", "BCS se": "0.000000", "BCS wa": "0.000000", "VCS %": "0.000000", "VCS se": "0.000000", "VCS wa": "0.000000", "VECS %": "0.000000", "VECS se": "0.000000", "VECS wa": "0.000000"}
```
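Even though the JSON is awkward (every value is a string), each record is still machine-readable. A small Go sketch of parsing one record like those above; the function name is illustrative:

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// parseIntelGpuTopRecord decodes one intel_gpu_top -J record. All values in
// the sample output are JSON strings, so each one needs an explicit ParseFloat.
func parseIntelGpuTopRecord(data []byte) (map[string]float64, error) {
	var raw map[string]string
	if err := json.Unmarshal(data, &raw); err != nil {
		return nil, err
	}
	out := make(map[string]float64, len(raw))
	for k, v := range raw {
		f, err := strconv.ParseFloat(v, 64)
		if err != nil {
			return nil, fmt.Errorf("field %q: %w", k, err)
		}
		out[k] = f
	}
	return out, nil
}

func main() {
	rec := []byte(`{"Freq MHz act": "127.110482", "RC6 %": "33.710977", "Power W gpu": "0.403607"}`)
	m, err := parseIntelGpuTopRecord(rec)
	if err != nil {
		panic(err)
	}
	// RC6 is the GPU's idle state, so 100 - RC6 is a rough busyness proxy only.
	fmt.Printf("gpu power %.2f W, idle (RC6) %.1f%%\n", m["Power W gpu"], m["RC6 %"])
}
```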
ollama-python
https://github.com/ollama/ollama-python/blob/9ddd5f0182d0c15274e0280154231f8149e8612a/ollama/_types.py#L529C1-L542C1
The API gives you the size_vram / size values for any loaded model, so you can at least plot memory utilization, and from the responses you can effectively infer busy/not busy as well.
```python
class ProcessResponse(SubscriptableBaseModel):
    class Model(SubscriptableBaseModel):
        model: Optional[str] = None
        name: Optional[str] = None
        digest: Optional[str] = None
        expires_at: Optional[datetime] = None
        size: Optional[ByteSize] = None
        size_vram: Optional[ByteSize] = None
        details: Optional[ModelDetails] = None
        context_length: Optional[int] = None

    models: Sequence[Model]
```
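On the Go side, consuming that response shape for a VRAM plot could look like the sketch below. It decodes a sample body rather than hitting a live server; the struct covers only the fields needed here, and the function name is hypothetical:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// psResponse is a minimal view of the ollama process-status response above
// (ProcessResponse in ollama-python): just the fields needed for a VRAM plot.
type psResponse struct {
	Models []struct {
		Name     string `json:"name"`
		Size     int64  `json:"size"`
		SizeVRAM int64  `json:"size_vram"`
	} `json:"models"`
}

// vramBytes sums size_vram across loaded models. A nonzero total means at
// least one model is resident on the GPU, which is the busy/not-busy proxy
// suggested in the comment.
func vramBytes(body []byte) (int64, error) {
	var resp psResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return 0, err
	}
	var total int64
	for _, m := range resp.Models {
		total += m.SizeVRAM
	}
	return total, nil
}

func main() {
	sample := []byte(`{"models":[{"name":"llama3:8b","size":4661224676,"size_vram":4661224676}]}`)
	total, err := vramBytes(sample)
	if err != nil {
		panic(err)
	}
	fmt.Printf("VRAM in use: %.1f GiB\n", float64(total)/(1<<30))
}
```

Plotting size_vram against the GPU's total memory (from whichever vendor tool is available) would give a utilization percentage comparable to the other meters.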