
Sysinfo: add support for AMD and NVIDIA GPUs

Open sarumaj opened this issue 6 months ago • 4 comments

I added support for AMD and NVIDIA GPUs using platform-specific commands. Tested only on Linux; the Windows and macOS implementations require appropriate review.

sarumaj avatar Aug 05 '25 18:08 sarumaj

Walkthrough

Adds end-to-end GPU monitoring:
  • Docs: new GPU_MONITORING.md describes cross-platform GPU metric collection.
  • Backend: pkg/wshrpc/wshremote/sysinfo.go introduces GpuData and helper functions to detect the platform and collect GPU metrics via nvidia-smi/rocm-smi on Linux, system_profiler/iostat/vm_stat/sysctl on macOS, and PowerShell/WMI/perf counters on Windows; per-GPU and average GPU metrics are aggregated into the existing metrics map, and getGpuData is called from generateSingleServerData.
  • Frontend: adds GPU plot types, per-GPU and per-core CPU metadata, and a GPU color CSS variable.
  • Tests: adds coverage for platform detection and GPU parsing.
  • Cleanup: removes unused byte-count vars in write calls; adds the TimeSeries_Gpu constant.
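As a rough illustration of the Linux nvidia-smi path, the collection step boils down to parsing CSV query output into per-GPU metrics and computing an average. This is a hedged sketch, not the PR's actual Go code; the query fields, metric keys, and helper names here are assumptions.

```python
# Sketch of parsing output from a query like:
#   nvidia-smi --query-gpu=index,utilization.gpu,memory.used,memory.total \
#              --format=csv,noheader,nounits
# Field list and metric key names are illustrative only.

def parse_nvidia_smi(output: str) -> list[dict]:
    """Parse CSV rows like '0, 42, 1024, 8192' into per-GPU metric dicts."""
    gpus = []
    for line in output.strip().splitlines():
        index, util, mem_used, mem_total = [f.strip() for f in line.split(",")]
        gpus.append({
            "index": int(index),
            "gpu_util_pct": float(util),
            "mem_used_mib": float(mem_used),
            "mem_total_mib": float(mem_total),
        })
    return gpus

def average_gpu_util(gpus: list[dict]) -> float:
    """Aggregate an average utilization across all detected GPUs."""
    return sum(g["gpu_util_pct"] for g in gpus) / len(gpus) if gpus else 0.0

sample = "0, 37, 2048, 8192\n1, 63, 4096, 8192\n"
gpus = parse_nvidia_smi(sample)
print(average_gpu_util(gpus))  # 50.0
```

The rocm-smi path would follow the same shape with a different query command and column layout.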

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
  • Title Check (✅ Passed): The title clearly and concisely summarizes the main change by highlighting the addition of AMD and NVIDIA GPU support in the sysinfo functionality, matching the core content of the changeset without unnecessary detail.
  • Description Check (✅ Passed): The description directly relates to the changeset by explaining the addition of GPU support via platform-specific commands and noting the testing scope, providing relevant context without veering off-topic.
  • Docstring Coverage (✅ Passed): No functions found in the changes; docstring coverage check skipped.

📜 Recent review details


📥 Commits

Reviewing files that changed from the base of the PR and between 429a80670193ada27a270df063279d992295ee41 and 695f99fca5688cb1865a625db14f6144a21a909d.

📒 Files selected for processing (1)
  • pkg/wshrpc/wshrpctypes.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/wshrpc/wshrpctypes.go


coderabbitai[bot] avatar Aug 05 '25 18:08 coderabbitai[bot]

CLA assistant check
All committers have signed the CLA.

CLAassistant avatar Aug 05 '25 18:08 CLAassistant

@sarumaj sorry about the long delay here. was on a very large branch for new AI features and didn't want to pull other PRs until I got that stabilized.

I'm running on a Mac and definitely having some issues. keep hitting the 5s timeout.

did a bit of digging and the mac route is likely going to require powermetrics and sudo :/

need to mull this over a bit. considering two options. just ship the linux impl and add some code to disable the UI for the other platforms. or try to get the mac version working too with a sudo prompt and the new powermetrics code.
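For what it's worth, the powermetrics route could be sketched roughly as follows. This assumes `sudo powermetrics --samplers gpu_power` emits a "GPU HW active residency" line, which is what Apple Silicon machines print; the exact output format varies by hardware and OS version, so treat the parsing as a hypothetical.

```python
import re

# Hypothetical parser for `sudo powermetrics --samplers gpu_power -n 1` output.
# The "GPU HW active residency" line is an assumption based on Apple Silicon
# output; Intel Macs and other OS versions may format this differently.
RESIDENCY_RE = re.compile(r"GPU HW active residency:\s+([\d.]+)%")

def parse_gpu_residency(output: str):
    """Return the GPU busy percentage, or None if the line is absent."""
    m = RESIDENCY_RE.search(output)
    return float(m.group(1)) if m else None

sample = "GPU HW active frequency: 396 MHz\nGPU HW active residency:  12.34%\n"
print(parse_gpu_residency(sample))  # 12.34
```

Returning None when the line is missing would let the caller fall back to disabling the GPU meter rather than showing stale data, which fits the "disable the UI on unsupported platforms" option above.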

sawka avatar Oct 08 '25 21:10 sawka

intel_gpu_top and ollama API

A couple of data points that might help the sysinfo meter.

intel_gpu_top

I'd really like the GPU meter to work, especially for the local-AI scenarios. I noticed you haven't added intel_gpu_top yet. It accepts -J (JSON output) and -s 1234 (sample period in milliseconds), but unfortunately the JSON it produces leaves a lot to be desired. The CSV option works better, combined with something like the Zig-based csv2json tool (you might need to adapt it a bit; it did not like reading from stdin).

{"Freq MHz req": "299.547183", "Freq MHz act": "127.110482", "IRQ /s": "200.026573", "RC6 %": "33.710977", "Power W gpu": "0.403607", "Power W pkg": "6.170362", "RCS %": "0.000000", "RCS se": "0.000000", "RCS wa": "0.000000", "BCS %": "0.000000", "BCS se": "0.000000", "BCS wa": "0.000000", "VCS %": "0.000000", "VCS se": "0.000000", "VCS wa": "0.000000", "VECS %": "0.000000", "VECS se": "0.000000", "VECS wa": "0.000000"}
{"Freq MHz req": "300.485490", "Freq MHz act": "113.297808", "IRQ /s": "213.788037", "RC6 %": "35.317579", "Power W gpu": "0.361332", "Power W pkg": "4.921782", "RCS %": "0.000000", "RCS se": "0.000000", "RCS wa": "0.000000", "BCS %": "0.000000", "BCS se": "0.000000", "BCS wa": "0.000000", "VCS %": "0.000000", "VCS se": "0.000000", "VCS wa": "0.000000", "VECS %": "0.000000", "VECS se": "0.000000", "VECS wa": "0.000000"}
{"Freq MHz req": "438.640253", "Freq MHz act": "158.343230", "IRQ /s": "247.841578", "RC6 %": "35.365278", "Power W gpu": "0.546255", "Power W pkg": "6.213796", "RCS %": "0.000000", "RCS se": "0.000000", "RCS wa": "0.000000", "BCS %": "0.000000", "BCS se": "0.000000", "BCS wa": "0.000000", "VCS %": "0.000000", "VCS se": "0.000000", "VCS wa": "0.000000", "VECS %": "0.000000", "VECS se": "0.000000", "VECS wa": "0.000000"}
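The converted records above are newline-delimited JSON, so folding them into the sysinfo meter could look like this. One hedged assumption in the sketch: since RC6 is the GPU's idle state, 100 minus "RC6 %" is used as a rough overall-busy proxy, which is an interpretation rather than an official metric.

```python
import json

# Parse newline-delimited JSON records like the csv2json output above and
# derive a simple "GPU busy" estimate. Key names are taken from the sample
# records; 100 - RC6% as a busy proxy is an assumption.

def parse_intel_gpu_records(text: str) -> list[dict]:
    return [json.loads(line) for line in text.splitlines() if line.strip()]

def busy_pct(record: dict) -> float:
    """RC6 is the idle state, so treat the remainder as busy time."""
    return 100.0 - float(record["RC6 %"])

sample = (
    '{"Freq MHz act": "127.110482", "RC6 %": "33.710977", "Power W gpu": "0.403607"}\n'
    '{"Freq MHz act": "113.297808", "RC6 %": "35.317579", "Power W gpu": "0.361332"}'
)
records = parse_intel_gpu_records(sample)
print([round(busy_pct(r), 2) for r in records])  # [66.29, 64.68]
```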

ollama-python

https://github.com/ollama/ollama-python/blob/9ddd5f0182d0c15274e0280154231f8149e8612a/ollama/_types.py#L529C1-L542C1

Given the size_vram / size values for any given model, you can at least plot memory utilization, and you can effectively plot busy/not-busy using the responses from the API, from which you can infer that info as well...

class ProcessResponse(SubscriptableBaseModel):
  class Model(SubscriptableBaseModel):
    model: Optional[str] = None
    name: Optional[str] = None
    digest: Optional[str] = None
    expires_at: Optional[datetime] = None
    size: Optional[ByteSize] = None
    size_vram: Optional[ByteSize] = None
    details: Optional[ModelDetails] = None
    context_length: Optional[int] = None

  models: Sequence[Model]
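As a sketch of that idea: summing size_vram over the loaded models gives a VRAM-utilization figure. The field names mirror the ProcessResponse excerpt above, but the canned payload and helper names below are made up for illustration.

```python
# Estimate GPU memory utilization from an ollama /api/ps-style response.
# The response shape follows the ProcessResponse excerpt; the sample
# payload below is fabricated for illustration.

def vram_in_use(ps_response: dict) -> int:
    """Total bytes of VRAM held by currently loaded models."""
    return sum(m.get("size_vram", 0) for m in ps_response.get("models", []))

def gpu_mem_pct(ps_response: dict, total_vram_bytes: int) -> float:
    return 100.0 * vram_in_use(ps_response) / total_vram_bytes

sample = {
    "models": [
        {"name": "llama3:8b", "size": 5_000_000_000, "size_vram": 4_000_000_000},
        {"name": "nomic-embed-text", "size": 500_000_000, "size_vram": 500_000_000},
    ]
}
print(gpu_mem_pct(sample, 8_000_000_000))  # 56.25 (percent of an 8 GB GPU)
```

A non-empty models list is also a cheap busy/not-busy signal when no vendor tool is available.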

K2 avatar Nov 03 '25 23:11 K2