nvitop
Support for AMD ROCm devices
Issue Type
- Feature implementation
Description
I've implemented ROCm support in nvitop, enabling it to run on AMD GPUs. This feature has been tested on MI50, MI100, and MI210 machines, and full functionality on NVIDIA GPUs is confirmed to be unaffected.
Motivation and Context
Really need nvitop on AMD GPUs.
#74
Testing
Tested on
MI50
MI100
MI210
Images / Videos
(top: nvitop, bottom-left: rocm-smi, bottom-right: PyTorch code)
Hi @Junyi-99, thanks for the contribution! Is there any PyPI package that provides the ROCm-SMI bindings like `nvidia-ml-py` for the NVIDIA NVML library? Maybe we should ship the ROCm support with:
pip3 install nvitop[rocm]
Oh, I think it's a very good suggestion to ship through `nvitop[rocm]`. Currently, there is a ROCm binding, but it is not that functional.
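If vendor bindings were shipped as optional extras, the tool would need to find out at runtime which ones are actually installed. The following is only a minimal sketch of that idea, not how nvitop does it. The `BACKENDS` mapping and the `available_backends` helper are hypothetical; `pynvml` is the module installed by `nvidia-ml-py`, and `amdsmi` is the module name of AMD's Python package.

```python
import importlib.util

# Hypothetical mapping from a backend label to the Python module that
# provides its bindings. Not taken from nvitop.
BACKENDS = {"nvidia": "pynvml", "rocm": "amdsmi"}


def available_backends(backends=BACKENDS):
    """Return the labels of backends whose binding module is importable."""
    found = []
    for name, module in backends.items():
        # find_spec returns None when a top-level module is not installed,
        # so nothing is imported just to probe for it.
        if importlib.util.find_spec(module) is not None:
            found.append(name)
    return found


if __name__ == "__main__":
    print("Installed backends:", available_backends())
```

Probing with `find_spec` avoids importing a vendor library (and loading its shared objects) on machines that do not have the matching driver.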
+1 I'd love to have this support, how is the development going?
+1 It would be great to have this for MI300X
Trying this now with HF AutoTrain on an AMD Radeon 7900XT (Navi31, gfx1100), installed with pip install git+https://github.com/XuehaiPan/nvitop.git
I still receive the errors:
Your installed package `nvidia-ml-py` is corrupted. Skip patch functions `nvmlDeviceGet{Compute,Graphics,MPSCompute}RunningProcesses`. You may get incorrect or incomplete results. Please consider reinstall package `nvidia-ml-py` via `pip3 install --force-reinstall nvidia-ml-py nvitop`.
Your installed package `nvidia-ml-py` is corrupted. Skip patch functions `nvmlDeviceGetMemoryInfo`. You may get incorrect or incomplete results. Please consider reinstall package `nvidia-ml-py` via `pip3 install --force-reinstall nvidia-ml-py nvitop`.
@Junyi-99 Would it be possible to use the rocmsmi repo as a submodule instead? Are there any modifications beyond formatting?
Also please note that we're working on migration to AMDSMI and it would be much better long-term to use that :). ROCMSMI will eventually be deprecated.
In fact RDC migrated to amdsmi somewhat recently.
Cheers! -- Dev from SMI team at AMD.
> Is there any PyPI package that provides the ROCm-SMI bindings like `nvidia-ml-py` for the NVIDIA NVML library?
@XuehaiPan This is planned for amdsmi :)
Some more info:
- You can build and install the amdsmi Python package fairly easily.
# if on ubuntu get dependencies:
# sudo apt install git python3 python3-pip cmake clang build-essential pkg-config libdrm-dev
git clone https://github.com/ROCm/amdsmi &&
cd amdsmi &&
cmake -B build &&
make -C build -j $(nproc) &&
cd build/py-interface/python_package &&
python3 -m pip install .
Now you should be able to use the api: https://github.com/ROCm/amdsmi/tree/amd-staging/py-interface#usage
- `amd-smi process` returns some useful info. Here is me running [rocm-validation-suite](https://github.com/ROCm/ROCmValidationSuite/) in the background on dual NV21s:
$ amd-smi process
GPU: 0
PROCESS_INFO:
NAME: rvs
PID: 468813
MEMORY_USAGE:
GTT_MEM: 2.1 MB
CPU_MEM: 253.1 MB
VRAM_MEM: 1.1 GB
MEM_USAGE: 1.4 GB
USAGE:
GFX: 0 ns
ENC: 0 ns
GPU: 1
PROCESS_INFO:
NAME: rvs
PID: 468813
MEMORY_USAGE:
GTT_MEM: 2.1 MB
CPU_MEM: 253.1 MB
VRAM_MEM: 1.1 GB
MEM_USAGE: 1.4 GB
USAGE:
GFX: 0 ns
ENC: 0 ns
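For anyone who wants to consume this from a script, here is a rough sketch that parses the text output shown above into one flat dictionary per process. It assumes the output keeps exactly this `KEY: VALUE` layout, which may change between amd-smi versions; `parse_amd_smi_process` is a hypothetical helper, and the amdsmi Python API is likely the more robust route.

```python
def parse_amd_smi_process(text):
    """Parse `amd-smi process` text output into a list of flat dicts.

    Assumes the layout shown in this thread: a 'GPU: N' line, then a
    'PROCESS_INFO:' header followed by 'KEY: VALUE' lines.
    """
    processes = []
    gpu = None
    for raw in text.splitlines():
        line = raw.strip()
        # Skip the shell prompt line and anything that is not 'KEY: VALUE'.
        if line.startswith("$") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "GPU":
            gpu = int(value)
        elif key == "PROCESS_INFO":
            # Each PROCESS_INFO header starts a new process record.
            processes.append({"GPU": gpu})
        elif value and processes:
            # Section headers such as 'MEMORY_USAGE:' have no value
            # and are dropped; their children are stored flat.
            processes[-1][key] = value
    return processes
```

Values are kept as strings (for example `"1.1 GB"`), since the units vary per field.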
Does this work on WSL2?
@unclemusclez AFAIK, no.
SMI needs access to the amdgpu driver.
Rule of thumb: if /sys/class/drm/card*/device/gpu_metrics exists, SMI will work.
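That rule of thumb is easy to check from a script. A minimal sketch, assuming the sysfs path quoted above; `smi_probably_supported` and its `sysfs_root` parameter are hypothetical names, and the parameter exists only so the check can be pointed at a different root for testing:

```python
import glob
import os


def smi_probably_supported(sysfs_root="/sys"):
    """Return True if any DRM card exposes a gpu_metrics file."""
    pattern = os.path.join(
        sysfs_root, "class", "drm", "card*", "device", "gpu_metrics"
    )
    # glob expands the card* wildcard; an empty list means no match.
    return len(glob.glob(pattern)) > 0


if __name__ == "__main__":
    print("gpu_metrics found:", smi_probably_supported())
```

This only tests for the file's presence; it says nothing about whether the installed SMI version supports the specific GPU.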
@dmitrii-galantsev I'll try it this weekend.