nvitop Support for AMD ROCm devices

Issue Type

Feature implementation

Description

I've implemented ROCm support in nvitop, enabling it to run on AMD GPUs. This feature has been tested on MI50, MI100, and MI210 machines, and existing functionality on NVIDIA GPUs is confirmed to be unaffected.

Motivation and Context

Really need nvitop on AMD GPUs.

#74

Testing

Tested on: MI50, MI100, MI210

Images / Videos

(top: nvitop, bottom-left: rocm-smi, bottom-right: pytorch code)

Mar 11 '24 13:03 Junyi-99

Hi @Junyi-99, thanks for the contribution! Is there any PyPI package that provides the ROCm-SMI bindings like nvidia-ml-py for the NVIDIA NVML library? Maybe we should ship the ROCm support with:

pip3 install nvitop[rocm]

Mar 15 '24 15:03 XuehaiPan

Oh, I think it's a very good suggestion to ship through nvitop[rocm]. Currently, there is a ROCm binding, but it is not that functional.

Mar 15 '24 16:03 Junyi-99

+1. I'd love to have this support. How is development going?

Aug 08 '24 16:08 hartmark

+1 It would be great to have this for MI300X

Aug 08 '24 19:08 kswain55

Trying this now with HF AutoTrain on an AMD Radeon 7900XT (Navi31, gfx1100), installed with pip install git+https://github.com/XuehaiPan/nvitop.git

I still get these errors:

Your installed package `nvidia-ml-py` is corrupted. Skip patch functions `nvmlDeviceGet{Compute,Graphics,MPSCompute}RunningProcesses`. You may get incorrect or incomplete results. Please consider reinstall package `nvidia-ml-py` via `pip3 install --force-reinstall nvidia-ml-py nvitop`.
Your installed package `nvidia-ml-py` is corrupted. Skip patch functions `nvmlDeviceGetMemoryInfo`. You may get incorrect or incomplete results. Please consider reinstall package `nvidia-ml-py` via `pip3 install --force-reinstall nvidia-ml-py nvitop`.

Aug 22 '24 03:08 unclemusclez

@Junyi-99 Would it be possible to use the rocmsmi repo as a submodule instead? Are there any modifications beyond formatting?

Also please note that we're working on migration to AMDSMI and it would be much better long-term to use that :). ROCMSMI will eventually be deprecated.

In fact RDC migrated to amdsmi somewhat recently.

Cheers! -- Dev from SMI team at AMD.

Sep 03 '24 21:09 dmitrii-galantsev

Is there any PyPI package that provides the ROCm-SMI bindings like nvidia-ml-py for the NVIDIA NVML library?

@XuehaiPan This is planned for amdsmi :)

Sep 03 '24 21:09 dmitrii-galantsev

Some more info:

You can build and install the amdsmi Python package fairly easily:

# if on ubuntu get dependencies:
# sudo apt install git python3 python3-pip cmake clang build-essential pkg-config libdrm-dev
git clone https://github.com/ROCm/amdsmi &&
cd amdsmi &&
cmake -B build &&
make -C build -j $(nproc) &&
cd build/py-interface/python_package &&
python3 -m pip install .

Now you should be able to use the API: https://github.com/ROCm/amdsmi/tree/amd-staging/py-interface#usage
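
For illustration, here is a minimal sketch of reading per-GPU VRAM usage through the amdsmi Python bindings once they are installed. The function names (`amdsmi_init`, `amdsmi_get_processor_handles`, `amdsmi_get_gpu_vram_usage`, `amdsmi_shut_down`) and the `vram_used` / `vram_total` keys are assumptions based on the amdsmi Python README and may differ between releases. `collect_vram_usage` and its `smi` parameter are hypothetical helpers, not part of nvitop or amdsmi.

```python
def collect_vram_usage(smi=None):
    """Return one dict per GPU with VRAM usage, or [] if amdsmi is not installed.

    `smi` lets a caller pass in the amdsmi module (or a stand-in for testing).
    All amdsmi_* names below are assumed from the amdsmi README.
    """
    if smi is None:
        try:
            import amdsmi as smi  # assumed module name of the package built above
        except ImportError:
            return []

    smi.amdsmi_init()
    try:
        usage = []
        for index, handle in enumerate(smi.amdsmi_get_processor_handles()):
            vram = smi.amdsmi_get_gpu_vram_usage(handle)  # assumed to return a dict
            usage.append(
                {
                    "index": index,
                    "vram_used": vram["vram_used"],
                    "vram_total": vram["vram_total"],
                }
            )
        return usage
    finally:
        # Always release the library, even if a query above raised.
        smi.amdsmi_shut_down()
```

Calling `collect_vram_usage()` with no arguments tries the real bindings. Passing a stand-in object makes it possible to exercise the calling code on machines without an AMD GPU.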

`amd-smi process` returns some useful info. Here is the output while running rocm-validation-suite in the background on dual NV21s:

$ amd-smi process
GPU: 0
    PROCESS_INFO:
        NAME: rvs
        PID: 468813
        MEMORY_USAGE:
            GTT_MEM: 2.1 MB
            CPU_MEM: 253.1 MB
            VRAM_MEM: 1.1 GB
        MEM_USAGE: 1.4 GB
        USAGE:
            GFX: 0 ns
            ENC: 0 ns

GPU: 1
    PROCESS_INFO:
        NAME: rvs
        PID: 468813
        MEMORY_USAGE:
            GTT_MEM: 2.1 MB
            CPU_MEM: 253.1 MB
            VRAM_MEM: 1.1 GB
        MEM_USAGE: 1.4 GB
        USAGE:
            GFX: 0 ns
            ENC: 0 ns
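
The indented output above could be turned into per-GPU records with a small parser. This is only a sketch that assumes the text layout stays exactly as shown (a `GPU: N` header followed by indented `KEY: value` lines). `parse_amd_smi_process` and `SAMPLE` are hypothetical names used for illustration, and values are kept as the strings printed by the tool.

```python
SAMPLE = """\
GPU: 0
    PROCESS_INFO:
        NAME: rvs
        PID: 468813
        MEMORY_USAGE:
            GTT_MEM: 2.1 MB
            CPU_MEM: 253.1 MB
            VRAM_MEM: 1.1 GB
        MEM_USAGE: 1.4 GB
        USAGE:
            GFX: 0 ns
            ENC: 0 ns

GPU: 1
    PROCESS_INFO:
        NAME: rvs
        PID: 468813
"""


def parse_amd_smi_process(text):
    """Parse indented `amd-smi process` text into a list of nested dicts."""
    records = []
    stack = []  # (indent of the line that opened the block, dict for that block)
    for raw in text.splitlines():
        if not raw.strip():
            continue
        indent = len(raw) - len(raw.lstrip(" "))
        key, _, value = raw.strip().partition(":")
        value = value.strip()

        if indent == 0 and key == "GPU":
            # A new top-level record; everything indented below belongs to it.
            record = {"GPU": int(value)}
            records.append(record)
            stack = [(0, record)]
            continue
        if not stack:
            continue  # skip anything before the first GPU header (e.g. the prompt)

        # Close blocks that are at the same depth or deeper than this line.
        while len(stack) > 1 and stack[-1][0] >= indent:
            stack.pop()
        parent = stack[-1][1]

        if value:
            parent[key] = value
        else:
            child = {}
            parent[key] = child
            stack.append((indent, child))
    return records


if __name__ == "__main__":
    for record in parse_amd_smi_process(SAMPLE):
        print(record["GPU"], record["PROCESS_INFO"]["NAME"], record["PROCESS_INFO"]["PID"])
```

If the installed `amd-smi` offers a machine-readable output mode, that would be a more robust input than scraping the human-readable text.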

Sep 03 '24 22:09 dmitrii-galantsev

Does this work on WSL2?

Sep 03 '24 22:09 unclemusclez

@unclemusclez AFAIK, no. SMI needs access to the amdgpu driver. Rule of thumb: if /sys/class/drm/card*/device/gpu_metrics exists, SMI will work.
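
The rule of thumb above can be checked with a few lines of Python. This is a minimal sketch: `find_gpu_metrics`, `smi_probably_works`, and the `sysfs_root` parameter are hypothetical names, and a matching file is only a hint that the amdgpu driver is exposed, not a guarantee that every SMI query will succeed.

```python
import glob
import os


def find_gpu_metrics(sysfs_root="/sys"):
    """Return the gpu_metrics files exposed under class/drm/card*/device."""
    pattern = os.path.join(
        sysfs_root, "class", "drm", "card*", "device", "gpu_metrics"
    )
    return sorted(glob.glob(pattern))


def smi_probably_works(sysfs_root="/sys"):
    """Apply the rule of thumb: at least one gpu_metrics file must exist."""
    return bool(find_gpu_metrics(sysfs_root))


if __name__ == "__main__":
    paths = find_gpu_metrics()
    if paths:
        print("gpu_metrics found:", ", ".join(paths))
    else:
        print("No gpu_metrics file found; SMI is unlikely to work here.")
```

The `sysfs_root` argument exists only so the check can be pointed at a different directory tree, for example in a test.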

Sep 04 '24 14:09 dmitrii-galantsev

@dmitrii-galantsev I'll try it this weekend.

Sep 05 '24 04:09 Junyi-99
