psutil Add GPU stats features

GPUs are increasingly used in scientific servers. It would be nice to have GPU stats features in psutil.

For examples of existing GPU monitoring software for Intel, NVIDIA, or AMD GPUs, see the post http://www.rkblog.rk.edu.pl/w/p/monitoring-amd-intel-and-nvidia-graphics-card-usage-under-linux/

Source code (in C) already exists for Intel GPU Top: http://anonscm.debian.org/cgit/pkg-xorg/app/intel-gpu-tools.git/tree/tools/intel_gpu_top.c

It would be a very nice feature, and one that Glances users have asked for.

Aug 02 '14 16:08 nicolargo

According to http://askubuntu.com/a/5419, GPU info is not standardized and not retrievable via /proc, which is how we currently get the CPU stats. A tool like "Intel GPU Top" suggests that the same code probably won't work on other GPU chipsets, and that it would probably also require C headers to be installed separately. In summary, this looks like a world of pain. =) It probably makes sense to develop this as a separate stand-alone Python lib, but not as part of psutil.

Aug 08 '14 15:08 giampaolo

I'm willing to reopen this to investigate whether there are viable options to implement this, at least for NVIDIA cards, as they seem to be the most used in the scientific community.

Sep 05 '17 03:09 giampaolo

NVIDIA already provides a library, and there is a PyPI package that provides Python 2 bindings, at least. https://developer.nvidia.com/nvidia-management-library-nvml

Mar 13 '20 01:03 Gerardwx

Please add this! It would be a very nice addition to psutil.

Nvidia's official python module is pynvml
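For what it's worth, here is a minimal sketch of what reading basic stats through pynvml could look like. It assumes the pynvml package and an NVIDIA driver are installed, and it returns an empty list when either is missing. The function name query_nvidia_gpus and the dict keys are made up for illustration; they are not part of pynvml or psutil.

```python
def query_nvidia_gpus():
    """Return a list of per-GPU dicts, or [] if NVML is not usable."""
    try:
        import pynvml  # optional dependency, assumed to be installed
    except ImportError:
        return []
    try:
        pynvml.nvmlInit()
    except pynvml.NVMLError:
        # No driver / no NVML shared library on this machine.
        return []
    try:
        gpus = []
        for index in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(index)
            name = pynvml.nvmlDeviceGetName(handle)
            if isinstance(name, bytes):  # older bindings return bytes
                name = name.decode("utf-8", "replace")
            util = pynvml.nvmlDeviceGetUtilizationRates(handle)
            mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
            gpus.append({
                "index": index,
                "name": name,
                "gpu_percent": util.gpu,      # percent over the last sample period
                "memory_used": mem.used,      # bytes
                "memory_total": mem.total,    # bytes
            })
        return gpus
    except pynvml.NVMLError:
        return []
    finally:
        pynvml.nvmlShutdown()


if __name__ == "__main__":
    for gpu in query_nvidia_gpus():
        print(gpu)
```

Returning an empty list rather than raising keeps the GPU part optional, which seems to be the behaviour most people in this thread would expect on machines without an NVIDIA card.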

For AMD I found this module, pyamdgpuinfo, but it is currently Linux-only.

Yet another library, though it can only get very basic information, is gpu-info ... it's also on PyPI, but there is no description there.

Apr 27 '22 17:04 ReenigneArcher

Generally one wants some wrapper around nvidia-smi (NVIDIA) or rocm-smi (AMD). There are also Intel's GPUs (though they are less popular).

Feb 09 '23 05:02 DanielWicz

NVIDIA already provides a library and has pypi package that provides python2 bindings, at least. developer.nvidia.com/nvidia-management-library-nvml

Wrapping nvidia-smi or rocm-smi in a subprocess might be the most feasible approach (if you don't care about performance). The official NVIDIA Python bindings, nvidia-ml-py, do not guarantee backward compatibility with old NVIDIA drivers. I reported compatibility concerns on the NVIDIA forum, in the thread "[PyPI/nvidia-ml-py] Issue Reports for nvidia-ml-py".
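To make the subprocess idea concrete, here is a rough sketch, assuming nvidia-smi is on PATH and supports its documented --query-gpu and --format=csv,noheader,nounits options. The helper names (parse_nvidia_smi_csv, query_nvidia_smi) and the choice of fields are illustrative only. Values are kept as strings because nvidia-smi may print placeholders such as "[N/A]" for unsupported fields.

```python
import csv
import io
import shutil
import subprocess

# Fields to request; each one becomes a column in the CSV output.
QUERY_FIELDS = ["index", "name", "utilization.gpu", "memory.used", "memory.total"]


def parse_nvidia_smi_csv(text):
    """Parse `--format=csv,noheader,nounits` output into a list of dicts."""
    gpus = []
    for row in csv.reader(io.StringIO(text), skipinitialspace=True):
        if len(row) != len(QUERY_FIELDS):
            continue  # skip blank or unexpected lines
        gpus.append(dict(zip(QUERY_FIELDS, (cell.strip() for cell in row))))
    return gpus


def query_nvidia_smi(timeout=5.0):
    """Return per-GPU stats, or None if nvidia-smi is missing or fails."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return None
    cmd = [
        exe,
        "--query-gpu=" + ",".join(QUERY_FIELDS),
        "--format=csv,noheader,nounits",
    ]
    try:
        proc = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout, check=True
        )
    except (OSError, subprocess.SubprocessError):
        # Covers a non-zero exit status, a timeout, or a failure to start.
        return None
    return parse_nvidia_smi_csv(proc.stdout)


if __name__ == "__main__":
    print(query_nvidia_smi())
```

Spawning a process on every poll is the performance cost mentioned above; the upside is that the code only depends on the CLI output format and not on the NVML struct layouts.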

See also:

https://github.com/wookayin/gpustat/pull/107#issuecomment-893513321
wookayin/gpustat#143
XuehaiPan/nvitop#29
XuehaiPan/nvitop#30
XuehaiPan/nvitop#13
NVIDIA/go-nvml#21
NVIDIA/go-nvml#25
Syllo/nvtop#107
Syllo/nvtop#108

Apr 16 '23 08:04 XuehaiPan

From your bug report:

Backward compatibility between driver and binding versions. Since CUDA 11, the definition of nvmlProcessInfo_t adds two new fields gpuInstanceId and computeInstanceId.

[...]

Another breaking change. nvidia-ml-py 11.515.0 (Jan 12, 2022) now even introduces v3 (nvmlDeviceGetComputeRunningProcesses_v3, etc.).

This is concerning. If the C lib breaks compatibility so easily [1], psutil would probably have to use #ifdef nvidia_version_x ... #else ... clauses all over the place, and that may create problems with the binary wheels that we distribute on PyPI. The system compiling the wheel may have a certain nvidia-lib version supporting functionality "X", but the user installing the psutil wheel may not, and that usually results in an "X symbol not found" error at import time. I faced a similar problem in https://github.com/giampaolo/psutil/pull/1879, which led me to rewrite the prlimit() functionality from C to ctypes for that reason. The fix was literally "check for the existence of X at run time instead of at compile time".

Also, right now we only depend on apt-get install python3-dev (Debian / Ubuntu) or yum install python3-devel (RedHat). Adding support for NVIDIA GPUs means we'll have to install nvidialib-dev / nvidialib-devel (or whatever they're called). But not all Linux distros will provide a pre-compiled nvidialib-dev package, so we'll probably want logic in setup.py to make GPU functionality optional (i.e. not crash at compile time). But as I explained above, even that may not be enough due to the wheel-related issues.
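As an illustration of the "check at run time" idea applied to NVML, here is a small ctypes sketch. It is not psutil code: the helper names (load_nvml, first_available_symbol) are hypothetical, and the library file names are an assumption about where the driver installs NVML on Linux and Windows. It only resolves which versioned symbol is exported; it does not call it.

```python
import ctypes

# Newest first: the _v3 name is the one quoted above, followed by the
# older _v2 and unversioned variants of the same NVML function.
CANDIDATES = (
    "nvmlDeviceGetComputeRunningProcesses_v3",
    "nvmlDeviceGetComputeRunningProcesses_v2",
    "nvmlDeviceGetComputeRunningProcesses",
)


def first_available_symbol(lib, names):
    """Return (name, symbol) for the first name that `lib` exports, else None."""
    for name in names:
        try:
            # ctypes resolves symbols lazily and raises AttributeError
            # when the loaded library does not export the name.
            return name, getattr(lib, name)
        except AttributeError:
            continue
    return None


def load_nvml():
    """Try to load the NVML shared library; return None if it is not installed."""
    for libname in ("libnvidia-ml.so.1", "nvml.dll"):  # assumed file names
        try:
            return ctypes.CDLL(libname)
        except OSError:
            continue
    return None


if __name__ == "__main__":
    lib = load_nvml()
    if lib is None:
        print("NVML not available; GPU stats disabled")
    else:
        print("using", first_available_symbol(lib, CANDIDATES))
```

Because nothing is resolved until the process is running, the same wheel can be imported on machines with an old driver, a new driver, or no driver at all.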

...and then there is Windows.

All of this to say that implementing this in pure C would be hard, which is probably why they decided to use ctypes in pynvml: https://github.com/gpuopenanalytics/pynvml/blob/master/pynvml/nvml.py

In summary: if we ever add GPU functionality to psutil, we'll probably want to use ctypes. :)

[1] On a personal note: as a Linux user who's been dealing with Nvidia cards/driver issues for over a decade I'm not that surprised. ;)

Apr 16 '23 09:04 giampaolo
