psutil
Add GPU stats features
GPUs are increasingly used in scientific servers. It would be nice to have GPU stats features in psutil.
For examples of existing GPU monitoring software for Intel, NVIDIA, or AMD GPUs, see the post http://www.rkblog.rk.edu.pl/w/p/monitoring-amd-intel-and-nvidia-graphics-card-usage-under-linux/
Source code (in C) already exists for Intel GPU Top: http://anonscm.debian.org/cgit/pkg-xorg/app/intel-gpu-tools.git/tree/tools/intel_gpu_top.c
It would be a very nice feature; it has been requested by Glances users.
According to this http://askubuntu.com/a/5419, GPU info is not standardized and cannot be retrieved via /proc, as we currently do for CPU stats. A tool like "Intel GPU Top" suggests that the same code probably won't work on other GPU chipsets, and that it would also probably require C headers to be installed separately. In summary, this looks like a world of pain. =) It's probably something that would make more sense as a separate stand-alone Python lib than as part of psutil.
I'm willing to reopen this to investigate whether there are viable options to implement this, at least for NVIDIA cards, as they seem to be the most widely used in the scientific community.
NVIDIA already provides a library and a PyPI package that provides Python 2 bindings, at least. https://developer.nvidia.com/nvidia-management-library-nvml
Please add this! It would be a very nice addition to psutil.
NVIDIA's official Python module is pynvml.
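For reference, here is a minimal sketch of reading basic per-GPU stats through pynvml. It assumes the nvidia-ml-py package (which provides the pynvml module) and an NVIDIA driver are installed. The function name nvml_gpu_stats and the shape of the returned dicts are illustrative only, not a proposed psutil API. When either the package or the driver is missing, it returns an empty list.

```python
from typing import Dict, List


def nvml_gpu_stats() -> List[Dict[str, object]]:
    """Return basic per-GPU stats via pynvml, or [] if NVML is unavailable."""
    try:
        import pynvml  # module shipped by the nvidia-ml-py package
    except ImportError:
        return []
    try:
        pynvml.nvmlInit()
    except pynvml.NVMLError:
        return []  # e.g. no NVIDIA driver / libnvidia-ml on this machine
    stats = []
    try:
        for i in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            name = pynvml.nvmlDeviceGetName(handle)
            if isinstance(name, bytes):  # older bindings return bytes
                name = name.decode()
            mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
            try:
                gpu_percent = pynvml.nvmlDeviceGetUtilizationRates(handle).gpu
            except pynvml.NVMLError:
                gpu_percent = None  # not supported on every device
            stats.append({
                "index": i,
                "name": name,
                "memory_used": mem.used,    # bytes
                "memory_total": mem.total,  # bytes
                "gpu_percent": gpu_percent,
            })
    finally:
        pynvml.nvmlShutdown()  # balance the successful nvmlInit()
    return stats


if __name__ == "__main__":
    for gpu in nvml_gpu_stats():
        print(gpu)
```

Memory values come back in bytes. Utilization is the percentage reported by the driver for its most recent sample period.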
For AMD I found this module, pyamdgpuinfo, but it is currently Linux-only.
Yet another library, gpu-info, can only get very basic information; it's also on PyPI, but has no description there.
Generally, one wants some wrapper around nvidia-smi or rocm-smi (AMD). There are also Intel GPUs (though they are less popular).
NVIDIA already provides a library and has pypi package that provides python2 bindings, at least. developer.nvidia.com/nvidia-management-library-nvml
Wrapping nvidia-smi or rocm-smi in a subprocess might be the most feasible approach (if you don't care about performance). The official NVIDIA Python bindings, nvidia-ml-py, do not guarantee backward compatibility with old NVIDIA drivers. I reported compatibility concerns on the NVIDIA forum: [PyPI/nvidia-ml-py] Issue Reports for nvidia-ml-py.
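Here is a minimal sketch of the subprocess approach. It assumes nvidia-smi is on PATH and accepts the --query-gpu and --format=csv,noheader,nounits options (nvidia-smi --help-query-gpu lists the field names a given driver supports). The helper names and the chosen field list are illustrative. Parsing is kept separate from the subprocess call so it can be tested without a GPU.

```python
import csv
import io
import shutil
import subprocess
from typing import Dict, List, Optional

# Field names passed to `nvidia-smi --query-gpu` (illustrative selection).
FIELDS = ["index", "name", "utilization.gpu", "memory.used", "memory.total"]


def parse_query_gpu(text: str) -> List[Dict[str, str]]:
    """Parse `--format=csv,noheader,nounits` output into one dict per GPU."""
    rows = []
    for row in csv.reader(io.StringIO(text), skipinitialspace=True):
        if row:  # skip blank lines
            rows.append(dict(zip(FIELDS, row)))
    return rows


def nvidia_smi_stats(timeout: float = 5.0) -> Optional[List[Dict[str, str]]]:
    """Return per-GPU stats, or None if nvidia-smi is not installed."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return None
    out = subprocess.run(
        [exe, "--query-gpu=" + ",".join(FIELDS), "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True, timeout=timeout,
    ).stdout
    return parse_query_gpu(out)


if __name__ == "__main__":
    print(nvidia_smi_stats())
```

Values are left as strings because some fields may be reported as not available on a given card. With nounits, memory figures are in MiB.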
See also:
- https://github.com/wookayin/gpustat/pull/107#issuecomment-893513321
- wookayin/gpustat#143
- XuehaiPan/nvitop#29
- XuehaiPan/nvitop#30
- XuehaiPan/nvitop#13
- NVIDIA/go-nvml#21
- NVIDIA/go-nvml#25
- Syllo/nvtop#107
- Syllo/nvtop#108
From your bug report:
Backward compatibility between driver and binding versions. Since CUDA 11, the definition of nvmlProcessInfo_t adds two new fields gpuInstanceId and computeInstanceId.
[...]
Another breaking change. nvidia-ml-py 11.515.0 (Jan 12, 2022) now even introduces v3 (nvmlDeviceGetComputeRunningProcesses_v3, etc.).
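To illustrate why the quoted nvmlProcessInfo_t change breaks bindings, here is a ctypes sketch of the two layouts. The field types (unsigned int and unsigned long long) are assumed from the public nvml.h header. The class names ProcessInfoV1/ProcessInfoV2 and the misread_pids() helper are hypothetical. NVML fills a caller-allocated array of these structs. If the binding and the driver disagree on the layout, every element after the first is read at the wrong offset.

```python
import ctypes
from typing import List


class ProcessInfoV1(ctypes.Structure):
    """Older layout: pid and used GPU memory only (hypothetical class name)."""
    _fields_ = [
        ("pid", ctypes.c_uint),
        ("usedGpuMemory", ctypes.c_ulonglong),
    ]


class ProcessInfoV2(ctypes.Structure):
    """CUDA 11+ layout: same prefix plus the two new instance IDs."""
    _fields_ = [
        ("pid", ctypes.c_uint),
        ("usedGpuMemory", ctypes.c_ulonglong),
        ("gpuInstanceId", ctypes.c_uint),
        ("computeInstanceId", ctypes.c_uint),
    ]


def misread_pids() -> List[int]:
    """Write two V2 records, then read the same bytes with the V1 layout."""
    written = (ProcessInfoV2 * 2)(
        ProcessInfoV2(111, 1024, 0, 0),
        ProcessInfoV2(222, 2048, 0, 0),
    )
    # from_buffer shares memory: the bytes are reinterpreted, not copied.
    read = (ProcessInfoV1 * 2).from_buffer(written)
    return [p.pid for p in read]


if __name__ == "__main__":
    print(ctypes.sizeof(ProcessInfoV1), ctypes.sizeof(ProcessInfoV2))
    print(misread_pids())  # the second pid is not 222: wrong offset
```

On x86-64, for instance, the two structs are 16 and 24 bytes. The second "V1 record" therefore starts inside the first V2 record, at its gpuInstanceId field.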
This is concerning. If the C lib breaks compatibility so easily [1], psutil would probably have to use #ifdef nvidia_version_x ... #else ... clauses all over the place, and that may create problems with the binary wheels that we distribute on PyPI. The system compiling the wheel may have a certain NVIDIA lib version supporting functionality "X", but the user installing the psutil wheel may not, and that usually results in an "X symbol not found" error at import time. I faced a similar problem in https://github.com/giampaolo/psutil/pull/1879, which led me to rewrite the prlimit() functionality from C to ctypes for that reason. The fix was literally "check for the existence of X at run time instead of compile time".
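The run-time check described above can be done directly with ctypes, because attribute lookup on a loaded library raises AttributeError when the symbol is missing. Below is a minimal sketch. first_available_symbol is a hypothetical helper, and the demo probes libc rather than NVML so it runs on machines without an NVIDIA driver (Linux assumed).

```python
import ctypes
import ctypes.util
from typing import Iterable, Optional


def first_available_symbol(lib: ctypes.CDLL, names: Iterable[str]) -> Optional[str]:
    """Return the first of `names` exported by `lib`, checked at run time."""
    for name in names:
        # CDLL attribute lookup performs a dlsym() and raises AttributeError
        # when the symbol is missing, so hasattr() is a safe probe.
        if hasattr(lib, name):
            return name
    return None


if __name__ == "__main__":
    libc = ctypes.CDLL(ctypes.util.find_library("c"))
    print(first_available_symbol(libc, ["no_such_function_v3", "getpid"]))
```

Against NVML, one would pass candidates newest first (nvmlDeviceGetComputeRunningProcesses_v3, then _v2, then the unversioned name) and pick the matching struct layout for whichever one is found.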
Also, right now we only depend on apt-get install python3-dev (Debian / Ubuntu) or yum install python3-devel (RedHat). Adding support for NVIDIA GPUs means we'll have to install nvidialib-dev / nvidialib-devel (or whatever they're called). But not all Linux distros will provide a pre-compiled nvidialib-dev package, so we'll probably want logic in setup.py to make GPU functionality optional (i.e. not crash at compile time). But as I explained above, even that may not be enough due to the wheel-related issues.
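Here is a rough sketch of that optional-build logic. It assumes the NVML header is named nvml.h and the library links as nvidia-ml. The helper name and the PSUTIL_HAS_NVML macro are hypothetical. The returned dict would be merged into the Extension(...) keyword arguments, and the C code would guard GPU functions with #ifdef PSUTIL_HAS_NVML. This only keeps the build from failing; it does nothing about the wheel / import-time problem.

```python
import os
from typing import Dict, Iterable


def optional_gpu_build_args(include_dirs: Iterable[str]) -> Dict[str, list]:
    """Extra Extension() kwargs enabling GPU code if nvml.h is found.

    Returns an empty dict otherwise, so the build proceeds without GPU
    support instead of failing at compile time.
    """
    for d in include_dirs:
        if os.path.isfile(os.path.join(d, "nvml.h")):
            return {
                "include_dirs": [d],
                # Hypothetical macro; C sources would check it with #ifdef.
                "define_macros": [("PSUTIL_HAS_NVML", "1")],
                "libraries": ["nvidia-ml"],  # i.e. link against libnvidia-ml
            }
    return {}


if __name__ == "__main__":
    # Assumed example search path; adjust for the local CUDA/driver install.
    print(optional_gpu_build_args(["/usr/include", "/usr/local/cuda/include"]))
```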
...and then there is Windows.
All of this to say that implementing this in pure C would be hard, which is probably why they decided to use ctypes in pynvml: https://github.com/gpuopenanalytics/pynvml/blob/master/pynvml/nvml.py
In summary: if we'll ever add GPU functionality in psutil we'll probably want to use ctypes. :)
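Here is a minimal sketch of that ctypes approach. The library is loaded lazily at run time, so a missing driver or an old driver lacking a symbol simply yields None instead of an import-time error. It assumes the usual library names (libnvidia-ml.so.1 on Linux, nvml.dll on Windows) and the NVML C functions nvmlInit_v2, nvmlDeviceGetCount_v2 and nvmlShutdown. nvml_device_count is a hypothetical name, not an existing psutil function.

```python
import ctypes
import sys
from typing import Optional

NVML_SUCCESS = 0  # nvmlReturn_t value for success
REQUIRED = ("nvmlInit_v2", "nvmlDeviceGetCount_v2", "nvmlShutdown")


def _load_nvml() -> Optional[ctypes.CDLL]:
    """Load the NVML shared library at run time; None if it is absent."""
    name = "nvml.dll" if sys.platform == "win32" else "libnvidia-ml.so.1"
    try:
        return ctypes.CDLL(name)
    except OSError:
        return None


def nvml_device_count() -> Optional[int]:
    """Number of NVIDIA GPUs, or None when NVML is unavailable."""
    nvml = _load_nvml()
    # Check for every symbol we need at run time, not at compile time.
    if nvml is None or not all(hasattr(nvml, s) for s in REQUIRED):
        return None
    if nvml.nvmlInit_v2() != NVML_SUCCESS:
        return None
    try:
        count = ctypes.c_uint(0)
        if nvml.nvmlDeviceGetCount_v2(ctypes.byref(count)) != NVML_SUCCESS:
            return None
        return count.value
    finally:
        nvml.nvmlShutdown()  # balance the successful nvmlInit_v2()


if __name__ == "__main__":
    print(nvml_device_count())
```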
[1] On a personal note: as a Linux user who's been dealing with Nvidia cards/driver issues for over a decade I'm not that surprised. ;)