gpustat
Failed to run "gpustat --debug": pynvml.NVMLError_LibraryNotFound: NVML Shared Library Not Found
Hi,
On Ubuntu 20.04 with Python 3.8.3, running "gpustat --debug" fails; see the following output for details:
$ gpustat --debug
Error on querying NVIDIA devices. Use --debug flag for details
Traceback (most recent call last):
File "/home/werner/.pyenv/versions/3.8.3/envs/socks5-haproxy/lib/python3.8/site-packages/pynvml.py", line 644, in _LoadNvmlLibrary
nvmlLib = CDLL("libnvidia-ml.so.1")
File "/home/werner/.pyenv/versions/3.8.3/lib/python3.8/ctypes/__init__.py", line 373, in __init__
self._handle = _dlopen(self._name, mode)
OSError: libnvidia-ml.so.1: cannot open shared object file: No such file or directory
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/werner/.pyenv/versions/3.8.3/envs/socks5-haproxy/lib/python3.8/site-packages/gpustat/__main__.py", line 19, in print_gpustat
gpu_stats = GPUStatCollection.new_query()
File "/home/werner/.pyenv/versions/3.8.3/envs/socks5-haproxy/lib/python3.8/site-packages/gpustat/core.py", line 281, in new_query
N.nvmlInit()
File "/home/werner/.pyenv/versions/3.8.3/envs/socks5-haproxy/lib/python3.8/site-packages/pynvml.py", line 608, in nvmlInit
_LoadNvmlLibrary()
File "/home/werner/.pyenv/versions/3.8.3/envs/socks5-haproxy/lib/python3.8/site-packages/pynvml.py", line 646, in _LoadNvmlLibrary
_nvmlCheckReturn(NVML_ERROR_LIBRARY_NOT_FOUND)
File "/home/werner/.pyenv/versions/3.8.3/envs/socks5-haproxy/lib/python3.8/site-packages/pynvml.py", line 310, in _nvmlCheckReturn
raise NVMLError(ret)
pynvml.NVMLError_LibraryNotFound: NVML Shared Library Not Found
What's the output of nvidia-smi?
I haven't installed any NVIDIA drivers, tools, or utilities on the machine, so the nvidia-smi command is not currently available.
Unfortunately, this tool (like any other GPU management tool) depends on the NVIDIA driver and toolkit; you need to install at least the NVIDIA driver to get things working.
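To check whether the NVML library can be loaded at all, independently of gpustat, one can repeat the ctypes CDLL call that pynvml makes in the traceback above. The following is a minimal sketch: try_load_nvml and default_nvml_path are hypothetical helpers (not part of gpustat or pynvml), and the Windows path is an assumption taken from the nvml.dll discussion later in this thread.

```python
import ctypes
import os
import sys
from typing import Optional, Tuple


def default_nvml_path() -> str:
    """Library name/path to probe (assumed, based on this thread)."""
    if sys.platform == "win32":
        # Assumption: the location this thread says pynvml looks in on Windows.
        program_files = os.environ.get("ProgramFiles", r"C:\Program Files")
        return os.path.join(program_files, "NVIDIA Corporation", "NVSMI", "nvml.dll")
    # Same soname as the CDLL call in the gpustat traceback (Linux).
    return "libnvidia-ml.so.1"


def try_load_nvml(path: Optional[str] = None) -> Tuple[bool, str]:
    """Try to load the NVML shared library with ctypes, as pynvml does."""
    target = path or default_nvml_path()
    try:
        ctypes.CDLL(target)
    except OSError as exc:
        # Raised when the dynamic loader cannot find or open the library.
        return False, f"cannot load {target}: {exc}"
    return True, f"loaded {target}"


print(try_load_nvml()[1])
```

On a machine without the NVIDIA driver, like the one in the original report, this is expected to print a "cannot load libnvidia-ml.so.1" message; once the driver is installed it should print "loaded libnvidia-ml.so.1".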
Thanks a lot for your explanation. I'll try it and report back if necessary.
I had the same issue. This is the output of nvidia-smi:
@Stonesjtu
The problem has been solved. The cause was that I didn't have a correct installation of CUDA/the NVIDIA driver. Now it works smoothly. See the following for details:
$ gpustat --debug
X10DAi-01 Fri Aug 28 15:15:31 2020 450.51.06
[0] GeForce RTX 2070 SUPER | 41'C, 5 % | 291 / 7977 MB | gdm(35M) werner(132M) werner(111M)
$ nvidia-smi
Fri Aug 28 15:15:43 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.06    Driver Version: 450.51.06    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 207...  On   | 00000000:02:00.0  On |                  N/A |
| 30%   41C    P8    17W / 215W |    294MiB /  7977MiB |      2%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1933      G   /usr/lib/xorg/Xorg                 35MiB |
|    0   N/A  N/A      2631      G   /usr/lib/xorg/Xorg                135MiB |
|    0   N/A  N/A      4075      G   /usr/bin/gnome-shell              111MiB |
+-----------------------------------------------------------------------------+
@radhikasethi2011 Your problem looks like a Windows compatibility issue in PyNVML. I don't have a Windows GPU server at hand, so I'm afraid I can't fix it myself. Could you take a look at this link (https://forum.faceswap.dev/viewtopic.php?t=14)?
@wookayin What do you think about adding a Windows support section to the documentation?
This is indeed a good data point: nvidia-smi works, but PyNVML cannot load the shared library on Windows (@radhikasethi2011's case); this is the first time I've seen this. On Ubuntu it was probably fine (@hongyi-zhao's case). I'm not sure why, but the link you posted says:
The most likely issue for this is that you have Windows drivers installed through Windows Update/Windows Store.
So we should provide instructions saying that the drivers should be obtained from the NVIDIA website. @radhikasethi2011, can you confirm whether this is the case for you and whether it solves your issue?
I will add some notes to the README, along with more informative error messages (those will ship in the next release, though).
In another issue, #86, @eusoubrasileiro worked around this by copying nvml.dll from Windows\System32 to the site-packages folder. This is probably a Python-path-related problem and the workaround is only a quick fix, but I hope it helps.
@Stonesjtu @wookayin I updated my NVIDIA driver, but nothing changed. I will uninstall it, reinstall it from the NVIDIA website, and post an update here soon.
Do you mean you updated your driver through the Windows installer?
@wookayin No, through the NVIDIA website. I will try the workaround.
This was my solution; I hope it helps someone:
pynvml looks for nvml.dll in "C:\Program Files\NVIDIA Corporation\NVSMI" and "C:\Windows\System32", but the new installer puts the file in "C:\Windows\System32\DriverStore\FileRepository\nv_dispi.inf_amd64_aXXXXXXXXXXXXXX". Just copy the DLL from "FileRepository" to the "Program Files" location.
If there is no "NVSMI" folder inside "C:\Program Files\NVIDIA Corporation", create one and put the DLL inside it.
The nvml.dll in System32 is 596 KB, while the one inside "FileRepository" is 1051 KB. If there is already an nvml.dll in "Program Files" but it is the 596 KB version, replace it with the 1051 KB one.
Make sure to right-click and copy the file rather than dragging it to move it: moving takes the original file out of "FileRepository", and you will not have the privileges to copy it back or undo the move.
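The manual steps above can be scripted. This is a minimal sketch in Python (the language gpustat itself uses), not an official tool: default_dirs, find_nvml and install_nvml are hypothetical helpers, and choosing the largest nvml.dll when several driver folders exist is an assumption based on the 596 KB vs. 1051 KB observation above. It uses shutil.copy2, so the original file in FileRepository is copied, never moved.

```python
import os
import shutil
from pathlib import Path
from typing import Optional, Tuple


def default_dirs() -> Tuple[Path, Path]:
    """FileRepository and NVSMI folders as described in this thread (Windows)."""
    windir = Path(os.environ.get("SystemRoot", r"C:\Windows"))
    program_files = Path(os.environ.get("ProgramFiles", r"C:\Program Files"))
    repo = windir / "System32" / "DriverStore" / "FileRepository"
    nvsmi = program_files / "NVIDIA Corporation" / "NVSMI"
    return repo, nvsmi


def find_nvml(repo: Path) -> Optional[Path]:
    """Return the largest nvml.dll found anywhere under repo, or None."""
    # Heuristic (assumption): the thread reports the working DLL is the
    # larger one, so prefer the biggest file if several driver folders exist.
    candidates = [p for p in repo.rglob("nvml.dll") if p.is_file()]
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.stat().st_size)


def install_nvml(repo: Path, nvsmi: Path) -> Path:
    """Copy (never move) nvml.dll from repo into nvsmi; return the new path."""
    src = find_nvml(repo)
    if src is None:
        raise FileNotFoundError(f"no nvml.dll found under {repo}")
    nvsmi.mkdir(parents=True, exist_ok=True)  # create NVSMI if it is missing
    dst = nvsmi / "nvml.dll"
    shutil.copy2(src, dst)  # overwrites an existing (e.g. 596 KB) nvml.dll
    return dst
```

On Windows this would be run as install_nvml(*default_dirs()) from an elevated (administrator) Python prompt, since writing into Program Files normally requires administrator rights.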
Thanks a ton! I ran into this issue earlier while working with some PyTorch/fastai models. Now it seems good. Thanks again.
This worked perfectly, thank you!
This worked for me with a slight change: nvml.dll is now located in C:\Windows\System32\DriverStore\FileRepository\nvrzui.inf_amd64_8df10ddaac270452
You can solve this issue as follows:
- Search for the "nvml.dll" file in "C:\Windows\System32\DriverStore\FileRepository"
- Copy "nvml.dll" to "C:\Program Files\NVIDIA Corporation\NVSMI" (create the NVSMI folder yourself if it does not exist)
- Done
I'm closing this issue now that v1.0 has been released. I believe the new version of pynvml should not have this problem, but if anyone runs into a similar issue on Windows, please create a new issue. Thanks.