Failed to run ``gpustat --debug'': pynvml.NVMLError_LibraryNotFound: NVML Shared Library Not Found

Open hongyi-zhao opened this issue 3 years ago • 17 comments

Hi,

On Ubuntu 20.04 with Python 3.8.3, I failed to run ``gpustat --debug''. See the following for more info:

$ gpustat --debug
Error on querying NVIDIA devices. Use --debug flag for details
Traceback (most recent call last):
  File "/home/werner/.pyenv/versions/3.8.3/envs/socks5-haproxy/lib/python3.8/site-packages/pynvml.py", line 644, in _LoadNvmlLibrary
    nvmlLib = CDLL("libnvidia-ml.so.1")
  File "/home/werner/.pyenv/versions/3.8.3/lib/python3.8/ctypes/__init__.py", line 373, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libnvidia-ml.so.1: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/werner/.pyenv/versions/3.8.3/envs/socks5-haproxy/lib/python3.8/site-packages/gpustat/__main__.py", line 19, in print_gpustat
    gpu_stats = GPUStatCollection.new_query()
  File "/home/werner/.pyenv/versions/3.8.3/envs/socks5-haproxy/lib/python3.8/site-packages/gpustat/core.py", line 281, in new_query
    N.nvmlInit()
  File "/home/werner/.pyenv/versions/3.8.3/envs/socks5-haproxy/lib/python3.8/site-packages/pynvml.py", line 608, in nvmlInit
    _LoadNvmlLibrary()
  File "/home/werner/.pyenv/versions/3.8.3/envs/socks5-haproxy/lib/python3.8/site-packages/pynvml.py", line 646, in _LoadNvmlLibrary
    _nvmlCheckReturn(NVML_ERROR_LIBRARY_NOT_FOUND)
  File "/home/werner/.pyenv/versions/3.8.3/envs/socks5-haproxy/lib/python3.8/site-packages/pynvml.py", line 310, in _nvmlCheckReturn
    raise NVMLError(ret)
pynvml.NVMLError_LibraryNotFound: NVML Shared Library Not Found

hongyi-zhao avatar Aug 25 '20 05:08 hongyi-zhao

What's the output of nvidia-smi?

Stonesjtu avatar Aug 25 '20 05:08 Stonesjtu

I haven't installed any Nvidia-related drivers/tools/utilities on the machine, so the nvidia-smi command is not currently available.

hongyi-zhao avatar Aug 25 '20 10:08 hongyi-zhao

Unfortunately, this tool (like any other GPU management tool) depends on the Nvidia driver and toolkit. You should at least install the Nvidia driver to get things working.
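
The traceback above shows pynvml failing at `CDLL("libnvidia-ml.so.1")`. As a minimal sketch of the same check without gpustat, the snippet below tries to load the NVML shared library directly with `ctypes`. The function name `nvml_library_available` is hypothetical, and using `nvml.dll` as the default name on Windows is an assumption based on the file name mentioned later in this thread.

```python
import ctypes
import sys


def nvml_library_available(names=None):
    """Return the first NVML library name that ctypes can load, or None.

    Hypothetical helper for illustration; it is not part of gpustat or pynvml.
    """
    if names is None:
        # "libnvidia-ml.so.1" is the name shown in the traceback above.
        # "nvml.dll" is assumed for Windows.
        if sys.platform.startswith("win"):
            names = ["nvml.dll"]
        else:
            names = ["libnvidia-ml.so.1"]
    for name in names:
        try:
            ctypes.CDLL(name)
        except OSError:
            # The loader could not find or open this library; try the next one.
            continue
        return name
    return None


if __name__ == "__main__":
    found = nvml_library_available()
    if found:
        print(f"Loaded {found}")
    else:
        print("NVML shared library not found; is the NVIDIA driver installed?")
```

If this reports that the library is missing, gpustat would be expected to fail in the same way until the driver is installed.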

Stonesjtu avatar Aug 25 '20 10:08 Stonesjtu

Thanks a lot for your explanation. I'll try it and report back if necessary.

hongyi-zhao avatar Aug 25 '20 10:08 hongyi-zhao

What's the output of nvidia-smi

[image: nvidia-smi output] I had the same issue; this is the output of nvidia-smi. @Stonesjtu

radhikasethi2011 avatar Aug 28 '20 07:08 radhikasethi2011

The problem has been solved. The cause was that I didn't have a correct installation of CUDA/nvidia-driver. Now it works smoothly. See the following for details:

$  gpustat --debug
X10DAi-01                  Fri Aug 28 15:15:31 2020  450.51.06
[0] GeForce RTX 2070 SUPER | 41'C,   5 % |   291 /  7977 MB | gdm(35M) werner(132M) werner(111M)
$ nvidia-smi 
Fri Aug 28 15:15:43 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.06    Driver Version: 450.51.06    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 207...  On   | 00000000:02:00.0  On |                  N/A |
| 30%   41C    P8    17W / 215W |    294MiB /  7977MiB |      2%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1933      G   /usr/lib/xorg/Xorg                 35MiB |
|    0   N/A  N/A      2631      G   /usr/lib/xorg/Xorg                135MiB |
|    0   N/A  N/A      4075      G   /usr/bin/gnome-shell              111MiB |
+-----------------------------------------------------------------------------+

hongyi-zhao avatar Aug 28 '20 07:08 hongyi-zhao

@radhikasethi2011 Your problem looks like a Windows compatibility issue in PyNVML. I don't have a Windows GPU server at hand, so I'm afraid I cannot fix it myself. But can you take a look at this link (https://forum.faceswap.dev/viewtopic.php?t=14)?

@wookayin What do you think about adding a Windows support section to the documentation?

Stonesjtu avatar Aug 28 '20 07:08 Stonesjtu

This is indeed a useful data point: on Windows, nvidia-smi works but PyNVML cannot load the shared library (@radhikasethi2011's case; this is the first time I've seen it). On Ubuntu it was probably fine (@hongyi-zhao's case). I'm not sure why, but the link you posted says:

The most likely issue for this is that you have Windows drivers installed through Windows Update/Windows Store.

So we should add instructions saying that the drivers should be obtained from the Nvidia website. @radhikasethi2011, can you confirm whether this is the case for you and whether it solves your issue?

I will add some notes to the README and more informative error messages (which will ship with the next release, though).

wookayin avatar Aug 28 '20 21:08 wookayin

In another issue, #86, @eusoubrasileiro used a workaround of copying nvml.dll from Windows\System32 to the site-packages folder. This looks like a Python-path-related problem and the workaround is only a quick fix, but I hope it helps.

wookayin avatar Aug 28 '20 22:08 wookayin

@Stonesjtu @wookayin updated my nvidia driver but nothing changed. Will uninstall and install again from the nvidia website and update here soon.

radhikasethi2011 avatar Aug 29 '20 02:08 radhikasethi2011

Did you mean you updated your driver through windows installer?

wookayin avatar Aug 29 '20 04:08 wookayin

@wookayin no, through the nvidia website. Will try the workaround

radhikasethi2011 avatar Aug 29 '20 07:08 radhikasethi2011

This was my solution; hope it helps someone:

pynvml looks for nvml.dll in "C:\Program Files\NVIDIA Corporation\NVSMI" and "C:\Windows\System32", but the new installer puts the file in "C:\Windows\System32\DriverStore\FileRepository\nv_dispi.inf_amd64_aXXXXXXXXXXXXXX". Just copy the DLL from "FileRepository" to the "Program Files" location.

If there is no "NVSMI" folder inside "C:\Program Files\NVIDIA Corporation", create one and put the DLL inside.

The nvml.dll in System32 is 596 KB and the file inside "FileRepository" is 1051 KB. If there is already an nvml.dll inside "Program Files" but it is the 596 KB version, replace it with the 1051 KB one.

Make sure to right-click and copy the file rather than dragging to move it: a move takes the original file out of "FileRepository", and you will not have the privileges to copy it back or undo the move.
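
Since the driver subfolder name differs between machines, a small script can do the search. This is only a sketch: `find_nvml_candidates` and `DRIVER_STORE` are hypothetical names, and the default path is the one quoted in this comment. It lists every nvml.dll under the given root with its size, largest first, so the different versions can be told apart.

```python
from pathlib import Path

# Default root taken from the comment above. The driver subfolder below it
# (nv_dispi.inf_amd64_...) varies per machine, so it is searched recursively.
DRIVER_STORE = Path(r"C:\Windows\System32\DriverStore\FileRepository")


def find_nvml_candidates(root=DRIVER_STORE):
    """Return (path, size_in_bytes) for each nvml.dll under root, largest first.

    Hypothetical helper for illustration; returns an empty list if root
    does not exist.
    """
    root = Path(root)
    if not root.is_dir():
        return []
    found = [(p, p.stat().st_size) for p in root.rglob("nvml.dll") if p.is_file()]
    return sorted(found, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for path, size in find_nvml_candidates():
        print(f"{size / 1024:8.0f} KB  {path}")
```

The script only reads the directory tree; it does not copy or move anything.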

garcolazo avatar Jan 03 '21 09:01 garcolazo

Thanks a ton, I was running into this issue earlier while working with some Pytorch/fastai models. Now it seems good. Thanks again.

shirishkz avatar Jan 30 '21 07:01 shirishkz

This worked perfectly, thank you !

eduardatmadenn avatar Apr 05 '21 10:04 eduardatmadenn

This works for me with a slight change: The location of nvml.dll is now in C:\Windows\System32\DriverStore\FileRepository\nvrzui.inf_amd64_8df10ddaac270452

nikky4D avatar May 30 '21 00:05 nikky4D

You can solve this issue as follows:

  1. Search for the "nvml.dll" file in "C:\Windows\System32\DriverStore\FileRepository"
  2. Copy the "nvml.dll" file to "C:\Program Files\NVIDIA Corporation\NVSMI" (create the NVSMI folder yourself if it does not exist)
  3. Done
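
The copy step can also be scripted. This is a rough sketch under a few assumptions: `copy_nvml` and `NVSMI_DIR` are hypothetical names, the target path is the one given in the steps above, and writing under "Program Files" will most likely need an elevated (administrator) prompt. The source path must be the nvml.dll you located in step 1.

```python
import shutil
from pathlib import Path

# Target folder named in the steps above.
NVSMI_DIR = Path(r"C:\Program Files\NVIDIA Corporation\NVSMI")


def copy_nvml(source, target_dir=NVSMI_DIR):
    """Copy (never move) the DLL into target_dir and return the new path.

    Hypothetical helper for illustration. The target folder is created if it
    does not exist, and an existing file of the same name is overwritten.
    """
    source = Path(source)
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(source, target_dir / source.name))
```

Using `shutil.copy2` leaves the original file in the driver store untouched, which matches the advice earlier in the thread to copy rather than move.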

jungwon-choi avatar Jul 23 '21 09:07 jungwon-choi

I'll close this issue now that v1.0 has been released. I believe the new version of pynvml should have no problem, but if anyone runs into a similar issue on Windows, please create a new issue. Thanks.

wookayin avatar Sep 04 '22 23:09 wookayin