Function not found for nvml methods
I have a 1080Ti GPU with CUDA 10.2, NVIDIA driver 440.59 and pynvml version 11.4.1 running on Ubuntu 16.04.
1. I am trying to profile my PyTorch code using scalene. When I run my code as scalene main.py I get the following error:
Error in program being profiled:
Function Not Found
Traceback (most recent call last):
File "/home/kube-admin/miniconda3/envs/temporl/lib/python3.8/site-packages/pynvml/nvml.py", line 782, in _nvmlGetFunctionPointer
_nvmlGetFunctionPointer_cache[name] = getattr(nvmlLib, name)
File "/home/kube-admin/miniconda3/envs/temporl/lib/python3.8/ctypes/__init__.py", line 386, in __getattr__
func = self.__getitem__(name)
File "/home/kube-admin/miniconda3/envs/temporl/lib/python3.8/ctypes/__init__.py", line 391, in __getitem__
func = self._FuncPtr((name_or_ordinal, self))
AttributeError: /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1: undefined symbol: nvmlDeviceGetComputeRunningProcesses_v2
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/kube-admin/miniconda3/envs/temporl/lib/python3.8/site-packages/scalene/scalene_profiler.py", line 1612, in profile_code
exec(code, the_globals, the_locals)
File "./code-exp/main.py", line 1, in <module>
import numpy as np
File "/home/kube-admin/miniconda3/envs/temporl/lib/python3.8/site-packages/numpy/__init__.py", line 140, in <module>
from . import core
File "/home/kube-admin/miniconda3/envs/temporl/lib/python3.8/site-packages/numpy/core/__init__.py", line 22, in <module>
from . import multiarray
File "/home/kube-admin/miniconda3/envs/temporl/lib/python3.8/site-packages/numpy/core/multiarray.py", line 12, in <module>
from . import overrides
File "/home/kube-admin/miniconda3/envs/temporl/lib/python3.8/site-packages/numpy/core/overrides.py", line 9, in <module>
from numpy.compat._inspect import getargspec
File "/home/kube-admin/miniconda3/envs/temporl/lib/python3.8/site-packages/numpy/compat/__init__.py", line 14, in <module>
from .py3k import *
File "/home/kube-admin/miniconda3/envs/temporl/lib/python3.8/site-packages/scalene/scalene_profiler.py", line 719, in cpu_signal_handler
(gpu_load, gpu_mem_used) = Scalene.__gpu.get_stats()
File "/home/kube-admin/miniconda3/envs/temporl/lib/python3.8/site-packages/scalene/scalene_gpu.py", line 110, in get_stats
mem_used = self.gpu_memory_usage(self.__pid)
File "/home/kube-admin/miniconda3/envs/temporl/lib/python3.8/site-packages/scalene/scalene_gpu.py", line 101, in gpu_memory_usage
for proc in pynvml.nvmlDeviceGetComputeRunningProcesses(handle):
File "/home/kube-admin/miniconda3/envs/temporl/lib/python3.8/site-packages/pynvml/nvml.py", line 2223, in nvmlDeviceGetComputeRunningProcesses
return nvmlDeviceGetComputeRunningProcesses_v2(handle);
File "/home/kube-admin/miniconda3/envs/temporl/lib/python3.8/site-packages/pynvml/nvml.py", line 2191, in nvmlDeviceGetComputeRunningProcesses_v2
fn = _nvmlGetFunctionPointer("nvmlDeviceGetComputeRunningProcesses_v2")
File "/home/kube-admin/miniconda3/envs/temporl/lib/python3.8/site-packages/pynvml/nvml.py", line 785, in _nvmlGetFunctionPointer
raise NVMLError(NVML_ERROR_FUNCTION_NOT_FOUND)
pynvml.nvml.NVMLError_FunctionNotFound: Function Not Found
To validate whether or not this issue is coming from scalene library, I run the following commands:
>>> from pynvml import *
>>> nvmlInit()
>>> nvmlSystemGetDriverVersion()
b'440.59'
>>> handle = nvmlDeviceGetHandleByIndex(0)
>>> nvmlDeviceGetComputeRunningProcesses(handle)
Traceback (most recent call last):
File "/home/kube-admin/miniconda3/envs/temporl/lib/python3.8/site-packages/pynvml/nvml.py", line 782, in _nvmlGetFunctionPointer
_nvmlGetFunctionPointer_cache[name] = getattr(nvmlLib, name)
File "/home/kube-admin/miniconda3/envs/temporl/lib/python3.8/ctypes/__init__.py", line 386, in __getattr__
func = self.__getitem__(name)
File "/home/kube-admin/miniconda3/envs/temporl/lib/python3.8/ctypes/__init__.py", line 391, in __getitem__
func = self._FuncPtr((name_or_ordinal, self))
AttributeError: /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1: undefined symbol: nvmlDeviceGetComputeRunningProcesses_v2
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/kube-admin/miniconda3/envs/temporl/lib/python3.8/site-packages/pynvml/nvml.py", line 2223, in nvmlDeviceGetComputeRunningProcesses
return nvmlDeviceGetComputeRunningProcesses_v2(handle);
File "/home/kube-admin/miniconda3/envs/temporl/lib/python3.8/site-packages/pynvml/nvml.py", line 2191, in nvmlDeviceGetComputeRunningProcesses_v2
fn = _nvmlGetFunctionPointer("nvmlDeviceGetComputeRunningProcesses_v2")
File "/home/kube-admin/miniconda3/envs/temporl/lib/python3.8/site-packages/pynvml/nvml.py", line 785, in _nvmlGetFunctionPointer
raise NVMLError(NVML_ERROR_FUNCTION_NOT_FOUND)
pynvml.nvml.NVMLError_FunctionNotFound: Function Not Found
>>> nvmlDeviceGetGraphicsRunningProcesses(handle)
Traceback (most recent call last):
File "/home/kube-admin/miniconda3/envs/temporl/lib/python3.8/site-packages/pynvml/nvml.py", line 782, in _nvmlGetFunctionPointer
_nvmlGetFunctionPointer_cache[name] = getattr(nvmlLib, name)
File "/home/kube-admin/miniconda3/envs/temporl/lib/python3.8/ctypes/__init__.py", line 386, in __getattr__
func = self.__getitem__(name)
File "/home/kube-admin/miniconda3/envs/temporl/lib/python3.8/ctypes/__init__.py", line 391, in __getitem__
func = self._FuncPtr((name_or_ordinal, self))
AttributeError: /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1: undefined symbol: nvmlDeviceGetGraphicsRunningProcesses_v2
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/kube-admin/miniconda3/envs/temporl/lib/python3.8/site-packages/pynvml/nvml.py", line 2260, in nvmlDeviceGetGraphicsRunningProcesses
return nvmlDeviceGetGraphicsRunningProcesses_v2(handle)
File "/home/kube-admin/miniconda3/envs/temporl/lib/python3.8/site-packages/pynvml/nvml.py", line 2228, in nvmlDeviceGetGraphicsRunningProcesses_v2
fn = _nvmlGetFunctionPointer("nvmlDeviceGetGraphicsRunningProcesses_v2")
File "/home/kube-admin/miniconda3/envs/temporl/lib/python3.8/site-packages/pynvml/nvml.py", line 785, in _nvmlGetFunctionPointer
raise NVMLError(NVML_ERROR_FUNCTION_NOT_FOUND)
pynvml.nvml.NVMLError_FunctionNotFound: Function Not Found
>>> list(map(str, nvmlDeviceGetGraphicsRunningProcesses(handle)))
Traceback (most recent call last):
File "/home/kube-admin/miniconda3/envs/temporl/lib/python3.8/site-packages/pynvml/nvml.py", line 782, in _nvmlGetFunctionPointer
_nvmlGetFunctionPointer_cache[name] = getattr(nvmlLib, name)
File "/home/kube-admin/miniconda3/envs/temporl/lib/python3.8/ctypes/__init__.py", line 386, in __getattr__
func = self.__getitem__(name)
File "/home/kube-admin/miniconda3/envs/temporl/lib/python3.8/ctypes/__init__.py", line 391, in __getitem__
func = self._FuncPtr((name_or_ordinal, self))
AttributeError: /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1: undefined symbol: nvmlDeviceGetGraphicsRunningProcesses_v2
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/kube-admin/miniconda3/envs/temporl/lib/python3.8/site-packages/pynvml/nvml.py", line 2260, in nvmlDeviceGetGraphicsRunningProcesses
return nvmlDeviceGetGraphicsRunningProcesses_v2(handle)
File "/home/kube-admin/miniconda3/envs/temporl/lib/python3.8/site-packages/pynvml/nvml.py", line 2228, in nvmlDeviceGetGraphicsRunningProcesses_v2
fn = _nvmlGetFunctionPointer("nvmlDeviceGetGraphicsRunningProcesses_v2")
File "/home/kube-admin/miniconda3/envs/temporl/lib/python3.8/site-packages/pynvml/nvml.py", line 785, in _nvmlGetFunctionPointer
raise NVMLError(NVML_ERROR_FUNCTION_NOT_FOUND)
pynvml.nvml.NVMLError_FunctionNotFound: Function Not Found
>>> nvmlDeviceGetComputeRunningProcesses_v2(handle)
Traceback (most recent call last):
File "/home/kube-admin/miniconda3/envs/temporl/lib/python3.8/site-packages/pynvml/nvml.py", line 782, in _nvmlGetFunctionPointer
_nvmlGetFunctionPointer_cache[name] = getattr(nvmlLib, name)
File "/home/kube-admin/miniconda3/envs/temporl/lib/python3.8/ctypes/__init__.py", line 386, in __getattr__
func = self.__getitem__(name)
File "/home/kube-admin/miniconda3/envs/temporl/lib/python3.8/ctypes/__init__.py", line 391, in __getitem__
func = self._FuncPtr((name_or_ordinal, self))
AttributeError: /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1: undefined symbol: nvmlDeviceGetComputeRunningProcesses_v2
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/kube-admin/miniconda3/envs/temporl/lib/python3.8/site-packages/pynvml/nvml.py", line 2191, in nvmlDeviceGetComputeRunningProcesses_v2
fn = _nvmlGetFunctionPointer("nvmlDeviceGetComputeRunningProcesses_v2")
File "/home/kube-admin/miniconda3/envs/temporl/lib/python3.8/site-packages/pynvml/nvml.py", line 785, in _nvmlGetFunctionPointer
raise NVMLError(NVML_ERROR_FUNCTION_NOT_FOUND)
pynvml.nvml.NVMLError_FunctionNotFound: Function Not Found
This error seems to be coming from pynvml. I am not sure why this is the case.
2. Steps to reproduce the issue: Install the driver mentioned at the beginning of the post on Ubuntu 16.04. I suppose scalene is not needed.
Common error checking:
- [x] The output of
nvidia-smi -aon your host
==============NVSMI LOG==============
Timestamp : Fri Sep 2 21:00:09 2022
Driver Version : 440.59
CUDA Version : 10.2
Attached GPUs : 3
GPU 00000000:02:00.0
Product Name : GeForce GTX 1080 Ti
Product Brand : GeForce
Display Mode : Disabled
Display Active : Disabled
Persistence Mode : Disabled
Accounting Mode : Disabled
Accounting Mode Buffer Size : 4000
Driver Model
Current : N/A
Pending : N/A
Serial Number : N/A
GPU UUID : GPU-cdeefa17-d5c4-bf51-15de-048f84c9c228
Minor Number : 0
VBIOS Version : 86.02.39.00.22
MultiGPU Board : No
Board ID : 0x200
GPU Part Number : N/A
Inforom Version
Image Version : G001.0000.01.04
OEM Object : 1.1
ECC Object : N/A
Power Management Object : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
GPU Virtualization Mode
Virtualization Mode : None
Host VGPU Mode : N/A
IBMNPU
Relaxed Ordering Mode : N/A
PCI
Bus : 0x02
Device : 0x00
Domain : 0x0000
Device Id : 0x1B0610DE
Bus Id : 00000000:02:00.0
Sub System Id : 0x85E51043
GPU Link Info
PCIe Generation
Max : 3
Current : 3
Link Width
Max : 16x
Current : 16x
Bridge Chip
Type : N/A
Firmware : N/A
Replays Since Reset : 0
Replay Number Rollovers : 0
Tx Throughput : 0 KB/s
Rx Throughput : 0 KB/s
Fan Speed : 18 %
Performance State : P0
Clocks Throttle Reasons
Idle : Not Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
HW Thermal Slowdown : Not Active
HW Power Brake Slowdown : Not Active
Sync Boost : Not Active
SW Thermal Slowdown : Not Active
Display Clock Setting : Not Active
FB Memory Usage
Total : 11177 MiB
Used : 0 MiB
Free : 11177 MiB
BAR1 Memory Usage
Total : 256 MiB
Used : 2 MiB
Free : 254 MiB
Compute Mode : Default
Utilization
Gpu : 0 %
Memory : 0 %
Encoder : 0 %
Decoder : 0 %
Encoder Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
FBC Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
Ecc Mode
Current : N/A
Pending : N/A
ECC Errors
Volatile
Single Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Double Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Aggregate
Single Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Double Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Retired Pages
Single Bit ECC : N/A
Double Bit ECC : N/A
Pending Page Blacklist : N/A
Temperature
GPU Current Temp : 43 C
GPU Shutdown Temp : 96 C
GPU Slowdown Temp : 93 C
GPU Max Operating Temp : N/A
Memory Current Temp : N/A
Memory Max Operating Temp : N/A
Power Readings
Power Management : Supported
Power Draw : 60.06 W
Power Limit : 250.00 W
Default Power Limit : 250.00 W
Enforced Power Limit : 250.00 W
Min Power Limit : 125.00 W
Max Power Limit : 300.00 W
Clocks
Graphics : 1480 MHz
SM : 1480 MHz
Memory : 5508 MHz
Video : 1252 MHz
Applications Clocks
Graphics : N/A
Memory : N/A
Default Applications Clocks
Graphics : N/A
Memory : N/A
Max Clocks
Graphics : 1911 MHz
SM : 1911 MHz
Memory : 5505 MHz
Video : 1620 MHz
Max Customer Boost Clocks
Graphics : N/A
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Processes : None
GPU 00000000:03:00.0
Product Name : GeForce GTX 1080 Ti
Product Brand : GeForce
Display Mode : Disabled
Display Active : Disabled
Persistence Mode : Disabled
Accounting Mode : Disabled
Accounting Mode Buffer Size : 4000
Driver Model
Current : N/A
Pending : N/A
Serial Number : N/A
GPU UUID : GPU-e4f1baff-dc90-4107-e5e0-6b7bfb9e6d68
Minor Number : 1
VBIOS Version : 86.02.39.00.22
MultiGPU Board : No
Board ID : 0x300
GPU Part Number : N/A
Inforom Version
Image Version : G001.0000.01.04
OEM Object : 1.1
ECC Object : N/A
Power Management Object : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
GPU Virtualization Mode
Virtualization Mode : None
Host VGPU Mode : N/A
IBMNPU
Relaxed Ordering Mode : N/A
PCI
Bus : 0x03
Device : 0x00
Domain : 0x0000
Device Id : 0x1B0610DE
Bus Id : 00000000:03:00.0
Sub System Id : 0x85E51043
GPU Link Info
PCIe Generation
Max : 3
Current : 3
Link Width
Max : 16x
Current : 16x
Bridge Chip
Type : N/A
Firmware : N/A
Replays Since Reset : 0
Replay Number Rollovers : 0
Tx Throughput : 0 KB/s
Rx Throughput : 0 KB/s
Fan Speed : 18 %
Performance State : P0
Clocks Throttle Reasons
Idle : Not Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
HW Thermal Slowdown : Not Active
HW Power Brake Slowdown : Not Active
Sync Boost : Not Active
SW Thermal Slowdown : Not Active
Display Clock Setting : Not Active
FB Memory Usage
Total : 11178 MiB
Used : 0 MiB
Free : 11178 MiB
BAR1 Memory Usage
Total : 256 MiB
Used : 2 MiB
Free : 254 MiB
Compute Mode : Default
Utilization
Gpu : 0 %
Memory : 0 %
Encoder : 0 %
Decoder : 0 %
Encoder Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
FBC Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
Ecc Mode
Current : N/A
Pending : N/A
ECC Errors
Volatile
Single Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Double Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Aggregate
Single Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Double Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Retired Pages
Single Bit ECC : N/A
Double Bit ECC : N/A
Pending Page Blacklist : N/A
Temperature
GPU Current Temp : 43 C
GPU Shutdown Temp : 96 C
GPU Slowdown Temp : 93 C
GPU Max Operating Temp : N/A
Memory Current Temp : N/A
Memory Max Operating Temp : N/A
Power Readings
Power Management : Supported
Power Draw : 61.03 W
Power Limit : 250.00 W
Default Power Limit : 250.00 W
Enforced Power Limit : 250.00 W
Min Power Limit : 125.00 W
Max Power Limit : 300.00 W
Clocks
Graphics : 1480 MHz
SM : 1480 MHz
Memory : 5508 MHz
Video : 1252 MHz
Applications Clocks
Graphics : N/A
Memory : N/A
Default Applications Clocks
Graphics : N/A
Memory : N/A
Max Clocks
Graphics : 1911 MHz
SM : 1911 MHz
Memory : 5505 MHz
Video : 1620 MHz
Max Customer Boost Clocks
Graphics : N/A
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Processes : None
GPU 00000000:83:00.0
Product Name : GeForce GTX 1080 Ti
Product Brand : GeForce
Display Mode : Disabled
Display Active : Disabled
Persistence Mode : Disabled
Accounting Mode : Disabled
Accounting Mode Buffer Size : 4000
Driver Model
Current : N/A
Pending : N/A
Serial Number : N/A
GPU UUID : GPU-b701a6bb-1912-6f4e-6bbb-8d1f87805edc
Minor Number : 3
VBIOS Version : 86.02.39.00.22
MultiGPU Board : No
Board ID : 0x8300
GPU Part Number : N/A
Inforom Version
Image Version : G001.0000.01.04
OEM Object : 1.1
ECC Object : N/A
Power Management Object : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
GPU Virtualization Mode
Virtualization Mode : None
Host VGPU Mode : N/A
IBMNPU
Relaxed Ordering Mode : N/A
PCI
Bus : 0x83
Device : 0x00
Domain : 0x0000
Device Id : 0x1B0610DE
Bus Id : 00000000:83:00.0
Sub System Id : 0x85E51043
GPU Link Info
PCIe Generation
Max : 3
Current : 3
Link Width
Max : 16x
Current : 16x
Bridge Chip
Type : N/A
Firmware : N/A
Replays Since Reset : 0
Replay Number Rollovers : 0
Tx Throughput : 0 KB/s
Rx Throughput : 0 KB/s
Fan Speed : 17 %
Performance State : P0
Clocks Throttle Reasons
Idle : Not Active
Applications Clocks Setting : Not Active
SW Power Cap : Active
HW Slowdown : Not Active
HW Thermal Slowdown : Not Active
HW Power Brake Slowdown : Not Active
Sync Boost : Not Active
SW Thermal Slowdown : Not Active
Display Clock Setting : Not Active
FB Memory Usage
Total : 11178 MiB
Used : 0 MiB
Free : 11178 MiB
BAR1 Memory Usage
Total : 256 MiB
Used : 2 MiB
Free : 254 MiB
Compute Mode : Default
Utilization
Gpu : 0 %
Memory : 0 %
Encoder : 0 %
Decoder : 0 %
Encoder Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
FBC Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
Ecc Mode
Current : N/A
Pending : N/A
ECC Errors
Volatile
Single Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Double Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Aggregate
Single Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Double Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Retired Pages
Single Bit ECC : N/A
Double Bit ECC : N/A
Pending Page Blacklist : N/A
Temperature
GPU Current Temp : 40 C
GPU Shutdown Temp : 96 C
GPU Slowdown Temp : 93 C
GPU Max Operating Temp : N/A
Memory Current Temp : N/A
Memory Max Operating Temp : N/A
Power Readings
Power Management : Supported
Power Draw : 55.13 W
Power Limit : 250.00 W
Default Power Limit : 250.00 W
Enforced Power Limit : 250.00 W
Min Power Limit : 125.00 W
Max Power Limit : 300.00 W
Clocks
Graphics : 151 MHz
SM : 151 MHz
Memory : 5508 MHz
Video : 734 MHz
Applications Clocks
Graphics : N/A
Memory : N/A
Default Applications Clocks
Graphics : N/A
Memory : N/A
Max Clocks
Graphics : 1911 MHz
SM : 1911 MHz
Memory : 5505 MHz
Video : 1620 MHz
Max Customer Boost Clocks
Graphics : N/A
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Processes : None
- [ ] Your docker configuration file (e.g:
/etc/docker/daemon.json) - [ ] The k8s-device-plugin container logs
- [ ] The kubelet logs on the node (e.g:
sudo journalctl -r -u kubelet)
Additional information that might help better understand your environment and reproduce the bug:
- [x] Docker version from
docker version:Docker version 1.11.2, build b9f10c9 - [ ] Docker command, image and tag used
- [x] Kernel version from
uname -a:4.4.0-130-generic #156-Ubuntu - [ ] Any relevant kernel output lines from
dmesg - [ ] NVIDIA packages version from
dpkg -l '*nvidia*'orrpm -qa '*nvidia*' - [ ] NVIDIA container library version from
nvidia-container-cli -V - [ ] NVIDIA container library logs (see troubleshooting)
This is most likely due to how pynvml (or this one https://pypi.org/project/nvidia-ml-py/#history) is wrapping the underlying NVML library. It seems to require certain _v2 symbols that are not present in the library associated with your driver version.
I will try to find out where one can ask questions about the bindings.
It's here: https://forums.developer.nvidia.com/c/development-tools/other-tools/system-management-and-monitoring-nvml/128
phw89 :"This has now been fixed, as of the 526 driver release." I am try
you can downgrade pynvml (such as from 11.5.0 to 11.4.0)