model_analyzer
Installing model-analyzer: how do I specify the DCGM version?
I installed it with pip3 install triton-model-analyzer. When I run
model-analyzer profile --model-repository models/ --profile-models bls
it fails with the following error:
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/model_analyzer/monitor/dcgm/dcgm_structs.py", line 668, in _LoadDcgmLibrary
    dcgmLib = CDLL(lib_file)
  File "/usr/lib/python3.8/ctypes/__init__.py", line 373, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /usr/lib64/libdcgm.so.3: cannot open shared object file: No such file or directory
The libdcgm files on my machine:
/usr/lib/x86_64-linux-gnu/libdcgm.so
/usr/lib/x86_64-linux-gnu/libdcgm.so.2
/usr/lib/x86_64-linux-gnu/libdcgm.so.2.2.9
Environment:
CUDA 11.8
Ubuntu 20.04
Triton Server 22.12
python 3.8.10
Has anyone seen this before?
Have you tried the steps outlined here? https://docs.nvidia.com/datacenter/dcgm/latest/user-guide/getting-started.html
Thanks! I solved the problem. I copied the libdcgm.so.3 files from another machine and then modified model_analyzer/monitor/dcgm/dcgm_structs.py to point to the location of the .so file.
Do not directly upgrade the datacenter-gpu-manager package; otherwise tritonserver will no longer find DCGM and fails with:
tritonserver: error while loading shared libraries: libdcgm.so.2: cannot open shared object file: No such file or directory
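For anyone landing here later, a rough sketch of the workaround described above: look for libdcgm.so.3 in a list of directories and load it with ctypes from wherever it is found, instead of relying on one hardcoded path. The helper names find_dcgm_library and load_dcgm_library are hypothetical and are not part of model_analyzer; this only illustrates the idea and is not the actual code in dcgm_structs.py.

```python
import ctypes
import os


def find_dcgm_library(search_dirs, soname="libdcgm.so.3"):
    """Return the first existing path to `soname` in `search_dirs`, or None.

    Hypothetical helper for illustration; not part of model_analyzer.
    """
    for directory in search_dirs:
        candidate = os.path.join(directory, soname)
        if os.path.isfile(candidate):
            return candidate
    return None


def load_dcgm_library(search_dirs, soname="libdcgm.so.3"):
    """Load `soname` with ctypes from the first directory that contains it."""
    path = find_dcgm_library(search_dirs, soname)
    if path is None:
        # Same exception type as in the traceback, but listing every directory tried.
        raise OSError(f"{soname} not found in: {', '.join(search_dirs)}")
    return ctypes.CDLL(path)


if __name__ == "__main__":
    # The two directories that appear in this thread; adjust for your machine.
    dirs = ["/usr/lib64", "/usr/lib/x86_64-linux-gnu"]
    print(find_dcgm_library(dirs))
```

Going by the errors in this thread, libdcgm.so.2 should stay where it is so that tritonserver 22.12 keeps working, with libdcgm.so.3 placed alongside it for model-analyzer.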