Installing model-analyzer: how do I specify the DCGM version?

Open XIAO-FAN-5257 opened this issue 1 year ago • 3 comments

I installed Model Analyzer with pip3 install triton-model-analyzer. When I run

model-analyzer profile --model-repository models/ --profile-models bls

it is failing with the following error:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/model_analyzer/monitor/dcgm/dcgm_structs.py", line 668, in _LoadDcgmLibrary
    dcgmLib = CDLL(lib_file)
  File "/usr/lib/python3.8/ctypes/__init__.py", line 373, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /usr/lib64/libdcgm.so.3: cannot open shared object file: No such file or directory

The libdcgm files on my machine:

/usr/lib/x86_64-linux-gnu/libdcgm.so
/usr/lib/x86_64-linux-gnu/libdcgm.so.2
/usr/lib/x86_64-linux-gnu/libdcgm.so.2.2.9
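
The mismatch between the listing above (major version 2) and the libdcgm.so.3 named in the traceback can be checked with a short script. This is only a rough sketch: the function names find_libdcgm and major_versions are made up for illustration, and the two search directories are simply the ones that appear in this issue, so they may need adjusting on other systems.

```python
import glob
import os

# Assumption: only the two directories mentioned in this issue are searched.
DEFAULT_DIRS = ["/usr/lib64", "/usr/lib/x86_64-linux-gnu"]


def find_libdcgm(dirs=DEFAULT_DIRS):
    """Return the sorted paths of all libdcgm.so* files in the given directories."""
    found = []
    for d in dirs:
        found.extend(glob.glob(os.path.join(d, "libdcgm.so*")))
    return sorted(found)


def major_versions(paths):
    """Collect the major version from names such as libdcgm.so.2.2.9."""
    majors = set()
    for p in paths:
        # Text after "libdcgm.so", e.g. ".2.2.9", ".2", or "" for the bare link.
        suffix = os.path.basename(p)[len("libdcgm.so"):]
        first = suffix.strip(".").split(".")[0]
        if first.isdigit():
            majors.add(int(first))
    return sorted(majors)


if __name__ == "__main__":
    paths = find_libdcgm()
    for p in paths:
        print(p)
    print("major versions found:", major_versions(paths))
```

If the printed major versions do not include 3, the pip-installed model-analyzer from the traceback will not find the library it asks for.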

environment

CUDA 11.8
Ubuntu 20.04
Triton Server 22.12
python 3.8.10

Has anyone seen this before?

XIAO-FAN-5257 avatar Sep 23 '24 09:09 XIAO-FAN-5257

Have you tried the steps outlined here? https://docs.nvidia.com/datacenter/dcgm/latest/user-guide/getting-started.html

nv-braf avatar Sep 23 '24 14:09 nv-braf

Thanks! I solved the problem. I copied the libdcgm.so.3 files from another machine, then modified model_analyzer/monitor/dcgm/dcgm_structs.py to point to the location of the .so file.
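
Before pointing dcgm_structs.py at a copied library, it may help to confirm that the file actually loads. The sketch below repeats the CDLL(lib_file) call seen in the traceback, using only Python's standard ctypes module. The helper name try_load and the second candidate path are hypothetical, and nothing here is part of the model_analyzer API.

```python
import ctypes


def try_load(lib_file):
    """Try to load a shared library; return (ok, error_message)."""
    try:
        ctypes.CDLL(lib_file)
        return True, None
    except OSError as exc:
        # Same failure class as in the traceback: the file is missing or unloadable.
        return False, str(exc)


if __name__ == "__main__":
    # The first path is the one from the traceback; the second is a hypothetical
    # location for a copied library.
    candidates = [
        "/usr/lib64/libdcgm.so.3",
        "/usr/lib/x86_64-linux-gnu/libdcgm.so.3",
    ]
    for candidate in candidates:
        ok, err = try_load(candidate)
        print(candidate, "OK" if ok else "FAILED: " + err)
```

A path that reports OK here is one that the CDLL call in dcgm_structs.py should also be able to open.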

XIAO-FAN-5257 avatar Sep 25 '24 08:09 XIAO-FAN-5257

Do not upgrade the datacenter-gpu-manager package directly; otherwise, tritonserver will no longer find the libdcgm.so.2 it needs:

tritonserver: error while loading shared libraries: libdcgm.so.2: cannot open shared object file: No such file or directory

XIAO-FAN-5257 avatar Sep 25 '24 08:09 XIAO-FAN-5257