Nik Konyuchenko
@blackjack2015, The DCP family of metrics (1001-1015) is not supported on RTX GPUs. The profiling module is not loaded if supported GPUs are not detected. WBR, Nik
The DCP metrics are supported only on datacenter-grade and Quadro GPUs. Neither RTX nor GTX GPUs are supported. There are no plans to support those GPUs as...
@lilohuang, Could you share the `nvidia-smi -q` output?
Could you provide `nvidia-smi` and `nvidia-smi -q` output?
@ligeweiwu, To use `dcgmproftester`, there are two options: - Obtain the `libdcgmmoduleprofiling.so` file from the official DCGM package. - Use the `--no-dcgm-validation` flag to generate load without reading metric values...
@SamKG, The DCGM_FI_PROF_* metrics (also known as DCP) are managed by the `libdcgmmoduleprofiling.so` library, which is not open-source. The `dcgmproftester*` tool is specifically designed to test these fields. Open-sourced part...
@SamKG, You can run nv-hostengine with the log-level debug option to see a more detailed error message. However, based on your nvidia-smi output, it appears that you have a GeForce-class...
Error 226 corresponds to DCGM_ST_NVVS_ERROR (-30): dcgmi returns negative error codes, which are represented as unsigned ints. It indicates an unexpected error in nvvs (backend for...
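The mapping between 226 and -30 can be checked with a few lines of Python. This is only a sketch: it assumes the shell reports the low 8 bits of the process exit status, and the helper name `exit_status_to_signed` is hypothetical, not part of DCGM.

```python
def exit_status_to_signed(status: int) -> int:
    """Reinterpret an 8-bit exit status (0..255) as a signed 8-bit value.

    Assumption: the negative code returned by the process was truncated
    to its low 8 bits before the shell reported it.
    """
    if not 0 <= status <= 255:
        raise ValueError("exit status must be in the range 0..255")
    # Values 128..255 are the two's-complement encodings of -128..-1.
    return status - 256 if status >= 128 else status


# The exit status 226 seen in the shell maps back to -30.
print(exit_status_to_signed(226))

# Going the other way: -30 truncated to 8 bits is 226.
print(-30 & 0xFF)
```

The same arithmetic applies to any other negative code reported this way: subtract 256 from the exit status to recover the signed value.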
@xuchenhui-5, Could you provide the dmesg output? It should work if DCGM does not report that a third-party module fails to load on A10. The hanging and the fact that...
@jdmaloney, Could you try to load the nvidia driver with option `NVreg_RmPowerFeature=0x40` and see if that reproduces?