Nik Konyuchenko

Results 96 comments of Nik Konyuchenko

> > > @xuchenhui-5, Could you provide the dmesg output? It should work if DCGM does not report that a third-party module fails to load on A10....

We were able to replicate the issue and have confirmed that there is a problem with the A10 GPUs. The issue can be partially resolved if the device has an...

Hello @hassanbabaie, Unfortunately, it is currently not possible to break down pipelines in order to isolate FP8 utilization.

Hi @jaslip, To better understand the issue, we will need the log lines right before the lines you provided. The error -37 is already converted from the underlying subsystems and...

DCGM uses the [NSCQ](https://github.com/NVIDIA/apt-packaging-libnvidia-nscq) library, which must be installed separately from DCGM and is bound to the driver version.

@BetaZYN, The amount of memory allocated depends on the available VRAM on GPUs. Several memory tests require allocating large buffers in memory.

_1. How are these methods calculated?_ In general, utilization is the duration of the SMs being busy vs. non-busy during the polling interval. For occupancy - how many...
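The busy-versus-idle description of utilization above can be illustrated with a small calculation. This is only a sketch of the idea, not DCGM's or NVML's implementation: the function name `utilization_percent` and the representation of busy periods as `(start, end)` timestamps in seconds are assumptions made for the example.

```python
def utilization_percent(busy_intervals, window_start, window_end):
    """Percent of a polling window covered by busy intervals.

    Hypothetical helper for illustration only: busy_intervals is a list of
    (start, end) timestamps in seconds during which the device was busy.
    """
    if window_end <= window_start:
        raise ValueError("polling window must have positive length")

    # Clip each interval to the polling window and drop those outside it.
    clipped = sorted(
        (max(start, window_start), min(end, window_end))
        for start, end in busy_intervals
        if end > window_start and start < window_end
    )

    # Merge overlapping intervals so that concurrent work is not counted twice.
    busy = 0.0
    current_start, current_end = None, None
    for start, end in clipped:
        if current_end is None or start > current_end:
            if current_end is not None:
                busy += current_end - current_start
            current_start, current_end = start, end
        else:
            current_end = max(current_end, end)
    if current_end is not None:
        busy += current_end - current_start

    return 100.0 * busy / (window_end - window_start)
```

For a one-second window with busy periods `(0.0, 0.25)` and `(0.5, 0.75)`, the function returns `50.0`; overlapping periods are merged before summing, so two kernels running at the same time do not push the value above 100.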

Clarification on the first question: we are using the NVML API to get values for the fields you mentioned in another issue (DCGM_FI_DEV_GPU_UTIL and DCGM_FI_DEV_MEM_COPY_UTIL) and here is the NVML documentation...

@f2hkop, dcgmi does not provide such functionality right now. This is a useful feature, and we will add it in a future release. WBR, Nik

@krishh85, There is no method to convert the utilization metrics and compare them to the theoretical FLOP numbers. **FP64_ACTIVE** The percentage of cycles in which the SM execution pipes are...