Nik Konyuchenko

Results 96 comments of Nik Konyuchenko

@starry91, Could you clarify which driver version is installed?

@starry91, Could you replace the ptx file next to the dcgmproftester11 with the attached one and see if that works? [DcgmProfTesterKernels.ptx.zip](https://github.com/NVIDIA/DCGM/files/11277596/DcgmProfTesterKernels.ptx.zip)

@starry91, This is an issue on the DCGM side. The ptx file was built with a newer version of the CUDA SDK than it should have been. This will be...

@starry91, That affects the dcgmproftester only.

@optyang, The value that you observe is 0x7ffffff0, which is DCGM_INT32_BLANK. In our tools, that usually leads to 'N/A' output. Please take a look at this code: https://github.com/NVIDIA/DCGM/blob/cc3fe64d966d956cebba3e3ff1334786dd767d35/testing/python3/dcgmvalue.py#L45
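
For illustration, here is a minimal sketch of how a client script might map that sentinel to 'N/A' before displaying a reading. Only the constant value 0x7ffffff0 and the 'N/A' convention come from the comment above. The helper names `is_int32_blank` and `format_int32` are hypothetical, and treating every value at or above the sentinel as blank is an assumption, not a statement about what the linked code does.

```python
# Sentinel quoted in the comment above: 0x7ffffff0 is DCGM_INT32_BLANK.
DCGM_INT32_BLANK = 0x7FFFFFF0


def is_int32_blank(value: int) -> bool:
    """Return True if the reading should be treated as "no data".

    Assumption: any value at or above the sentinel counts as blank,
    so nearby reserved values are also treated as missing.
    """
    return value >= DCGM_INT32_BLANK


def format_int32(value: int) -> str:
    """Hypothetical helper: render a reading, using 'N/A' for blanks."""
    return "N/A" if is_int32_blank(value) else str(value)


if __name__ == "__main__":
    # One ordinary reading and the sentinel value from the comment.
    for reading in (42, 0x7FFFFFF0):
        print(f"{reading:#x} -> {format_int32(reading)}")
```

Checking for the sentinel before doing arithmetic on a field value keeps a blank reading from being averaged or plotted as if it were real data.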

@jxh314, There are a few limitations with the profiling module for GPUs. Firstly, it is not open-sourced. If you want to use it, you must obtain the module from the...

@Xaraxia, Could you provide the `nvidia-smi` and `nvidia-smi -q` output? Please kindly provide more information about your setup. Specifically, let us know if you are running nv-hostengine on a bare-metal...

@bergentruckung, Could you provide `nvidia-smi` and `dcgmi discovery -c` output?

@nguoido, The file `diag-skus.yaml`, which is not found, is part of the regular datacenter-gpu-manager package and should be installed with it. The datacenter-gpu-manager-config package is only created or released if...

@guleng, It's not clear what you want to change here. The resource will be associated with a pod until the pod is terminated. See some K8s sources for reference:...