dcgm-exporter

Could not enable kubernetes metric collection: nvml: Unknown Error

Open 287400117 opened this issue 1 year ago • 2 comments

What is the version?

3.1.8-3.1.5

What happened?

With DCGM_REMOTE_HOSTENGINE_INFO configured for dcgm-exporter, an error occasionally occurs after the dcgm-exporter Pod is rebuilt. Restarting the container with docker restart resolves it. The error message is as follows: [image]

What did you expect to happen?

As stated in the title.

What is the GPU model?

No response

What is the environment?

No response

How did you deploy the dcgm-exporter and what is the configuration?

No response

How to reproduce the issue?

No response

Anything else we need to know?

No response

287400117 avatar May 23 '24 08:05 287400117

@287400117, try using the latest version of dcgm-exporter.

nvvfedorov avatar May 24 '24 16:05 nvvfedorov

> @287400117 , Try to use the latest version of the dcgm-exporter.

The latest version fails with a different error; see #330.

287400117 avatar May 30 '24 02:05 287400117